MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » Charset recognition

catshout
Posted: Thu Jan 28, 2016 5:40 am    Post subject: Charset recognition

Acolyte

Joined: 15 May 2012
Posts: 57

Dear community,

We're receiving EDIFACT messages from external partners. The encoding declared in the message (the charset implied by the syntax identifier, e.g. UNOC) is sometimes misleading, so we need to detect the correct encoding of the whole message within an IIB flow.

Is there any built-in capability in IIB that can automatically determine the encoding of incoming data when no or misleading metadata is sent, and then set the CCSID of the message based on that detection? Or is there an elegant programmatic way, either in ESQL or Java?

Best regards
- Gerald
fjb_saper
Posted: Thu Jan 28, 2016 5:59 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

If the sender has declared the correct CCSID when sending the message,
you should be able to find the CCSID in

InputRoot.Properties.CodedCharSetId (from memory)

Otherwise have the senders fix their code!

Have fun
_________________
MQ & Broker admin
Vitor
Posted: Thu Jan 28, 2016 6:15 am    Post subject: Re: Charset recognition

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

catshout wrote:
Is there any built-in capability in IIB that can automatically determine the encoding of incoming data when no or misleading metadata is sent


If the sender has not supplied the CCSID (or just allowed it to default to some value) or has set it to an incorrect or misleading value there's no easy (elegant) way for IIB to determine it's been had.

If the sender has not supplied the CCSID (or just allowed it to default to some value) or has set it to an incorrect or misleading value this is known by most professionals as "doing it wrong" and the sending code needs to be corrected.
_________________
Honesty is the best policy.
Insanity is the best defence.
catshout
Posted: Thu Jan 28, 2016 6:17 am

Acolyte

Joined: 15 May 2012
Posts: 57

That's the problem. The sender sends the plain data as it is and we pick it up. There is neither a CCSID nor any other information about the encoding/charset.
Vitor
Posted: Thu Jan 28, 2016 6:30 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

catshout wrote:
That's the problem.


Are you paying these people, or in some other form of monetary relationship? Is there any way they can be encouraged to get their act in gear, as this is the simplest solution to your problem.

catshout wrote:
The sender sends the plain data as it is and we pick it up.


Sends it how? MQ? Web? File? Carrier pigeon? If it's the latter, could the pigeon look up the encoding before it flies off?
_________________
Honesty is the best policy.
Insanity is the best defence.
catshout
Posted: Thu Jan 28, 2016 6:42 am

Acolyte

Joined: 15 May 2012
Posts: 57

Quote:
Are you paying these people, or in some other form of monetary relationship? Is there any way they can be encouraged to get their act in gear, as this is the simplest solution to your problem.


These people are sending EDIFACT ORDERS messages to our client, who is a supplier of goods for all these partners. An EDIFACT ORDERS message has a portion of data that describes the encoding of the part that follows, and in several cases this is set wrong. Sure, our client may ask his partners to send the correct encoding metadata, but if they don't do this, there is no other way.
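
For readers unfamiliar with that portion of data: in EDIFACT it is the syntax identifier at the start of the UNB segment (for example UNOC, which stands for ISO 8859-1). The following is only a rough Java sketch of reading it from the raw bytes. The class and method names are made up for illustration, the mapping table is deliberately incomplete, and treating UNOY as UTF-8 is a common convention assumed here rather than something stated in this thread. It also only works when the interchange header is in an ASCII-compatible encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads the syntax identifier declared in the UNB segment.
final class UnbSyntaxId {
    // Assumed, partial mapping from syntax identifier to Java charset.
    // UNOY -> UTF-8 is a convention assumed for this sketch.
    private static final Map<String, Charset> DECLARED = Map.of(
            "UNOA", StandardCharsets.US_ASCII,
            "UNOB", StandardCharsets.US_ASCII,
            "UNOC", StandardCharsets.ISO_8859_1,
            "UNOY", StandardCharsets.UTF_8);

    // "UNB", one separator character, then a four-letter identifier UNOx.
    private static final Pattern UNB = Pattern.compile("UNB.(UNO[A-Z])");

    // Returns the declared syntax identifier, e.g. "UNOC", if one is found.
    static Optional<String> syntaxId(byte[] message) {
        // Only the header is inspected. ISO-8859-1 maps every byte to one
        // char, so decoding cannot fail, but an EBCDIC header will not match.
        int len = Math.min(message.length, 64);
        String head = new String(message, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = UNB.matcher(head);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Returns the charset the sender claims to use, if the identifier is in the table.
    static Optional<Charset> declaredCharset(byte[] message) {
        return syntaxId(message).map(DECLARED::get);
    }
}
```

This only tells you what the sender claims. Whether the claim matches the actual bytes is a separate question.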

Quote:
Carrier pigeon


Really cool idea

The data are coming over AS2/HTTP through a DataPower Gateway. IIB receives the data over MQ. Maybe DataPower is able to set the encoding based on content.

I've looked around. Some Java libraries, such as Apache Tika, provide charset detection. Maybe that's a way forward if we can't convince the sending party to set the metadata right.
mqjeff
Posted: Thu Jan 28, 2016 6:53 am

Grand Master

Joined: 25 Jun 2008
Posts: 17447

If the sending team isn't able to set the metadata right, then they are not producing correct messages.

And you should explain that you cannot accept incorrect messages as, you know, they're incorrect.
_________________
chmod -R ugo-wx /
timber
Posted: Fri Jan 29, 2016 1:25 am

Grand Master

Joined: 25 Aug 2015
Posts: 1292

Just to back up what others have said: there is no *generally reliable* method of auto-detecting the character encoding/CCSID. There are some heuristics. Some encodings (like UTF-8) are designed to be auto-detected, and the heuristics can be quite reliable. EBCDIC was not designed to be auto-detected, and any attempt to do so is likely to be fragile.
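
To illustrate why the UTF-8 heuristic can be quite reliable: bytes produced by a single-byte encoding rarely form well-formed UTF-8 multi-byte sequences by accident, so a strict decode is a strong signal. A minimal sketch in plain Java (standard library only) follows. The class and method names are made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for the UTF-8 heuristic.
final class Utf8Check {
    // True if the bytes are well-formed UTF-8. Pure 7-bit ASCII also passes.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // True if any byte has the high bit set, i.e. the data is not pure ASCII.
    static boolean hasNonAscii(byte[] data) {
        for (byte b : data) {
            if (b < 0) {
                return true;
            }
        }
        return false;
    }

    // Well-formed UTF-8 that actually contains multi-byte sequences.
    static boolean looksLikeUtf8(byte[] data) {
        return hasNonAscii(data) && isValidUtf8(data);
    }
}
```

A message that passes looksLikeUtf8 but is labelled as ISO 8859-1 is very likely mislabelled. The reverse does not hold: any byte sequence is valid ISO 8859-1, so a check like this cannot tell the single-byte code pages apart.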

IF you understand exactly which encodings the sender might use,
AND you understand how those encodings differ
AND you think you could write an algorithm to reliably detect each one
THEN you could write yourself a CCSID-detection algorithm.
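
As a sketch only of what such an algorithm might look like for EDIFACT: given a fixed list of candidate encodings, try each in order with a strict decoder and accept the first one under which the data decodes cleanly and starts with UNA or UNB. The class name, method name and the start-of-interchange check are assumptions made for this illustration, not an IIB facility.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.util.List;
import java.util.Optional;

// Hypothetical detector over a known, ordered list of candidate charsets.
final class CandidateDetector {
    // Returns the first candidate that decodes the data without errors and
    // yields text starting with an EDIFACT interchange header (UNA or UNB).
    static Optional<Charset> detect(byte[] data, List<Charset> candidates) {
        for (Charset cs : candidates) {
            try {
                String text = cs.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(data))
                        .toString();
                if (text.startsWith("UNA") || text.startsWith("UNB")) {
                    return Optional.of(cs);
                }
            } catch (CharacterCodingException e) {
                // Not decodable under this candidate; try the next one.
            }
        }
        return Optional.empty();
    }
}
```

The order of the candidates matters: put strict encodings such as UTF-8 first and permissive ones such as ISO-8859-1 last, because a single-byte code page accepts almost anything. For pure ASCII content the first ASCII-compatible candidate wins, which is harmless. Two single-byte code pages that differ only in a few accented characters cannot be separated this way at all, which is where the fragility comes in.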

I wouldn't bother. I would go back to the sender and tell them to fix their software.