MQSeries.net :: View topic - ISO-8859-1 va UTF-8 encoding


MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » ISO-8859-1 va UTF-8 encoding


ISO-8859-1 va UTF-8 encoding


rekarm01

Posted: Fri Jul 31, 2009 1:25 am Post subject: Re: ISO-8859-1 va UTF-8 encoding

Grand Master

Joined: 25 Jun 2008
Posts: 1415

The MQ CodedCharSetId field serves the same purpose as the XML encoding declaration or HTML/MIME Content-Type charset parameter. It describes the character encoding, so that receiving applications can properly decode encoded character data.

er_pankajgupta84 wrote:

I conducted a small POC:

I created two input files with the same contents:
1. with UTF-8 as encoding
2. with ISO-8859-1 as encoding.

Unless the file editor can parse XML, it is not enough to set the XML encoding attribute inside the file; the file editor must also specify which encoding to use when writing the file (for example, during "Save As ...").
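To illustrate the point, here is a minimal Python sketch, not anything from the original POC. It writes the same XML text twice, once per encoding, and reads back the raw bytes. The helper name saved_bytes and the sample document are made up for the illustration. The XML declaration says ISO-8859-1 in both cases, yet the bytes on disk depend only on the encoding chosen when saving.

```python
import os
import tempfile

# "\u00c9" is the letter E with acute accent. The declaration below is just
# text inside the document; it does not control how the file is written.
xml_text = '<?xml version="1.0" encoding="ISO-8859-1"?><name>CAF\u00c9</name>'


def saved_bytes(text: str, file_encoding: str) -> bytes:
    """Write text the way an editor's "Save As" would, then return the raw bytes."""
    fd, path = tempfile.mkstemp(suffix=".xml")
    os.close(fd)
    try:
        with open(path, "w", encoding=file_encoding, newline="") as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


as_latin1 = saved_bytes(xml_text, "iso-8859-1")
as_utf8 = saved_bytes(xml_text, "utf-8")

# Same declaration, different bytes for the accented letter.
print(as_latin1.hex(" "))
print(as_utf8.hex(" "))
```

Only the first file is consistent with its own declaration; the second one claims ISO-8859-1 but carries UTF-8 bytes.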

er_pankajgupta84 wrote:

My QM and Broker are running on AIX, so the QM CCSID defaults to 819.

The QMgr ccsid should be irrelevant, unless an application happens to also use it to provide a default value for an MQ message header ccsid. The MQ message header provides the only ccsid that's shared between sender and receiver. The basic steps for the POC should be:

1. The file editor encodes characters using a specific character encoding.
2. SAP-PI reads the file to get the bytes; it also sets the message header ccsid to match the character encoding in step 1, and puts the message header(s) + message bytes on a queue.
3. The MB gets the message header(s) + message bytes off the queue, and decodes the given bytes using the given message header ccsid.

If the character encoding in step 1 doesn't match the ccsid in step 2, there is very little that the MB can do to fix it in step 3.
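The three steps can be modelled in a few lines of Python. This is only a sketch of the idea and does not use any MQ API: the Message class, sender_put, receiver_get and the qmgr_default parameter are hypothetical names, and the CCSID-to-codec table covers just the two code pages discussed in this thread.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed mapping from the two CCSIDs in this thread to Python codec names.
CCSID_TO_CODEC = {819: "iso-8859-1", 1208: "utf-8"}


@dataclass
class Message:
    ccsid: int      # stands in for the message header CodedCharSetId
    payload: bytes  # the message bytes, untouched in transit


def sender_put(file_bytes: bytes, ccsid: Optional[int] = None,
               qmgr_default: int = 819) -> Message:
    """Step 2: label the bytes; fall back to a default when no ccsid is given."""
    return Message(ccsid if ccsid is not None else qmgr_default, file_bytes)


def receiver_get(msg: Message) -> str:
    """Step 3: decode using only what the header says."""
    return msg.payload.decode(CCSID_TO_CODEC[msg.ccsid], errors="replace")


# Step 1: the editor saved the text as UTF-8 ("\u00c9" is E with acute accent).
file_bytes = "CAF\u00c9".encode("utf-8")

good = receiver_get(sender_put(file_bytes, ccsid=1208))  # label matches the bytes
bad = receiver_get(sender_put(file_bytes))               # labelled 819 by default

print(repr(good))
print(repr(bad))
```

The receiver has nothing but the label to go on, which is why a mismatch introduced in step 2 cannot be repaired reliably in step 3.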

er_pankajgupta84 wrote:

this is the string that is causing problem: CAFÉ ..
The letter " É " has representation in 819 and 1208.

No, this is the string that is causing the problem: CAFÃ
The letter 'É' has a different representation for the two ccsids. If, for example, the sender encodes the string using ccsid=1208, but tells the receiver to decode the bytes using ccsid=819, then it's not the same string anymore.

Compare this:

Code:

'É' --encode(ccsid=819)---> X'c9' --decode(ccsid=819)---> 'É'

'É' --encode(ccsid=1208)--> X'c389' --decode(ccsid=1208)--> 'É'

... with this:

Code:

'É' --encode(ccsid=819)---> X'c9' --decode(ccsid=1208)--> '�'

'É' --encode(ccsid=1208)--> X'c389' --decode(ccsid=819)---> 'Ã'
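The four cases above can be reproduced with Python's built-in codecs. This is a sketch that assumes ccsid 819 corresponds to the "iso-8859-1" codec and ccsid 1208 to "utf-8"; the results dictionary is just a name chosen for the example.

```python
# Assumed codec names for the two CCSIDs in the comparison.
CODECS = {819: "iso-8859-1", 1208: "utf-8"}

text = "\u00c9"  # the letter E with acute accent

# results[(encode_ccsid, decode_ccsid)] = (hex of the bytes, decoded string)
results = {}
for enc in (819, 1208):
    raw = text.encode(CODECS[enc])
    for dec in (819, 1208):
        # errors="replace" yields U+FFFD where the bytes are not valid
        decoded = raw.decode(CODECS[dec], errors="replace")
        results[(enc, dec)] = (raw.hex(), decoded)
        print(f"encode({enc}) -> X'{raw.hex()}' -> decode({dec}) -> {decoded!r}")
```

In the last case the two UTF-8 bytes come back as two characters: 'Ã' (U+00C3) followed by the control character U+0089, which most displays do not render. That is why only the 'Ã' is visible.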

er_pankajgupta84

Posted: Mon Aug 03, 2009 11:04 am Post subject:

Master

Joined: 14 Nov 2008
Posts: 203
Location: charlotte,NC, USA

Thanks for your replies...

My problem got solved. Here is the summary of my learning:

Consider the scenario where the source (SAP) defaults to UTF-8 (CCSID 1208) and the broker QM (AIX) uses ISO-8859-1 (CCSID 819).

If SAP sends a message to the broker without explicitly specifying a ccsid, then the QM labels it 819. So the message is actually 1208 but is labelled as 819, and it fails in the broker because special characters have different representations in the two encodings.

If SAP explicitly says that the message is 1208, then the broker QM will use the 1208 encoding and process the message.

If SAP explicitly says that the message is 1208, then you can also specify conversion on the MQInput node to convert it to 819; the broker QM will then use the 819 encoding and process the message.

If SAP explicitly says that the message is 819, i.e. it converts the encoding on its side, then the broker QM will use the 819 (default) encoding and process the message.

Another option for this conversion is to enable conversion on the channel between the two applications.

Conversion will only happen if the characters have a representation in the 819 code page. Otherwise..???

I am not sure how we can handle the Euro symbol and Chinese characters with the QM ccsid set to 819, as they have no representation in ISO-8859-1.

smdavies99

Posted: Mon Aug 03, 2009 1:18 pm Post subject:

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

This has been said before, but it is worth repeating:

The QM CCSID has no bearing on the CCSID of the messages originating from it UNLESS the message is sent using the QM defaults.
You can send a message using any valid CCSID if you configure the MQPUT options correctly.
Onto your other problem:
If you are transmitting Chinese characters then AFAIK, CCSID 819 won't cut it, for some obvious reasons.
ISO 8859-1 is Western European Latin-1. Now how do you expect that to cater for Chinese characters?
You can find a complete map of ISO-8859-1 on Wikipedia. It also discusses the problems encountered when adding support for the Euro character.

http://en.wikipedia.org/wiki/ISO/IEC_8859-1#ISO-8859-1
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995

Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.

rekarm01

Posted: Tue Aug 04, 2009 4:04 am Post subject: Re: ISO-8859-1 va UTF-8 encoding

Grand Master

Joined: 25 Jun 2008
Posts: 1415

er_pankajgupta84 wrote:

If SAP sends a message to the broker without explicitly specifying a ccsid, then the QM labels it 819. So the message is actually 1208 but is labelled as 819, and it fails in the broker because special characters have different representations in the two encodings.

Correct. The sender is responsible for setting the MQ message header ccsid to match the message encoding. If the default (QMgr ccsid) does not match the message encoding, then the sender must explicitly specify the correct ccsid.

er_pankajgupta84 wrote:

If SAP explicitly says that the message is 1208, then you can also specify conversion on the MQInput node to convert it to 819; the broker QM will then use the 819 encoding and process the message.

Why convert on MQInput? Or, for that matter, why convert at all? The broker can process either ISO-8859-1 or UTF-8 XML messages as-is.

er_pankajgupta84 wrote:

Another option for this conversion is to enable conversion on the channel between the two applications.

Don't enable channel conversion.

er_pankajgupta84 wrote:

I am not sure how we can handle the Euro symbol and Chinese characters with the QM ccsid set to 819, as they have no representation in ISO-8859-1.

Chinese?

Chinese CAFÉ? Neither the SAP nor the broker should be trying to fit Chinese characters into an ISO 8859-1 character set. That's one more reason not to convert UTF-8 to ISO-8859-1. It might be possible to represent such characters using XML numeric character references, but even if it is, it's probably not a good idea.
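A quick way to see which characters fit into ISO-8859-1, and what the numeric character reference workaround looks like, is the following Python sketch. The helper name fits_latin1 is made up for the example; the "xmlcharrefreplace" error handler is part of Python's standard codec machinery and is shown here only to illustrate the idea, not as a recommendation.

```python
def fits_latin1(s: str) -> bool:
    """Return True if every character of s can be encoded as ISO-8859-1."""
    try:
        s.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


print(fits_latin1("\u00c9"))        # E with acute accent
print(fits_latin1("\u20ac"))        # Euro sign
print(fits_latin1("\u4e2d\u6587"))  # two Chinese characters

# Characters that do not fit are replaced by XML numeric character references.
ncr = "\u20ac 5 \u4e2d".encode("iso-8859-1", errors="xmlcharrefreplace")
print(ncr)
```

The references only make sense to a receiver that parses the payload as XML or HTML; anything else sees literal ampersands and digits, which is one reason to keep the message in UTF-8 instead.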

Stop worrying about the QMgr ccsid; worry about the message header ccsids instead. The only thing the QMgr ccsid needs to do is to support conversion of the MQMD header itself, from qmgr to qmgr. There are additional constraints regarding single-byte characters, imposed upon QMgr ccsids, that aren't relevant for encoding message data.

rekarm01

Posted: Tue Aug 04, 2009 4:11 am Post subject: Re: ISO-8859-1 va UTF-8 encoding

Grand Master

Joined: 25 Jun 2008
Posts: 1415

smdavies99 wrote:

I have a set of printouts of a large variety of character sets which I use for this purpose. Sadly, the site I obtained them from 10+ years ago no longer exists. I originally used them for working out the mapping for Kazak characters in Russian & Greek sets.

IBM has a fairly comprehensive web site, for the character sets that its products support.

smdavies99

Posted: Tue Aug 04, 2009 7:06 am Post subject: Re: ISO-8859-1 va UTF-8 encoding

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

rekarm01 wrote:

smdavies99 wrote:

IBM has a fairly comprehensive web site, for the character sets that its products support.

Thanks for the link. It will be very useful.

EnthusiasticSatya

Posted: Tue Dec 06, 2011 5:35 am Post subject:

Apprentice

Joined: 10 Aug 2011
Posts: 26

Hi All,

We have a similar problem: I am receiving data from SAP and then sending it to an MQOutput node. Before propagating it to the queue, I change the CCSID to 1208 (as SAP is sending Cyrillic characters).

The problem is that when the data goes out to the MDB (message-driven bean) via the MQOutput node, the Cyrillic characters (like УЛ.КУЙБЫШЕ) are inserted as spaces.

I am using the code below to set the CCSID:
SET OutputRoot.Properties.CodedCharSetId = 1208;

Please suggest.

kimbert

Posted: Tue Dec 06, 2011 2:03 pm Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Quote:

The problem is that when the data goes out to the MDB (message-driven bean) via the MQOutput node, the Cyrillic characters (like УЛ.КУЙБЫШЕ) are inserted as spaces.

Are you sure? Have you actually looked at the output queue to check the literal bytes? It sounds much more likely that the MDB is reading using some other character encoding, or maybe you are viewing the data using an editor that inserts spaces for unrecognised characters.

EnthusiasticSatya

Posted: Wed Jan 11, 2012 5:06 am Post subject:

Apprentice

Joined: 10 Aug 2011
Posts: 26

Here are three scenarios:

1) When we manually push the data via RFHUtil with the code page set to 1208, I can see the data inserted correctly in the DB.

I can see the special characters in PuTTY (the same characters that were present in the input file, i.e. Ã».).

2) When I pushed the input file with the Cyrillic characters (Ã».) without manually setting the code page to 1208 in RFHUtil, the data was inserted in the DB as â.

3) When the data is pushed directly from SAP, the field containing the special character is inserted as a dot (.).

smdavies99

Posted: Wed Jan 11, 2012 6:01 am Post subject:

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

EnthusiasticSatya wrote:

3) When the data is pushed directly from SAP, the field containing the special character is inserted as a dot (.).

How do you know it is a '.'?
The ONLY real way to judge is to look at the actual HEX value of the data.
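As a small illustration of why the hex matters, here is a Python sketch with two hypothetical helpers, hex_view and printable_view. The second one mimics the many viewers that show any non-printable byte as a dot, which makes real Cyrillic data indistinguishable from a genuine '.' (X'2e') on screen. The sample bytes are assumed for the example and are not taken from the poster's queue.

```python
def hex_view(data: bytes) -> str:
    """Space-separated hex for each byte."""
    return " ".join(f"{b:02x}" for b in data)


def printable_view(data: bytes) -> str:
    """What a typical dump tool shows: printable ASCII as-is, everything else as '.'."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data)


# Two Cyrillic letters followed by a real full stop, encoded as UTF-8.
sample = "\u0423\u041b.".encode("utf-8")

print(printable_view(sample))  # looks like nothing but dots
print(hex_view(sample))        # the hex shows which dot is real
```

Only the last byte is an actual full stop; the others are two-byte UTF-8 sequences that the viewer could not display.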
