MQSeries.net :: View topic - code page conversion issue

code page conversion issue

touchofcrypticthunder

Posted: Wed Nov 18, 2009 11:49 pm Post subject: code page conversion issue

Apprentice

Joined: 08 Jul 2009
Posts: 30

I am working on WMB 6.1.0.5 on the Windows platform. I am facing a problem with converting messages between different code pages.

Sample flow is as below. It is a synchronous request reply between broker and webservice which is handled in single flow:

Source ----- (MQ) ----> Broker ---- (HTTP) ---> webservice (Request)
Source <----(MQ) ----- Broker <--- (HTTP) ---- webservice (Reply)

Source Application is sending xml request as 1200 code page. The xml declaration of the input message is <?xml version="1.0" encoding="utf 16"?>. Webservice expects the soap message in 1208 codepage (utf8).

The input message from Source application contains special characters (ex. accented chars - Szolgáltató). When I convert the incoming code page to 1208(utf8) soap message, webservice complains that xml structure is invalid. Then I captured the soap message in a queue to check how these characters look. The chars become garbled as SzolgÃ¡ltatÃ³. The conversion is done in MQInput node by enabling convert option.

If I retain the input code page 1200, then the soap message constructed will look like "ÿþ<.s.o.a.p.:.E.n.v.e.l.o.p.e. .x.m.l.n.s.:.s.o.a.p.=.".h.t.t.p.:././.s.c.h.e.m.a.s...x.m.l.s.o.a.p...o.r.g./.s.o.a.p./.e.n.v.e.l.o.p.e./."." where an extra character has been added after every character. Moreover the first 2 chars "ÿþ" appear in the captured soap message, which makes the message not viewable in the message browser (RFHUTIL).

Here are my questions:
1. What exactly is the difference between 1200 and 1208?
- To my understanding I think the difference lies in 1200 being double byte characters and 1208 being single byte. Is there any difference in hex values of the characters in these code pages?

2. How should I configure my flow to retain the special characters when it is being sent to webservice?

touchofcrypticthunder

Posted: Thu Nov 19, 2009 2:28 am Post subject:

Apprentice

Joined: 08 Jul 2009
Posts: 30

In continuation of the above post, I would like to add one more observation about the behaviour of the MRM and XML related domains.

When I don't do the conversion to 1208, i.e. if I retain the input code page 1200, then the captured soap message in the queue contains the extra characters ÿþ only in the XML domains (XMLNS and XMLNSC); they are not present when the MRM domain is used.

rekarm01

Posted: Fri Nov 20, 2009 12:52 am Post subject: Re: code page conversion issue

Grand Master

Joined: 25 Jun 2008
Posts: 1415

touchofcrypticthunder wrote:

If I retain the input code page 1200, then the soap message constructed will look like "ÿþ<.s.o.a.p.:. ...

The ÿþ is a byte-order mark (X'FFFE'). In this case, it indicates that the following message uses a UTF-16 little-endian encoding scheme.
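A minimal Python sketch of how a receiver might recognise the byte-order mark described above. The function name `sniff_utf16_bom` and the sample payload are made up for illustration; they are not part of the broker or of any MQ tool.

```python
import codecs


def sniff_utf16_bom(data: bytes) -> str:
    """Return a Python codec name based on a leading UTF-16 byte-order mark."""
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be"
    return "unknown"


# Hypothetical payload: BOM followed by little-endian UTF-16 text.
payload = codecs.BOM_UTF16_LE + "<soap:Envelope>".encode("utf-16-le")

codec = sniff_utf16_bom(payload)
text = payload[2:].decode(codec)  # skip the 2-byte BOM before decoding
print(codec, text)

# A viewer that decodes bytes as ISO 8859-1 renders the BOM as two characters.
bom_as_latin1 = payload[:2].decode("latin-1")
print(bom_as_latin1)
```

The last line is why the mark shows up as "ÿþ" in a viewer that treats every byte as one ISO 8859-1 character.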

touchofcrypticthunder wrote:

Source Application is sending xml request as 1200 code page.

No, it isn't. The source application is sending xml request as ccsid=1204:

ccsid=1200: UTF-16 big-endian with no byte-order mark
ccsid=1202: UTF-16 little-endian with no byte-order mark
ccsid=1204: UTF-16 with endianness determined by byte-order mark
ccsid=1208: UTF-8 (with no byte-order mark)

If the input message headers are wrong, it can be difficult to parse or convert the message properly.
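To make the list above concrete, this Python sketch prints the bytes that each encoding scheme produces for a single accented character. Python codec names stand in for the ccsids here, which is an assumption of the sketch rather than something the broker does. It also shows that UTF-8 is not a single-byte encoding: á needs two bytes.

```python
import codecs

sample = "\u00e1"  # the character á

encodings = {
    "UTF-16 big-endian, no BOM": sample.encode("utf-16-be"),
    "UTF-16 little-endian, no BOM": sample.encode("utf-16-le"),
    # BOM added by hand so the result does not depend on the machine's byte order.
    "UTF-16 little-endian with BOM": codecs.BOM_UTF16_LE + sample.encode("utf-16-le"),
    "UTF-8, no BOM": sample.encode("utf-8"),
}

for label, raw in encodings.items():
    hex_bytes = " ".join(f"{b:02X}" for b in raw)
    print(f"{label:32} {hex_bytes}")
```

The character is the same in every row; only the byte sequence differs.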

touchofcrypticthunder wrote:

The chars become garbled as SzolgÃ¡ltatÃ³. The conversion is done in MQInput node by enabling convert option.

It's also possible that the display tool is using the wrong character set to display the message, so, it can look garbled, even if it isn't.

touchofcrypticthunder

Posted: Wed Nov 25, 2009 4:52 am Post subject:

Apprentice

Joined: 08 Jul 2009
Posts: 30

Let us now consider converting input message which is in 1200 code page to 1208(utf8) code page retaining the special characters.

When I checked the 1208 character set, it seems to be one of the most widely used code pages. It covers a wide range of characters, including the ones I mentioned that are getting garbled.

But as I explained earlier, some of the characters are not getting converted properly to 1208.

Is there a way to do this conversion to 1208 without making the characters appear bad?

fjb_saper

Posted: Wed Nov 25, 2009 3:19 pm Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20767
Location: LI,NY

In the broker, before the webservice request node (http) set the ccsid:

Code:

SET OutputRoot.Properties.CodedCharSetId = 1208; -- UTF-8 from memory

_________________
MQ & Broker admin

rekarm01

Posted: Mon Nov 30, 2009 11:11 am Post subject: Re: code page conversion issue

Grand Master

Joined: 25 Jun 2008
Posts: 1415

touchofcrypticthunder wrote:

Let us now consider converting input message which is in 1200 code page to 1208 (utf8) code page retaining the special characters.

The problem was that the input message was not "in 1200 code page". The sender was lying about that. Is that resolved yet?

touchofcrypticthunder wrote:

Is there a way to do this conversion to 1208 without making the characters appear bad?

First, fix the input message; then the conversion should be trivial.

Second, configure the display app to display bytes as UTF-8. If that's not possible (such as with RFHutil), configure the display app to display both character and hex, and learn to read hex. If that's not possible, pick another display app.
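If no suitable viewer is at hand, a few lines of Python can produce a combined hex and character view of a saved message. This is only a sketch: `hex_dump` is a hypothetical helper, and the sample bytes are built in place rather than read from a real queue.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex values and printable ASCII, one row per `width` bytes."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Only printable ASCII is shown as text; every other byte becomes '.'.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:04X}  {hex_part:<{width * 3}} {text_part}")
    return "\n".join(lines)


# Sample message encoded as UTF-8 for illustration.
message = "<Test>P\u00e9nz\u00fcgyi</Test>".encode("utf-8")
print(hex_dump(message))
```

In the hex column, é appears as the two bytes C3 A9 and ü as C3 BC, which is what correct UTF-8 looks like regardless of how a character viewer renders them.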

touchofcrypticthunder

Posted: Wed Dec 02, 2009 12:54 am Post subject:

Apprentice

Joined: 08 Jul 2009
Posts: 30

I have captured the input message sent by source application. I can see the code page being set to 1200 and the characters are shown properly for example <Test>Pénzügyi Szolgáltató Zrt.</Test>.

After converting this to 1208, the characters are being changed to <Test>PÃ©nzÃ¼gyi SzolgÃ¡ltatÃ³ Zrt.</Test>

The same display application is being used to browse the input and output message.

kimbert

Posted: Wed Dec 02, 2009 5:06 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Quote:

After converting this to 1208, the characters are being changed to <Test>PÃ©nzÃ¼gyi SzolgÃ¡ltatÃ³ Zrt.</Test>

The characters are not being changed. Your viewer does not understand how to display UTF-8 characters.

touchofcrypticthunder

Posted: Wed Dec 02, 2009 9:33 pm Post subject:

Apprentice

Joined: 08 Jul 2009
Posts: 30

I am using the same display application(RFHUTIL) to view the input and output message. So why this difference in the hex values of the characters?

fjb_saper

Posted: Wed Dec 02, 2009 9:37 pm Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20767
Location: LI,NY

touchofcrypticthunder wrote:

I am using the same display application(RFHUTIL) to view the input and output message. So why this difference in the hex values of the characters?

Because RFHUtil does not understand UTF-8 if the platform it is running on has not been set to UTF-8.

You can always download the message to a file and open it in your browser.
The xml should display just fine if you set the browser's encoding to UTF-8

I would expect it to run on a 437 or 850 platform...

_________________
MQ & Broker admin

kimbert

Posted: Thu Dec 03, 2009 4:08 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Quote:

So why this difference in the hex values of the characters?

Did you really mean to ask that? I think ( and I hope ) you meant to ask
'So why this difference in the *displayed* values of the characters?'.

rekarm01

Posted: Sun Dec 06, 2009 2:27 am Post subject: Re: code page conversion issue

Grand Master

Joined: 25 Jun 2008
Posts: 1415

A ccsid describes a mapping between characters and bytes. The encoding application chooses a ccsid to map characters to bytes; the decoding application needs to use the same ccsid to map bytes to characters, in order to reconstruct the original string.

touchofcrypticthunder wrote:

I have captured the input message sent by source application.

How have you captured the input message?

touchofcrypticthunder wrote:

I can see the code page being set to 1200

Does that mean that MQMD.CodedCharSetId=1200, or that the source application encoded the message data that way? The correct answer is both.

touchofcrypticthunder wrote:

and the characters are shown properly for example <Test>Pénzügyi Szolgáltató Zrt.</Test>.

That can be misleading. Many display applications, for example, don't display null characters. If the source application encoded the characters using ccsid=1200 (UTF-16BE), but the display application incorrectly decodes the bytes using ccsid=819 (ISO 8859-1), the resulting string would contain erroneous null characters, but it would still appear to be correct.
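The effect can be reproduced in a short Python sketch. It assumes a viewer that decodes one character per byte and silently drops nulls; the `replace` call only simulates that behaviour and is not how any particular tool is implemented.

```python
original = "Szolg\u00e1ltat\u00f3"  # Szolgáltató

raw = original.encode("utf-16-be")  # two bytes per character here
wrong = raw.decode("latin-1")       # wrong decoder: one character per byte
shown = wrong.replace("\x00", "")   # simulate a viewer that hides null characters

print(len(original), len(wrong))
print(shown == original)
```

The decoded string is twice as long as the original, yet with the nulls hidden it is indistinguishable from it. This only holds for characters up to U+00FF; anything beyond that range would not survive the wrong decoder.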

touchofcrypticthunder wrote:

After converting this to 1208, the characters are being changed to <Test>PÃ©nzÃ¼gyi SzolgÃ¡ltatÃ³ Zrt.</Test>

If the converting application encoded the characters using ccsid=1208 (UTF-8), but the display application incorrectly decodes the bytes using ccsid=819 (ISO 8859-1), then that's what the resulting string would look like. It illustrates why the encoding and decoding applications need to use the same ccsid.
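A small Python sketch of that mismatch, using the `utf-8` and `latin-1` codecs as stand-ins for ccsid 1208 and 819 (an assumption of the sketch). It also shows that the bytes themselves are intact, because reversing the wrong decoding recovers the original text.

```python
original = "Szolg\u00e1ltat\u00f3"  # Szolgáltató

utf8_bytes = original.encode("utf-8")   # what the converting application writes
garbled = utf8_bytes.decode("latin-1")  # what a mismatched viewer displays
print(garbled)

# Undo the wrong decoding step to show that no data was lost.
recovered = garbled.encode("latin-1").decode("utf-8")
print(recovered == original)
```

Each accented character becomes two displayed characters because its UTF-8 form is two bytes long and the viewer shows one character per byte.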

touchofcrypticthunder wrote:

The same display application is being used to browse the input and output message.

... and how was the display application configured to decode the bytes in each message?

touchofcrypticthunder wrote:

So why this difference in the hex values of the characters?

Different ccsids map characters differently. The whole point of converting bytes from one ccsid to another is to change the hex values of the characters.
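As a rough illustration, conversion can be sketched in Python as decoding with the source mapping and encoding with the target one. `transcode` is a hypothetical helper, and the codec names are stand-ins for ccsids; this is not the broker's conversion routine.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes with the source codec, then re-encode them with the target codec."""
    return data.decode(source).encode(target)


utf16_bytes = "Zrt. \u00f3".encode("utf-16-be")
utf8_bytes = transcode(utf16_bytes, "utf-16-be", "utf-8")

print(utf16_bytes.hex(), utf8_bytes.hex())
print(utf16_bytes.decode("utf-16-be") == utf8_bytes.decode("utf-8"))
```

The hex values differ after conversion while the decoded characters are identical, provided each side is decoded with the codec that matches its bytes.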
