Author |
Message
|
touchofcrypticthunder |
Posted: Wed Nov 18, 2009 11:49 pm Post subject: code page conversion issue |
|
|
Apprentice
Joined: 08 Jul 2009 Posts: 30
|
I am working with WMB 6.1.0.5 on Windows. I am facing a problem converting messages between different code pages.
The sample flow is shown below. It is a synchronous request/reply between the broker and a web service, handled in a single flow:
Source ----- (MQ) ----> Broker ---- (HTTP) ---> webservice (Request)
Source <----(MQ) ----- Broker <--- (HTTP) ---- webservice (Reply)
Source Application is sending xml request as 1200 code page. The XML declaration of the input message is <?xml version="1.0" encoding="utf 16"?>. The web service expects the SOAP message in code page 1208 (UTF-8).
The input message from the source application contains special characters (e.g. accented characters, as in the Hungarian word Szolgáltató). When I convert the incoming message to a 1208 (UTF-8) SOAP message, the web service complains that the XML structure is invalid. I then captured the SOAP message on a queue to check how these characters look. The chars become garbled as SzolgÃ¡ltatÃ³. The conversion is done in MQInput node by enabling convert option.
If I retain the input code page 1200, then the SOAP message constructed looks like "ÿþ<.s.o.a.p.:.E.n.v.e.l.o.p.e. .x.m.l.n.s.:.s.o.a.p.=.".h.t.t.p.:././.s.c.h.e.m.a.s...x.m.l.s.o.a.p...o.r.g./.s.o.a.p./.e.n.v.e.l.o.p.e./."." where an extra character has been added after every character. Moreover, the first two characters "ÿþ" appear in the captured SOAP message, which makes the message not viewable in the message browser (RFHUTIL).
Here are my questions:
1. What exactly is the difference between 1200 and 1208?
- To my understanding, the difference is that 1200 uses double-byte characters and 1208 uses single-byte characters. Is there any difference in the hex values of the characters in these code pages?
2. How should I configure my flow to retain the special characters when the message is sent to the web service? |
touchofcrypticthunder |
Posted: Thu Nov 19, 2009 2:28 am Post subject: |
|
|
Apprentice
Joined: 08 Jul 2009 Posts: 30
|
In continuation of my previous post, I would like to add one more observation about the behaviour of the MRM and XML domains.
When I don't do the conversion to 1208, i.e. if I retain the input code page 1200, the SOAP message captured on the queue contains the extra characters ÿþ only in the XML domains (XMLNS and XMLNSC); they are not present when the MRM domain is used. |
rekarm01 |
Posted: Fri Nov 20, 2009 12:52 am Post subject: Re: code page conversion issue |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
touchofcrypticthunder wrote: |
If I retain the input code page 1200, then the SOAP message constructed looks like "ÿþ<.s.o.a.p.:. ... |
The ÿþ is a byte-order mark (X'FFFE'). In this case, it indicates that the following message uses a UTF-16 little-endian encoding scheme.
touchofcrypticthunder wrote: |
Source Application is sending xml request as 1200 code page. |
No, it isn't. The source application is sending xml request as ccsid=1204:
- ccsid=1200: UTF-16 big-endian with no byte-order mark
- ccsid=1202: UTF-16 little-endian with no byte-order mark
- ccsid=1204: UTF-16 with endianness determined by byte-order mark
- ccsid=1208: UTF-8 (with no byte-order mark)
If the input message headers are wrong, it can be difficult to parse or convert the message properly.
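As a rough illustration of the byte-order mark, here is a minimal Python sketch (Python is only an assumption for demonstration; nothing in the flow uses it). The helper name `describe_utf16` and the sample string are hypothetical; the BOM constants come from the standard `codecs` module. It also shows why a viewer that treats the bytes as ISO 8859-1 displays "ÿþ<" at the start of the message.

```python
import codecs


def describe_utf16(data: bytes) -> str:
    """Guess the UTF-16 flavour of raw bytes from a leading byte-order mark."""
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "UTF-16 little-endian (BOM present)"
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "UTF-16 big-endian (BOM present)"
    return "no BOM: endianness must come from the message headers"


sample = "<soap:Envelope"  # hypothetical start of a SOAP message

# Little-endian UTF-16 with an explicit BOM, as in the captured message.
le_with_bom = codecs.BOM_UTF16_LE + sample.encode("utf-16-le")
# Big-endian UTF-16 without a BOM.
be_no_bom = sample.encode("utf-16-be")

# First six bytes in hex: the BOM, then '<' and 's' with their zero high bytes.
first_bytes = le_with_bom[:6].hex(" ")

# Decoding the first three bytes as ISO 8859-1 gives the "ÿþ<" seen in the viewer.
latin1_view = le_with_bom[:3].decode("latin-1")

print(describe_utf16(le_with_bom))
print(describe_utf16(be_no_bom))
print(first_bytes)
print(latin1_view)
```

The zero bytes that follow each ASCII character are what the viewer renders as the "extra character" after every letter.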
touchofcrypticthunder wrote: |
The chars become garbled as SzolgÃ¡ltatÃ³. The conversion is done in MQInput node by enabling convert option. |
It's also possible that the display tool is using the wrong character set to display the message, so, it can look garbled, even if it isn't. |
touchofcrypticthunder |
Posted: Wed Nov 25, 2009 4:52 am Post subject: |
|
|
Apprentice
Joined: 08 Jul 2009 Posts: 30
|
Let us now consider converting input message which is in 1200 code page to 1208(utf8) code page retaining the special characters.
When I checked the 1208 character set, it appears to be one of the most widely used code pages. It covers a wide range of characters, including the ones I mentioned that are getting garbled.
But as I explained earlier, some of the characters are not converted properly to 1208.
Is there a way to do this conversion to 1208 without making the characters appear bad? |
fjb_saper |
Posted: Wed Nov 25, 2009 3:19 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
In the broker, before the webservice request node (http) set the ccsid:
Code: |
SET OutputRoot.Properties.CodedCharSetId = 1208; -- UTF-8 from memory |
 _________________ MQ & Broker admin |
rekarm01 |
Posted: Mon Nov 30, 2009 11:11 am Post subject: Re: code page conversion issue |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
touchofcrypticthunder wrote: |
Let us now consider converting input message which is in 1200 code page to 1208 (utf8) code page retaining the special characters. |
The problem was that the input message was not "in 1200 code page". The sender was lying about that. Is that resolved yet?
touchofcrypticthunder wrote: |
Is there a way to do this conversion to 1208 without making the characters appear bad? |
First, fix the input message; then the conversion should be trivial.
Second, configure the display app to display bytes as UTF-8. If that's not possible (such as with RFHUtil), configure the display app to display both character and hex, and learn to read hex. If that's not possible, pick another display app. |
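For anyone who wants to practise reading hex outside a message viewer, the following is a small Python sketch, assuming the message bytes have already been saved somewhere. The function name `hex_dump` and the sample word are hypothetical. It prints the bytes in hex next to a plain ASCII view, which is roughly what a "both character and hex" display gives you.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Return a simple offset / hex / ASCII dump of the given bytes."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Printable ASCII is shown as-is; every other byte becomes '.'.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:04X}  {hex_part.ljust(width * 3)} {text_part}")
    return "\n".join(lines)


# The sample word from the thread, encoded as UTF-8.
utf8_bytes = "Szolgáltató".encode("utf-8")
dump = hex_dump(utf8_bytes)
print(dump)
```

In the dump, the two-byte sequences C3 A1 and C3 B3 are the UTF-8 encodings of á and ó. Seeing them in hex confirms the data is valid UTF-8 regardless of how the character column renders it.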
touchofcrypticthunder |
Posted: Wed Dec 02, 2009 12:54 am Post subject: |
|
|
Apprentice
Joined: 08 Jul 2009 Posts: 30
|
I have captured the input message sent by source application. I can see the code page being set to 1200 and the characters are shown properly for example <Test>Pénzügyi Szolgáltató Zrt.</Test>.
After converting this to 1208, the characters are being changed to <Test>PÃ©nzÃ¼gyi SzolgÃ¡ltatÃ³ Zrt.</Test>
The same display application is being used to browse the input and output message. |
kimbert |
Posted: Wed Dec 02, 2009 5:06 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
After converting this to 1208, the characters are being changed to <Test>PÃ©nzÃ¼gyi SzolgÃ¡ltatÃ³ Zrt.</Test> |
The characters are not being changed. Your viewer does not understand how to display UTF-8 characters. |
touchofcrypticthunder |
Posted: Wed Dec 02, 2009 9:33 pm Post subject: |
|
|
Apprentice
Joined: 08 Jul 2009 Posts: 30
|
I am using the same display application(RFHUTIL) to view the input and output message. So why this difference in the hex values of the characters? |
fjb_saper |
Posted: Wed Dec 02, 2009 9:37 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
touchofcrypticthunder wrote: |
I am using the same display application(RFHUTIL) to view the input and output message. So why this difference in the hex values of the characters? |
Because RFHUtil does not understand UTF-8 if the platform it is running on has not been set to UTF-8.
You can always download the message to a file and open it in your browser.
The XML should display just fine if you set the browser's encoding to UTF-8.
I would expect it to run on a 437 or 850 platform... |
kimbert |
Posted: Thu Dec 03, 2009 4:08 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
So why this difference in the hex values of the characters? |
Did you really mean to ask that? I think ( and I hope ) you meant to ask
'So why this difference in the *displayed* values of the characters?'. |
rekarm01 |
Posted: Sun Dec 06, 2009 2:27 am Post subject: Re: code page conversion issue |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
A ccsid describes a mapping between characters and bytes. The encoding application chooses a ccsid to map characters to bytes; the decoding application needs to use the same ccsid to map bytes to characters, in order to reconstruct the original string.
touchofcrypticthunder wrote: |
I have captured the input message sent by source application. |
How have you captured the input message?
touchofcrypticthunder wrote: |
I can see the code page being set to 1200 |
Does that mean that MQMD.CodedCharSetId=1200, or that the source application encoded the message data that way? The correct answer is both.
touchofcrypticthunder wrote: |
and the characters are shown properly for example <Test>Pénzügyi Szolgáltató Zrt.</Test>. |
That can be misleading. Many display applications, for example, don't display null characters. If the source application encoded the characters using ccsid=1200 (UTF-16BE), but the display application incorrectly decodes the bytes using ccsid=819 (ISO 8859-1), the resulting string would contain erroneous null characters, but it would still appear to be correct.
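A minimal Python sketch of that effect, under the assumption that Python's `utf-16-be` and `latin-1` codecs stand in for ccsid=1200 and ccsid=819 as described here (the variable names are hypothetical):

```python
text = "<Test>Pénzügyi Szolgáltató Zrt.</Test>"

# Encode as big-endian UTF-16 with no byte-order mark.
utf16be_bytes = text.encode("utf-16-be")

# Decode the same bytes with the wrong character set (ISO 8859-1).
wrong = utf16be_bytes.decode("latin-1")

# Every character in this sample is below U+0100, so each one is now
# preceded by a NUL character (U+0000).
nul_count = wrong.count("\x00")

# A viewer that silently drops NULs shows something that looks correct.
looks_right = wrong.replace("\x00", "")

print(len(text), len(wrong), nul_count)
print(looks_right == text)
```

The wrongly decoded string is twice as long as the original, yet it is indistinguishable on screen once the nulls are hidden.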
touchofcrypticthunder wrote: |
After converting this to 1208, the characters are being changed to <Test>PÃ©nzÃ¼gyi SzolgÃ¡ltatÃ³ Zrt.</Test> |
If the converting application encoded the characters using ccsid=1208 (UTF-8), but the display application incorrectly decodes the bytes using ccsid=819 (ISO 8859-1), then that's what the resulting string would look like. It illustrates why the encoding and decoding applications need to use the same ccsid.
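This can be reproduced with a short Python sketch, assuming the `utf-8` and `latin-1` codecs correspond to ccsid=1208 and ccsid=819 (the variable names are hypothetical):

```python
text = "Pénzügyi Szolgáltató Zrt."

# The converting application encodes the characters as UTF-8.
utf8_bytes = text.encode("utf-8")

# The display application decodes those bytes as ISO 8859-1 by mistake:
# each two-byte UTF-8 sequence turns into two separate characters.
garbled = utf8_bytes.decode("latin-1")

# Decoding with the same character set that was used for encoding
# reconstructs the original string.
correct = utf8_bytes.decode("utf-8")

# The garbling is only a display problem: re-encoding the garbled string
# as ISO 8859-1 gives back exactly the same UTF-8 bytes.
recovered = garbled.encode("latin-1").decode("utf-8")

print(garbled)
print(correct)
```

The bytes on the queue are identical in both cases; only the decoding step differs.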
touchofcrypticthunder wrote: |
The same display application is being used to browse the input and output message. |
... and how was the display application configured to decode the bytes in each message?
touchofcrypticthunder wrote: |
So why this difference in the hex values of the characters? |
Different ccsids map characters differently. The whole point of converting bytes from one ccsid to another is to change the hex values of the characters. |
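As a hedged sketch of what a conversion does at the byte level (Python codecs are used as stand-ins for the ccsids, and the function name `convert` is hypothetical): the bytes are decoded to characters using the source character set, then encoded again using the target one.

```python
def convert(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one character set to another."""
    return data.decode(source).encode(target)


char = "á"  # U+00E1

# Little-endian UTF-16 uses two bytes for this character: E1 00.
utf16le = char.encode("utf-16-le")

# After conversion to UTF-8 the same character is the two bytes C3 A1.
utf8 = convert(utf16le, "utf-16-le", "utf-8")

print(utf16le.hex(), utf8.hex())

# The character is unchanged as long as each side decodes with the
# character set that was used to encode.
same_character = utf16le.decode("utf-16-le") == utf8.decode("utf-8")
print(same_character)
```

The hex values differ before and after conversion, which is expected. What must stay the same is the character obtained when each byte string is decoded with its own character set.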