MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » code page conversion issue

code page conversion issue
touchofcrypticthunder
Posted: Wed Nov 18, 2009 11:49 pm    Post subject: code page conversion issue

Apprentice

Joined: 08 Jul 2009
Posts: 30

I am working with WMB 6.1.0.5 on a Windows platform and have a problem converting messages between different code pages.

The sample flow is shown below. It is a synchronous request/reply exchange between the broker and a web service, handled in a single flow:

Source ----- (MQ) ----> Broker ---- (HTTP) ---> webservice (Request)
Source <----(MQ) ----- Broker <--- (HTTP) ---- webservice (Reply)

The source application sends the XML request in code page 1200. The XML declaration of the input message is <?xml version="1.0" encoding="utf 16"?>. The web service expects the SOAP message in code page 1208 (UTF-8).

The input message from the source application contains special characters (for example, the accented characters in Szolgáltató). When I convert the incoming message to a 1208 (UTF-8) SOAP message, the web service complains that the XML structure is invalid. I then captured the SOAP message in a queue to see what these characters look like. The characters are garbled, as in Szolgáltató. The conversion is done in the MQInput node by enabling the Convert option.

If I retain the input code page 1200, the constructed SOAP message looks like "ÿþ<.s.o.a.p.:.E.n.v.e.l.o.p.e. .x.m.l.n.s.:.s.o.a.p.=.".h.t.t.p.:././.s.c.h.e.m.a.s...x.m.l.s.o.a.p...o.r.g./.s.o.a.p./.e.n.v.e.l.o.p.e./."." with an extra character added after every character. Moreover, the first two characters "ÿþ" appear in the captured SOAP message, which makes the message unviewable in the message browser (RFHUTIL).

Here are my questions:
1. What exactly is the difference between 1200 and 1208?
- My understanding is that 1200 uses double-byte characters and 1208 uses single-byte characters. Is there any difference in the hex values of the characters in these code pages?

2. How should I configure my flow to retain the special characters when the message is sent to the web service?

touchofcrypticthunder
Posted: Thu Nov 19, 2009 2:28 am

Apprentice

Joined: 08 Jul 2009
Posts: 30

Following up on my post above, I would like to add one more observation about the behaviour of the MRM and XML domains.

When I don't do the conversion to 1208, i.e. if I retain the input code page 1200, the captured SOAP message in the queue contains the extra characters ÿþ only in the XML domains (XMLNS and XMLNSC); they are not present when the MRM domain is used.

rekarm01
Posted: Fri Nov 20, 2009 12:52 am    Post subject: Re: code page conversion issue

Grand Master

Joined: 25 Jun 2008
Posts: 1415

touchofcrypticthunder wrote:
If I retain the input code page 1200, the constructed SOAP message looks like "ÿþ<.s.o.a.p.:. ...

The ÿþ is a byte-order mark (X'FFFE'). In this case, it indicates that the following message uses a UTF-16 little-endian encoding scheme.
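
To make the byte-order mark concrete, here is a minimal Python sketch. Python is used purely for illustration, and `sniff_utf16_bom` is a hypothetical helper name, not part of any MQ or broker API. It encodes a short string as UTF-16 little-endian with a BOM and shows why a single-byte viewer displays ÿþ followed by an extra character after every letter.

```python
import codecs

def sniff_utf16_bom(data: bytes) -> str:
    """Return a Python codec name based on a leading UTF-16 byte-order mark."""
    if data.startswith(codecs.BOM_UTF16_LE):   # b'\xff\xfe'
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):   # b'\xfe\xff'
        return "utf-16-be"
    return "unknown"  # no BOM: the endianness must come from the message header

# "<soap" encoded as UTF-16LE, preceded by the little-endian BOM.
payload = codecs.BOM_UTF16_LE + "<soap".encode("utf-16-le")

print(payload.hex(" "))            # ff fe 3c 00 73 00 6f 00 61 00 70 00
print(sniff_utf16_bom(payload))    # utf-16-le

# A viewer that assumes one byte per character (here ISO 8859-1) shows the BOM
# as "ÿþ" and every high-order zero byte as a separate NUL character.
as_single_byte = payload.decode("latin-1")
print(as_single_byte.replace("\x00", "."))   # ÿþ<.s.o.a.p.
```

The NUL bytes are replaced with dots here only to make them visible, which mirrors how the captured message was displayed.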

touchofcrypticthunder wrote:
The source application sends the XML request in code page 1200.

No, it isn't. The source application is sending the XML request as ccsid=1204:
  • ccsid=1200: UTF-16 big-endian with no byte-order mark
  • ccsid=1202: UTF-16 little-endian with no byte-order mark
  • ccsid=1204: UTF-16 with endianness determined by byte-order mark
  • ccsid=1208: UTF-8 (with no byte-order mark)
If the input message headers are wrong, it can be difficult to parse or convert the message properly.
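
The list above can be sketched as a lookup table. The Python below is illustrative only: the CCSID meanings are taken from the list in this post, the pairing with Python codec names is an assumption, and `CCSID_TO_CODEC` and `decode_by_ccsid` are hypothetical names. It shows how a payload that starts with a little-endian BOM decodes incorrectly when the header claims ccsid=1200.

```python
# CCSID meanings as given in the list above; the Python codec names are an
# assumed equivalence for illustration, not an IBM-defined mapping.
CCSID_TO_CODEC = {
    1200: "utf-16-be",  # UTF-16 big-endian, no byte-order mark
    1202: "utf-16-le",  # UTF-16 little-endian, no byte-order mark
    1204: "utf-16",     # UTF-16, endianness taken from the byte-order mark
    1208: "utf-8",      # UTF-8
}

def decode_by_ccsid(data: bytes, ccsid: int) -> str:
    """Decode message bytes using the codec that the header's CCSID implies."""
    return data.decode(CCSID_TO_CODEC[ccsid])

text = "<Test>Szolgáltató</Test>"
# What the sender actually put on the wire: BOM + little-endian code units.
wire = b"\xff\xfe" + text.encode("utf-16-le")

right = decode_by_ccsid(wire, 1204)   # BOM is honoured and consumed
wrong = decode_by_ccsid(wire, 1200)   # every code unit is byte-swapped

print(right == text)   # True
print(wrong == text)   # False
```

With the wrong header, the decode does not fail; it silently produces different characters, which is why the problem can be hard to spot downstream.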

touchofcrypticthunder wrote:
The characters are garbled, as in Szolgáltató. The conversion is done in the MQInput node by enabling the Convert option.

It's also possible that the display tool is using the wrong character set to display the message, so it can look garbled even if it isn't.

touchofcrypticthunder
Posted: Wed Nov 25, 2009 4:52 am

Apprentice

Joined: 08 Jul 2009
Posts: 30

Let us now consider converting the input message, which is in code page 1200, to code page 1208 (UTF-8) while retaining the special characters.

When I checked the 1208 character set, it appears to be one of the most widely used code pages; it covers a wide range of characters, including the ones I mentioned that are getting garbled.

But as I explained earlier, some of the characters are not converted properly to 1208.

Is there a way to do this conversion to 1208 without making the characters appear bad?

fjb_saper
Posted: Wed Nov 25, 2009 3:19 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

In the broker, before the web service request node (HTTP), set the CCSID:
Code:
SET OutputRoot.Properties.CodedCharSetId = 1208; -- UTF-8 (written from memory)
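
For readers who want to see what such a conversion amounts to at the byte level, the following Python sketch transcodes a payload from UTF-16 to UTF-8. It is not broker code and does not model how the broker serialises the message; `transcode` is a hypothetical helper, and the choice of big-endian input follows the meaning of ccsid=1200 given earlier in this thread.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes from the source encoding, then re-encode in the target."""
    return data.decode(source).encode(target)

# Sample input: UTF-16 big-endian without a BOM (what ccsid=1200 means
# according to the list earlier in this thread).
utf16_in = "<Test>Pénzügyi Szolgáltató Zrt.</Test>".encode("utf-16-be")

utf8_out = transcode(utf16_in, "utf-16-be", "utf-8")

# The characters survive; only the bytes that represent them change.
print(utf8_out.decode("utf-8"))
print(len(utf16_in), len(utf8_out))
```

The conversion only works if the source encoding passed in matches how the bytes were really produced, which is the same requirement the broker has for the input header.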

_________________
MQ & Broker admin

rekarm01
Posted: Mon Nov 30, 2009 11:11 am    Post subject: Re: code page conversion issue

Grand Master

Joined: 25 Jun 2008
Posts: 1415

touchofcrypticthunder wrote:
Let us now consider converting the input message, which is in code page 1200, to code page 1208 (UTF-8) while retaining the special characters.

The problem was that the input message was not "in 1200 code page". The sender was lying about that. Is that resolved yet?

touchofcrypticthunder wrote:
Is there a way to do this conversion to 1208 without making the characters appear bad?

First, fix the input message; then the conversion should be trivial.

Second, configure the display app to display bytes as UTF-8. If that's not possible (such as with RFHUtil), configure the display app to display both character and hex, and learn to read hex. If that's not possible, pick another display app.
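
As a minimal illustration of reading hex, the Python sketch below prints each character next to the UTF-8 bytes that encode it. `dump_utf8` is a hypothetical helper, not part of RFHUtil or any MQ tool.

```python
def dump_utf8(text: str) -> list:
    """Pair every character with the hex of its UTF-8 encoding."""
    return [(ch, ch.encode("utf-8").hex(" ")) for ch in text]

for ch, hex_bytes in dump_utf8("Szolgáltató"):
    print(f"{ch}  {hex_bytes}")

# ASCII letters occupy one byte each (for example "S" is 53), while
# "á" is the two-byte sequence c3 a1 and "ó" is c3 b3.
```

Seeing c3 a1 in a hex view where an á is expected is a sign that the data is valid UTF-8, whatever the character pane shows.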

touchofcrypticthunder
Posted: Wed Dec 02, 2009 12:54 am

Apprentice

Joined: 08 Jul 2009
Posts: 30

I have captured the input message sent by the source application. I can see that the code page is set to 1200 and the characters are shown properly, for example <Test>Pénzügyi Szolgáltató Zrt.</Test>.

After converting this to 1208, the characters are being changed to <Test>Pénzügyi Szolgáltató Zrt.</Test>

The same display application is being used to browse the input and output messages.

kimbert
Posted: Wed Dec 02, 2009 5:06 am

Jedi Council

Joined: 29 Jul 2003
Posts: 5542
Location: Southampton

Quote:
After converting this to 1208, the characters are being changed to <Test>Pénzügyi Szolgáltató Zrt.</Test>
The characters are not being changed. Your viewer does not understand how to display UTF-8 characters.

touchofcrypticthunder
Posted: Wed Dec 02, 2009 9:33 pm

Apprentice

Joined: 08 Jul 2009
Posts: 30

I am using the same display application (RFHUTIL) to view the input and output messages. So why this difference in the hex values of the characters?

fjb_saper
Posted: Wed Dec 02, 2009 9:37 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

touchofcrypticthunder wrote:
I am using the same display application (RFHUTIL) to view the input and output messages. So why this difference in the hex values of the characters?

Because RFHUtil does not understand UTF-8 if the platform it is running on has not been set to UTF-8.

You can always download the message to a file and open it in your browser.
The XML should display just fine if you set the browser's encoding to UTF-8.

I would expect it to run on a 437 or 850 platform...
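
A quick way to see this effect is to decode the same UTF-8 bytes with different single-byte code pages. The Python sketch below is illustrative only; it uses Python's built-in cp437, cp850 and latin-1 codecs as stand-ins for whatever code page the viewer's platform actually uses, which is an assumption.

```python
original = "Szolgáltató"
utf8_bytes = original.encode("utf-8")   # the bytes themselves are correct

# Decode the same bytes the way different viewers might interpret them.
for codepage in ("utf-8", "cp437", "cp850", "latin-1"):
    shown = utf8_bytes.decode(codepage)
    status = "ok" if shown == original else "garbled"
    print(f"{codepage:8} {shown!r:24} {status}")
```

Only the utf-8 row reproduces the original text; the other rows show two unrelated characters wherever the text has an accented letter.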
_________________
MQ & Broker admin

kimbert
Posted: Thu Dec 03, 2009 4:08 am

Jedi Council

Joined: 29 Jul 2003
Posts: 5542
Location: Southampton

Quote:
So why this difference in the hex values of the characters?
Did you really mean to ask that? I think (and I hope) you meant to ask
'So why this difference in the *displayed* values of the characters?'.

rekarm01
Posted: Sun Dec 06, 2009 2:27 am    Post subject: Re: code page conversion issue

Grand Master

Joined: 25 Jun 2008
Posts: 1415

A ccsid describes a mapping between characters and bytes. The encoding application chooses a ccsid to map characters to bytes; the decoding application needs to use the same ccsid to map bytes to characters, in order to reconstruct the original string.

touchofcrypticthunder wrote:
I have captured the input message sent by source application.

How have you captured the input message?

touchofcrypticthunder wrote:
I can see the code page being set to 1200

Does that mean that MQMD.CodedCharSetId=1200, or that the source application encoded the message data that way? The correct answer is both.

touchofcrypticthunder wrote:
and the characters are shown properly for example <Test>Pénzügyi Szolgáltató Zrt.</Test>.

That can be misleading. Many display applications, for example, don't display null characters. If the source application encoded the characters using ccsid=1200 (UTF-16BE), but the display application incorrectly decodes the bytes using ccsid=819 (ISO 8859-1), the resulting string would contain erroneous null characters, but it would still appear to be correct.
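
The point about hidden null characters can be reproduced with a short Python sketch, given here for illustration only. It relies on the fact that every character in the sample string is below U+0100, so the high-order byte of each UTF-16 code unit is zero.

```python
text = "Pénzügyi Szolgáltató Zrt."
utf16be = text.encode("utf-16-be")      # two bytes per character here

# Wrong decoder: one byte per character (ISO 8859-1).
misread = utf16be.decode("latin-1")

print(len(text), len(misread))          # 25 50
print(misread.count("\x00"))            # 25

# A viewer that silently drops NUL characters shows what looks like the
# original text, hiding the fact that the decoding was wrong.
print(misread.replace("\x00", "") == text)   # True
```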

touchofcrypticthunder wrote:
After converting this to 1208, the characters are being changed to <Test>Pénzügyi Szolgáltató Zrt.</Test>

If the converting application encoded the characters using ccsid=1208 (UTF-8), but the display application incorrectly decodes the bytes using ccsid=819 (ISO 8859-1), then that's what the resulting string would look like. It illustrates why the encoding and decoding applications need to use the same ccsid.
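
The same mismatch can be demonstrated in a few lines of Python, shown here as an illustrative sketch rather than a description of what any particular viewer does internally. It encodes the sample element as UTF-8 and then decodes the bytes as ISO 8859-1.

```python
text = "<Test>Pénzügyi Szolgáltató Zrt.</Test>"
utf8 = text.encode("utf-8")

# Wrong decoder: each byte of a two-byte UTF-8 sequence becomes its own
# ISO 8859-1 character, so "é" (c3 a9) turns into "Ã" followed by "©".
misread = utf8.decode("latin-1")
print(misread)   # <Test>PÃ©nzÃ¼gyi SzolgÃ¡ltatÃ³ Zrt.</Test>

# Nothing is lost: re-encoding the misread string restores the same bytes,
# and decoding them with the right codec gives the original text back.
print(misread.encode("latin-1").decode("utf-8") == text)   # True
```

The round trip at the end shows that the message bytes are intact; only the interpretation of them was wrong.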

touchofcrypticthunder wrote:
The same display application is being used to browse the input and output message.

... and how was the display application configured to decode the bytes in each message?

touchofcrypticthunder wrote:
So why this difference in the hex values of the characters?

Different ccsids map characters differently. The whole point of converting bytes from one ccsid to another is to change the hex values of the characters.
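
As a small illustration of that last point, the Python sketch below encodes a single character under several encodings and prints the resulting bytes. The codec names are Python's, used here as stand-ins for the corresponding ccsids, and `encodings` is just an illustrative variable name.

```python
char = "á"   # U+00E1

# Encode the same character under several encodings and keep the hex bytes.
encodings = {
    codec: char.encode(codec).hex(" ")
    for codec in ("latin-1", "utf-8", "utf-16-be", "utf-16-le")
}

for codec, hex_bytes in encodings.items():
    print(f"{codec:10} {hex_bytes}")

# latin-1    e1
# utf-8      c3 a1
# utf-16-be  00 e1
# utf-16-le  e1 00
```

One character, four different byte sequences: the hex values are expected to differ after a conversion, while the character they represent stays the same.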