MQSeries.net :: View topic - Parsing XML message Having Chinese Characters in WMB

Parsing XML message Having Chinese Characters in WMB

sumanth84

Posted: Tue Jun 08, 2010 6:53 pm Post subject: Parsing XML message Having Chinese Characters in WMB

Guest

Hi,

I am trying to parse an XML message with Chinese characters. Following guidelines from the client, we have to use a framework for building all WMB-related services.

Here the input message from the MQInput node comes in as a BLOB, and I am trying to parse it with the XMLNSC parser. I have tried a few CCSIDs, such as 1200, 1386, and 1316, to parse it and get an XMLNSC tree, but without any success.

The problem is that when I parse the BLOB, there are some extra bytes before the first "<" (for example, fffe). Hence the parsing fails.

WMB 6.1.0.4

Can anyone please provide some inputs?

Thanks

rekarm01

Posted: Tue Jun 08, 2010 7:55 pm Post subject: Re: Parsing XML message Having Chinese Characters in WMB

Grand Master

Joined: 25 Jun 2008
Posts: 1415

sumanth84 wrote:

Here the input message from the MQInput node comes in as a BLOB, and I am trying to parse it with the XMLNSC parser.

What does that mean? Is the "BLOB" domain coming from the input message, or from the MQInput node? Why not parse as "XMLNSC" directly?

sumanth84 wrote:

I have tried a few CCSIDs, such as 1200, 1386, and 1316, to parse it and get an XMLNSC tree, but without any success.

sumanth84 wrote:

The problem is that when I parse the BLOB, there are some extra bytes before the first "<" (for example, fffe).

X'fffe' is probably a byte-order-mark for UTF-16; the correct input ccsid would be 1204.

If that doesn't work, then provide more details.
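
To illustrate the byte-order-mark point, here is a minimal Python sketch (not from the thread, and not ESQL) that inspects the leading bytes of a payload before any parser sees them. The `sniff_bom` helper and the sample payload are hypothetical; the encoding names are Python codec names, not CCSIDs, so mapping the result to a CCSID is left to the reader.

```python
import codecs

# Known Unicode byte-order marks. The 4-byte UTF-32 marks come first because
# the UTF-32 LE mark starts with the same two bytes as the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(payload: bytes):
    """Return (encoding_name, bom_length), or (None, 0) when no BOM is present."""
    for bom, name in BOMS:
        if payload.startswith(bom):
            return name, len(bom)
    return None, 0


# Hypothetical payload: a UTF-16 little-endian XML document preceded by X'FFFE'.
sample = codecs.BOM_UTF16_LE + "<doc>中文</doc>".encode("utf-16-le")

encoding, skip = sniff_bom(sample)
# Decode only the bytes after the BOM, using the encoding the BOM announced.
text = sample[skip:].decode(encoding)
```

A check like this only tells you what the bytes look like. It does not replace a correct CCSID in the message header.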

fjb_saper

Posted: Tue Jun 08, 2010 8:20 pm Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20772
Location: LI,NY

Also you did not specify the value of InputRoot.Properties.CodedCharSetId.

This is really the value you should use for parsing. You might have to verify with header chaining and use the value on the last header...

Now if the content does not match the description (for example, ccsid = 1208 but the content is actually in 1204 because of the byte-order mark), then you need to push back on the message producer and have them describe their message correctly.

_________________
MQ & Broker admin

kimbert

Posted: Tue Jun 08, 2010 11:56 pm Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Quote:

I have tried a few CCSIDs, such as 1200, 1386, and 1316, to parse it and get an XMLNSC tree, but without any success.

You are doing it wrong. The sender should supply the correct CCSID for the message in the MQMD header (or whatever the final header is). If the sender is not setting the CCSID in a header, then they *must* tell you which CCSID they have used, so that your message flow can parse the data correctly.
In general, it is not safe to guess the CCSID - you will usually end up choosing one which works for your test data, but fails with real production data.
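
A small Python sketch (not from the thread) of why a guessed code page can pass testing and still fail in production. Both payloads are hypothetical and are assumed to be written by the sender as UTF-8; the guesses are Python codec names chosen for illustration, not CCSIDs.

```python
# Hypothetical test and production payloads, both written by the sender as UTF-8.
test_payload = "<order><id>12345</id></order>".encode("utf-8")
prod_payload = "<order><name>中文</name></order>".encode("utf-8")

# Candidate code pages a receiver might guess (Python codec names, not CCSIDs).
GUESSES = ["utf-8", "cp437", "gbk"]


def decode_with_guesses(payload: bytes) -> dict:
    """Decode the same bytes under every guessed code page."""
    # errors="replace" keeps the comparison going even when bytes are invalid.
    return {name: payload.decode(name, errors="replace") for name in GUESSES}


test_results = decode_with_guesses(test_payload)
prod_results = decode_with_guesses(prod_payload)

# ASCII-only test data looks the same under every guess...
all_guesses_agree_on_test = len(set(test_results.values())) == 1

# ...but the Chinese text only survives under the code page the sender really used.
wrong_guesses_on_prod = [
    name for name, text in prod_results.items() if text != prod_results["utf-8"]
]
```

With ASCII-only test data every guess appears to work, so the mistake only shows up once non-ASCII production data arrives.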

sumanth84

Posted: Wed Jun 09, 2010 8:13 am Post subject: Re: Parsing XML message Having Chinese Characters in WMB

Guest

sumanth84 wrote:

Here the input message from the MQInput node comes in as a BLOB, and I am trying to parse it with the XMLNSC parser.

rekarm01 wrote:

What does that mean? Is the "BLOB" domain coming from the input message, or from the MQInput node? Why not parse as "XMLNSC" directly?

Due to framework constraints all messages are taken as BLOB from the MQInput Node. Later the message is parsed against the parser depending on the requirement.

sumanth84 wrote:

I have tried a few CCSIDs, such as 1200, 1386, and 1316, to parse it and get an XMLNSC tree, but without any success.

rekarm01 wrote:

Tried where? In an MQ header, before the MQInput node? In the MQInput node "Convert ccsid" property? In a compute node after the MQInput node?

If the input ccsid is wrong before the MQInput node, it's better to have the sender fix it, rather than try to fix it in the message flow.

I tried to parse the message in a Compute node that comes after the MQInput node.
I will ask the provider to share the CCSID and try again.

sumanth84 wrote:

The problem is that when I parse the BLOB, there are some extra bytes before the first "<" (for example, fffe).

rekarm01 wrote:

X'fffe' is probably a byte-order-mark for UTF-16; the correct input ccsid would be 1204.

If that doesn't work, then provide more details.

sumanth84

Posted: Thu Jun 10, 2010 6:45 pm Post subject: Re: Parsing XML message Having Chinese Characters in WMB

Guest

Hi,

I have tried CCSID 1204. The input message still cannot be parsed in the XMLNSC domain.

The Flow is:

MQInputNode->Compute Node ->MQOutputNode

The Input message will be taken as BLOB and parsed in Compute node using PARSE statement.

If I read the input message using RFHUtil, the CCSID is set to 437.

Please comment on which CCSID should be used here.

fjb_saper

Posted: Thu Jun 10, 2010 7:17 pm Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20772
Location: LI,NY

I doubt very much that CCSID 437 will support your Chinese characters.
You need to really force the sending application to specify the CCSID of the data the correct way. Then you must make sure that there is no translation happening on the channels.
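
As a rough illustration (Python, not from the thread), the sketch below checks whether a piece of text can be represented in a given code page at all. It assumes Python's `cp437` codec stands in for CCSID 437; `can_represent` and the sample text are hypothetical.

```python
def can_represent(text: str, codec: str) -> bool:
    """Return True when every character of text has a mapping in the given code page."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


sample = "中文"  # hypothetical sample text

# Compare a single-byte Western code page with code pages that cover Chinese.
for codec in ("cp437", "gbk", "utf-8", "utf-16"):
    print(f"{codec}: {can_represent(sample, codec)}")
```

If the header says 437 while the body contains Chinese text, the header and the data cannot both be right.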

Have fun

_________________
MQ & Broker admin

SANTYP

Posted: Thu Jun 10, 2010 10:54 pm Post subject:

Centurion

Joined: 27 Mar 2007
Posts: 142

Typically, when the sender has not specified the CCSID of what they are sending,
I think you can try with CCSID as '0' and ENCODING as '0',
which will parse the message.

Use the statement below to parse the message.

CREATE LASTCHILD OF OutputRoot DOMAIN('XMLNSC') PARSE(InputRoot.BLOB.BLOB ENCODING 0 CCSID 0);

kimbert

Posted: Fri Jun 11, 2010 1:18 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Quote:

I think you can try with CCSID as '0' and ENCODING as '0'

I'll say it again. You should not use the word 'try' when talking about selecting a CCSID. You must either know what the CCSID is, or the sending application must tell you what it is via the header.
Guessing is dangerous - see my previous post on this thread for the reason why.

paranoid221

Posted: Sun Jun 13, 2010 10:15 pm Post subject:

Centurion

Joined: 03 Apr 2006
Posts: 101
Location: USA

In such scenarios, I'd almost always use the "Convert" option checked in the advanced properties tab to let MQ do the conversion for me.
That step would take care of converting the bitstream from the source charset to the platform's native charset where your QMgr/broker is running. The XML transcoding (be it UTF-8 or UTF-16) comes into the picture only after the conversion I mentioned above is successful.
In your case, the first step itself might be failing because of the presence of a character in the source bitstream that does not have an equivalent in the target system's charset. As others who have replied so far have mentioned, a lot depends on how truthful the sending application is about the data it is putting on the wire.
You should also look at your data in hex mode to understand more about it (both before and after the convert).
You should really pay great attention to detail at every step while working with international/special characters.
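
For looking at the data in hex, a tool such as RFHUtil already does the job; the Python sketch below (not from the thread) just shows the idea for bytes you have saved to disk or captured in a test. `hex_preview` and the payload are hypothetical.

```python
def hex_preview(payload: bytes, width: int = 16, limit: int = 64) -> str:
    """Format the first `limit` bytes as offset, hex, and printable-ASCII columns."""
    head = payload[:limit]
    pad = width * 3 - 1  # width of a full hex column, used to align short rows
    lines = []
    for offset in range(0, len(head), width):
        chunk = head[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:04x}  {hex_part.ljust(pad)}  {ascii_part}")
    return "\n".join(lines)


# Hypothetical payload: UTF-16 little-endian XML with a leading byte-order mark.
payload = b"\xff\xfe" + "<a>中</a>".encode("utf-16-le")
print(hex_preview(payload))
```

Running it on the bytes before and after a conversion step makes it easy to see whether a byte-order mark, a substitution character, or a different byte width has appeared.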

fjb_saper,
For flows triggered by MQ nodes, isn't it more advisable to use the values from the MQMD header than from the Properties header? The values of these fields are usually the same in both headers, but there is a section in the InfoCenter that discusses which values take precedence when, which led me to believe so. Pardon my oversight if I am wrong.
_________________
LIFE is a series of complex calculations, somewhere multiplied by ZERO.

kimbert

Posted: Mon Jun 14, 2010 1:41 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Quote:

In such scenarios, I'd almost always use the "Convert" option checked in the advanced properties tab to let MQ do the conversion for me.

Using the 'Convert' option is hardly ever the correct answer.

Quote:

That step would take care of converting the bitstream from the source charset to the platform's native charset

What is the 'source char set' in this scenario? What will sumanth84's message flow use as its source char set?

Unless I'm missing something, your suggestion has not solved the problem - it has just moved it from the input node to the (unnecessary) Convert stage.

paranoid221

Posted: Mon Jun 14, 2010 7:17 am Post subject:

Centurion

Joined: 03 Apr 2006
Posts: 101
Location: USA

kimbert wrote:

What is the 'source char set' in this scenario? What will sumanth84's message flow use as its source char set?

The source charset for his message flow would be dictated by the CCSID of the input message, which is why I mentioned that the sending app has to be truthful about the data it is sending across. If the sender lies about the data being sent, it is hard to build a safety net in the downstream applications.

I use the convert option when I know my downstream system is on the same platform as my broker. If there is a character in the source bitstream that does not have an equivalent on the destination, the convert would either fail on the broker itself, or the character in question would be replaced by a substitution character (typically x1A), which would render the XML invalid since x1A falls outside the allowable character range according to the XML spec. Either way, that would point me to suspect the data being sent into the broker.
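
The effect of a substitution character on XML can be checked outside the broker. The Python sketch below (not from the thread) uses the standard library parser rather than XMLNSC, so it only illustrates the XML 1.0 rule; `is_well_formed` and both sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: bytes) -> bool:
    """Return True when the standard library XML parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# x0A (line feed) is an allowed character in XML 1.0 content.
clean = b"<msg>line one\x0aline two</msg>"
# x1A (SUB) is a control character outside the allowed range.
substituted = b"<msg>line one\x1aline two</msg>"

print(is_well_formed(clean), is_well_formed(substituted))
```

A parse failure at the position of an x1A byte is therefore a hint that a conversion step replaced a character it could not map.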
_________________
LIFE is a series of complex calculations, somewhere multiplied by ZERO.

Last edited by paranoid221 on Mon Jun 14, 2010 3:08 pm; edited 1 time in total

paranoid221

Posted: Mon Jun 14, 2010 8:08 am Post subject:

Centurion

Joined: 03 Apr 2006
Posts: 101
Location: USA

OP,
Please share your input message if & when possible
_________________
LIFE is a series of complex calculations, somewhere multiplied by ZERO.

kimbert

Posted: Mon Jun 14, 2010 8:14 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Sorry - I still don't see what advantage you get from using the Convert option. Why not just allow the flow to pick up the CCSID from the header in the normal way?
