MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » Parsing XML message Having Chinese Charecters in WMB

Parsing XML message Having Chinese Characters in WMB
sumanth84
Posted: Tue Jun 08, 2010 6:53 pm    Post subject: Parsing XML message Having Chinese Characters in WMB

Guest




Hi,

I am trying to parse an XML message with Chinese characters. Per the client's guidelines, we have to use a framework for building all WMB-related services.

Here the input message from the MQInput node is coming as a BLOB, and I am trying to parse it with the XMLNSC parser. I have tried a few CCSIDs, like 1200, 1386, and 1316, to parse it and get an XMLNSC tree, but without any success.

The problem is that when I parse the BLOB, there are some other characters before the first byte, which is "<" (for example, fffe). Hence the parsing is failing.

WMB 6.1.0.4

Can anyone please provide some inputs?

Thanks
rekarm01
Posted: Tue Jun 08, 2010 7:55 pm    Post subject: Re: Parsing XML message Having Chinese Characters in WMB

Grand Master

Joined: 25 Jun 2008
Posts: 1415

sumanth84 wrote:
Here the input message from the MQInput node is coming as a BLOB, and I am trying to parse it with the XMLNSC parser.

What does that mean? Is the "BLOB" domain coming from the input message, or from the MQInput node? Why not parse as "XMLNSC" directly?

sumanth84 wrote:
I have tried a few CCSIDs, like 1200, 1386, and 1316, to parse it and get an XMLNSC tree, but without any success.

Tried where? In an MQ header, before the MQInput node? In the MQInput node "Convert ccsid" property? In a compute node after the MQInput node?

If the input ccsid is wrong before the MQInput node, it's better to have the sender fix it, rather than try to fix it in the message flow.

sumanth84 wrote:
The problem is that when I parse the BLOB, there are some other characters before the first byte, which is "<" (for example, fffe).

X'fffe' is probably a byte-order-mark for UTF-16; the correct input ccsid would be 1204.

If that doesn't work, then provide more details.
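
To make the byte-order-mark point concrete, here is a minimal sketch in Python (not ESQL, and not part of the poster's flow) that looks at the leading bytes of a payload and reports which Unicode encoding the BOM suggests. The function name `sniff_bom` and the sample payload are invented for illustration; mapping the detected encoding to a broker CCSID is deliberately left out.

```python
import codecs

# BOM signatures, longest first so UTF-32 is not mistaken for UTF-16.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(payload: bytes):
    """Return (encoding, bom_length), or (None, 0) if no BOM is present."""
    for bom, name in _BOMS:
        if payload.startswith(bom):
            return name, len(bom)
    return None, 0


# Hypothetical sample: a little-endian UTF-16 XML document preceded by a BOM.
sample = codecs.BOM_UTF16_LE + "<msg>中文</msg>".encode("utf-16-le")

encoding, skip = sniff_bom(sample)
# Drop the BOM bytes, then decode the rest with the detected encoding.
text = sample[skip:].decode(encoding)
```

In this sample the payload starts with the bytes X'fffe', which is how a little-endian UTF-16 byte order mark appears on the wire. A check like this only helps with diagnosis; it does not replace a correct CCSID in the message header.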
fjb_saper
Posted: Tue Jun 08, 2010 8:20 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

Also you did not specify the value of InputRoot.Properties.CodedCharSet.

This is really the value you should use for parsing. You might have to verify with header chaining and use the value on the last header...

Now if the content does not match the description (like ccsid = 1208 but the content is in 1204 due to the byte order mark) then you need to push back to the message producer and force them to describe the message correctly.
_________________
MQ & Broker admin
kimbert
Posted: Tue Jun 08, 2010 11:56 pm

Jedi Council

Joined: 29 Jul 2003
Posts: 5542
Location: Southampton

Quote:
I have tried a few CCSIDs, like 1200, 1386, and 1316, to parse it and get an XMLNSC tree, but without any success.
You are doing it wrong. The sender should supply the correct CCSID for the message in the MQMD header ( or whatever the final header is ). If the sender is not setting the CCSID in a header, then they *must* tell you which CCSID they have used, so that your message flow can parse the data correctly.
In general, it is not safe to guess the CCSID - you will usually end up choosing one which works for your test data, but fails with real production data.
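
A small sketch of why a guessed code page can pass testing and still fail later, written in Python purely for illustration. The two sample documents are invented, and Python's `cp437` codec is assumed to be a reasonable stand-in for CCSID 437.

```python
# Hypothetical test data (ASCII only) and production-like data (Chinese text).
ascii_xml = "<order id='1'>test</order>"
chinese_xml = "<order id='1'>测试</order>"

# Assume the sender actually wrote UTF-8 in both cases.
ascii_bytes = ascii_xml.encode("utf-8")
chinese_bytes = chinese_xml.encode("utf-8")

# ASCII-only data decodes identically under the right and the wrong code page,
# so a wrong guess goes unnoticed during testing.
ascii_ok_right = ascii_bytes.decode("utf-8") == ascii_xml
ascii_ok_wrong = ascii_bytes.decode("cp437") == ascii_xml

# Data with Chinese text only survives the correct code page; the wrong one
# decodes without an error but yields different characters.
chinese_ok_right = chinese_bytes.decode("utf-8") == chinese_xml
chinese_ok_wrong = chinese_bytes.decode("cp437") == chinese_xml
```

The wrong code page does not raise an error here; it silently produces different text, which is what makes guessing risky.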
sumanth84
Posted: Wed Jun 09, 2010 8:13 am    Post subject: Re: Parsing XML message Having Chinese Characters in WMB

Guest




sumanth84 wrote:
Here the input message from the MQInput node is coming as a BLOB, and I am trying to parse it with the XMLNSC parser.

rekarm01 wrote:
What does that mean? Is the "BLOB" domain coming from the input message, or from the MQInput node? Why not parse as "XMLNSC" directly?

Due to framework constraints, all messages are taken as BLOB from the MQInput node. Later the message is parsed with the appropriate parser, depending on the requirement.

sumanth84 wrote:
I have tried a few CCSIDs, like 1200, 1386, and 1316, to parse it and get an XMLNSC tree, but without any success.

rekarm01 wrote:
Tried where? In an MQ header, before the MQInput node? In the MQInput node "Convert ccsid" property? In a compute node after the MQInput node?

If the input ccsid is wrong before the MQInput node, it's better to have the sender fix it, rather than try to fix it in the message flow.

I tried to parse the message in a Compute node that is after the MQInput node.
I will ask the provider to share the CCSID and try.

sumanth84 wrote:
The problem is that when I parse the BLOB, there are some other characters before the first byte, which is "<" (for example, fffe).

rekarm01 wrote:
X'fffe' is probably a byte-order-mark for UTF-16; the correct input ccsid would be 1204.

If that doesn't work, then provide more details.
sumanth84
Posted: Thu Jun 10, 2010 6:45 pm    Post subject: Re: Parsing XML message Having Chinese Characters in WMB

Guest




Hi,

I have tried CCSID 1204. The input message still cannot be parsed in the XMLNSC domain.

The Flow is:

MQInputNode->Compute Node ->MQOutputNode

The input message is taken as a BLOB and parsed in the Compute node using the PARSE statement.

If I read the input message using RFHUtil, the CCSID is set to 437.

Please comment on which CCSID should be used here.
fjb_saper
Posted: Thu Jun 10, 2010 7:17 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

I doubt very much that CCSID 437 will support your Chinese characters.
You need to really force the sending application to specify the CCSID of the data the correct way. Then you must make sure that there is no translation happening on the channels.
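
As a quick sanity check of the point about 437, the following Python sketch tests whether a short Chinese string can be represented in a few code pages. It assumes Python's `cp437` codec corresponds closely enough to CCSID 437; the helper name `can_encode` and the sample string are made up for illustration.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given code page."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


sample = "中文"  # hypothetical sample text

in_cp437 = can_encode(sample, "cp437")  # US PC code page, no CJK characters
in_utf8 = can_encode(sample, "utf-8")   # Unicode, covers the sample
in_gbk = can_encode(sample, "gbk")      # Simplified Chinese code page
```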

Have fun
SANTYP
Posted: Thu Jun 10, 2010 10:54 pm

Centurion

Joined: 27 Mar 2007
Posts: 142

Typically, when the sender has not specified the CCSID they are sending, I think you can try CCSID '0' and ENCODING '0', which will parse the message.

Use the statement below to parse the message.

CREATE LASTCHILD OF OutputRoot DOMAIN('XMLNSC') PARSE(InputRoot.BLOB.BLOB ENCODING 0 CCSID 0);
kimbert
Posted: Fri Jun 11, 2010 1:18 am

Jedi Council

Joined: 29 Jul 2003
Posts: 5542
Location: Southampton

Quote:
I think you can try CCSID '0' and ENCODING '0'.
I'll say it again. You should not use the word 'try' when talking about selecting a CCSID. You must either know what the CCSID is, or the sending application must tell you what it is via the header.
Guessing is dangerous - see my previous post on this thread for the reason why.
paranoid221
Posted: Sun Jun 13, 2010 10:15 pm

Centurion

Joined: 03 Apr 2006
Posts: 101
Location: USA

In such scenarios, I'd almost always use the "Convert" option checked in the advanced properties tab to let MQ do the conversion for me.
That step would take care of converting the bitstream from the source charset to the platform's native charset where your QMgr/broker is running. The XML transcoding (be it UTF-8 or UTF-16) comes into the picture only after the conversion I mentioned above is successful.
In your case, the first step itself might be failing because of the presence of a character in the source bitstream that does not have an equivalent in the target system's charset. As others who have replied so far have mentioned, a lot depends on how truthful the sending application is about the data it is putting over the wire.
You should also look at your data in hex mode, both before and after the convert, to understand more about it.
You should really pay great attention to detail at every step while working with international/special characters.

fjb_saper wrote:
Also you did not specify the value of InputRoot.Properties.CodedCharSet.

This is really the value you should use for parsing. You might have to verify with header chaining and use the value on the last header...


fjb_saper,
For flows triggered by MQ nodes, isn't it more advisable to use the values from the MQMD header than from the Properties header? The values of these fields are usually the same in both headers, but there is a section in the Info Center that describes which values take precedence and when, which led me to believe so. Pardon my oversight if I am wrong.
_________________
LIFE is a series of complex calculations, somewhere multiplied by ZERO.
kimbert
Posted: Mon Jun 14, 2010 1:41 am

Jedi Council

Joined: 29 Jul 2003
Posts: 5542
Location: Southampton

Quote:
In such scenarios, I'd almost always use the "Convert" option checked in the advanced properties tab to let MQ do the conversion for me.
Using the 'Convert' option is hardly ever the correct answer.
Quote:
That step would take care of converting the bitstream from the source charset to the platform's native charset
What is the 'source char set' in this scenario? What will sumanth84's message flow use as its source char set?

Unless I'm missing something, your suggestion has not solved the problem - it has just moved it from the input node to the (unnecessary) Convert stage.
paranoid221
Posted: Mon Jun 14, 2010 7:17 am

Centurion

Joined: 03 Apr 2006
Posts: 101
Location: USA

kimbert wrote:
What is the 'source char set' in this scenario? What will sumanth84's message flow use as its source char set?


The source charset for his message flow would be dictated by the CCSID of the input message, which is why I mentioned that the sending app has to be truthful about the data it is sending across. If the sender lies about the data being sent, it is hard to build a safety net in the downstream applications.

I use the Convert option when I know my downstream system is on the same platform as my broker. If there is a character in the source bitstream that does not have an equivalent on the destination, the convert would either fail on the broker itself, or the character in question would be replaced by a substitution character (typically x1A), which would render the XML invalid since x1A falls outside the allowable character range according to the XML spec. Either way, that would point me to suspect the data being sent into the broker.
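
The claim about the substitution character can be checked with any conforming XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` (not the broker's XMLNSC parser, so treat it only as an analogy); the helper name `is_well_formed` and both sample documents are invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """Return True if the parser accepts doc as well-formed XML."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# A line feed (x0A) is an allowed character in XML 1.0 content.
with_line_feed = "<a>line1\x0aline2</a>"
# The SUB control character (x1A) is outside the allowed character range.
with_sub = "<a>bad\x1achar</a>"

lf_ok = is_well_formed(with_line_feed)
sub_ok = is_well_formed(with_sub)
```

A parse failure of this kind is a hint that a lossy conversion happened somewhere upstream, rather than proof of which hop caused it.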


Last edited by paranoid221 on Mon Jun 14, 2010 3:08 pm; edited 1 time in total
paranoid221
Posted: Mon Jun 14, 2010 8:08 am

Centurion

Joined: 03 Apr 2006
Posts: 101
Location: USA

OP,
Please share your input message if & when possible
kimbert
Posted: Mon Jun 14, 2010 8:14 am

Jedi Council

Joined: 29 Jul 2003
Posts: 5542
Location: Southampton

Sorry - I still don't see what advantage you get from using the Convert option. Why not just allow the flow to pick up the CCSID from the header in the normal way?