sumanth84
Posted: Tue Jun 08, 2010 6:53 pm Post subject: Parsing XML message Having Chinese Characters in WMB
Guest
Hi,
I am trying to parse an XML message that contains Chinese characters. Per the client's guidelines, we have to use a framework for building all WMB-related services.
The input message arrives from the MQInput node in the BLOB domain, and I am trying to parse it with the XMLNSC parser. I have tried a few CCSIDs, such as 1200, 1386 and 1316, to parse it into an XMLNSC tree, but without success.
The problem is that when I parse the BLOB, there are some other characters before the first "<" (for example, X'FFFE'), so the parse fails.
WMB 6.1.0.4
Can anyone please provide some input?
Thanks

rekarm01
Posted: Tue Jun 08, 2010 7:55 pm Post subject: Re: Parsing XML message Having Chinese Characters in WMB
Grand Master
Joined: 25 Jun 2008 Posts: 1415
sumanth84 wrote: |
Here the input message from MQInput Node is coming as BLOB and I am trying to parse the same against a XMLNSC parser. |
What does that mean? Is the "BLOB" domain coming from the input message, or from the MQInput node? Why not parse as "XMLNSC" directly?
sumanth84 wrote: |
I have tried few CCSID like 1200,1386,1316 to parse and get an XMLNSC tree, but with out any success. |
Tried where? In an MQ header, before the MQInput node? In the MQInput node "Convert ccsid" property? In a compute node after the MQInput node?
If the input ccsid is wrong before the MQInput node, it's better to have the sender fix it, rather than try to fix it in the message flow.
sumanth84 wrote: |
The problem is when I parse the blob there is some other characters before the first byte that is "<". (for Ex: fffe). |
X'fffe' is probably a byte-order-mark for UTF-16; the correct input ccsid would be 1204.
If that doesn't work, then provide more details.
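For illustration only: X'FF FE' at the very start of the data is the UTF-16 byte-order mark (BOM) in little-endian byte order; big-endian data would start with X'FE FF'. A minimal Python sketch, run outside the broker and using Python codec names rather than CCSIDs, of how the leading bytes identify the encoding. The helpers `sniff_bom` and `decode_with_bom` are hypothetical, not WMB or ESQL functions.

```python
# Unicode byte-order marks, longest first so that the UTF-32LE mark
# (FF FE 00 00) is not mistaken for the UTF-16LE mark (FF FE).
BOMS = [
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xfe\xff", "utf-16-be"),
]

def sniff_bom(data: bytes):
    """Return (python_codec, bom_length), or (None, 0) if there is no BOM."""
    for bom, codec in BOMS:
        if data.startswith(bom):
            return codec, len(bom)
    return None, 0

def decode_with_bom(data: bytes) -> str:
    """Decode using the BOM and drop it; without a BOM, only the header CCSID can tell."""
    codec, skip = sniff_bom(data)
    if codec is None:
        raise ValueError("no byte-order mark: take the CCSID from the message header")
    return data[skip:].decode(codec)

if __name__ == "__main__":
    payload = "\ufeff<msg>中文</msg>".encode("utf-16-le")
    print(payload[:2].hex())         # fffe
    print(sniff_bom(payload))        # ('utf-16-le', 2)
    print(decode_with_bom(payload))  # <msg>中文</msg>
```

A BOM only settles the byte order of a Unicode encoding; it is not a substitute for the sender declaring the CCSID.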

fjb_saper
Posted: Tue Jun 08, 2010 8:20 pm
Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
Also, you did not tell us the value of InputRoot.Properties.CodedCharSetId.
This is really the value you should use for parsing. You may have to follow the header chain and use the value from the last header.
If the content does not match the description (for example, the header says CCSID 1208 but the content is UTF-16, CCSID 1204, as the byte-order mark shows), you need to push back on the message producer and make them describe the message correctly. _________________ MQ & Broker admin
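A minimal sketch of the header-chaining rule, under stated assumptions. In MQ, each header's CodedCharSetId describes the data that follows that header, so the CCSID that describes the body is the one on the last header. The `Header` class, the `CCSID_CODECS` table and both functions are hypothetical Python stand-ins, not broker APIs, and the table maps only a few CCSIDs to Python codecs as an assumption for this sketch. The check flags the mismatch described above: a header that declares UTF-8 (1208) over a body that starts with a UTF-16 BOM.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Header:
    """Hypothetical stand-in for an MQ header such as MQMD or MQRFH2.
    coded_char_set_id describes the data that FOLLOWS this header."""
    name: str
    coded_char_set_id: int

# Assumed CCSID -> Python codec mapping, for this sketch only.
CCSID_CODECS = {1208: "utf-8", 1202: "utf-16-le", 437: "cp437"}

# Codec implied by a UTF-16 byte-order mark at the start of the body.
UTF16_BOMS = {b"\xff\xfe": "utf-16-le", b"\xfe\xff": "utf-16-be"}

def body_ccsid(chain: List[Header]) -> int:
    """The body's CCSID is the one on the LAST header in the chain."""
    if not chain:
        raise ValueError("empty header chain")
    return chain[-1].coded_char_set_id

def declared_matches_content(chain: List[Header], body: bytes) -> bool:
    """False if the declared CCSID clearly does not describe the body."""
    codec = CCSID_CODECS[body_ccsid(chain)]  # KeyError: CCSID not in this sketch
    bom_codec = UTF16_BOMS.get(body[:2])
    if bom_codec is not None and bom_codec != codec:
        return False  # e.g. header says 1208 (UTF-8) but body starts with X'FFFE'
    try:
        body.decode(codec)
    except UnicodeDecodeError:
        return False
    return True

if __name__ == "__main__":
    utf16le_body = "\ufeff<a>中文</a>".encode("utf-16-le")
    chain = [Header("MQMD", 1208), Header("MQRFH2", 1202)]
    print(body_ccsid(chain))                                               # 1202
    print(declared_matches_content(chain, utf16le_body))                   # True
    print(declared_matches_content([Header("MQMD", 1208)], utf16le_body))  # False
```

A failed check does not tell you the right CCSID; it only tells you the producer's description is wrong and needs fixing at the source.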

kimbert
Posted: Tue Jun 08, 2010 11:56 pm
Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
Quote: |
I have tried few CCSID like 1200,1386,1316 to parse and get an XMLNSC tree, but with out any success. |
You are doing it wrong. The sender should supply the correct CCSID for the message in the MQMD header (or whatever the final header is). If the sender is not setting the CCSID in a header, then they *must* tell you which CCSID they have used, so that your message flow can parse the data correctly.
In general, it is not safe to guess the CCSID: you will usually end up choosing one that works for your test data but fails with real production data.
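To make the point concrete, a small Python sketch (the helper `survives` and both sample messages are hypothetical; Python codec names stand in for CCSIDs). It assumes the sender really writes UTF-8 while the flow guesses cp437, Python's codec for code page 437. Because cp437 maps all 256 byte values, the wrong guess never raises an error: it passes ASCII-only test data and silently corrupts the Chinese text.

```python
TEST_MSG = "<order><id>42</id></order>"        # ASCII-only test data
PROD_MSG = "<order><name>张三</name></order>"   # production data with Chinese

def survives(text: str, sender_codec: str, guessed_codec: str) -> bool:
    """True if text written by the sender in sender_codec reads back
    unchanged when the receiving flow decodes it with guessed_codec."""
    raw = text.encode(sender_codec)
    try:
        return raw.decode(guessed_codec) == text
    except UnicodeDecodeError:
        return False

if __name__ == "__main__":
    # Assumption: the sender really writes UTF-8; the flow guesses cp437.
    print(survives(TEST_MSG, "utf-8", "cp437"))  # True: the guess "works" in test
    print(survives(PROD_MSG, "utf-8", "cp437"))  # False: the Chinese text is garbled
```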

sumanth84
Posted: Wed Jun 09, 2010 8:13 am Post subject: Re: Parsing XML message Having Chinese Characters in WMB
Guest
sumanth84 wrote: |
Here the input message from MQInput Node is coming as BLOB and I am trying to parse the same against a XMLNSC parser. |
rekarm01 wrote: |
What does that mean? Is the "BLOB" domain coming from the input message, or from the MQInput node? Why not parse as "XMLNSC" directly? |
Due to framework constraints, all messages are read in the BLOB domain by the MQInput node. Each message is later parsed with whichever parser the requirement calls for.
sumanth84 wrote: |
I have tried few CCSID like 1200,1386,1316 to parse and get an XMLNSC tree, but with out any success. |
rekarm01 wrote: |
Tried where? In an MQ header, before the MQInput node? In the MQInput node "Convert ccsid" property? In a compute node after the MQInput node?
If the input ccsid is wrong before the MQInput node, it's better to have the sender fix it, rather than try to fix it in the message flow. |
I tried to parse the message in the Compute node that follows the MQInput node.
I will ask the provider to share the CCSID and try again.
sumanth84 wrote: |
The problem is when I parse the blob there is some other characters before the first byte that is "<". (for Ex: fffe). |
rekarm01 wrote: |
X'fffe' is probably a byte-order-mark for UTF-16; the correct input ccsid would be 1204.
If that doesn't work, then provide more details. |

sumanth84
Posted: Thu Jun 10, 2010 6:45 pm Post subject: Re: Parsing XML message Having Chinese Characters in WMB
Guest
Hi,
I have tried CCSID 1204, but the input message still cannot be parsed in the XMLNSC domain.
The flow is:
MQInput node -> Compute node -> MQOutput node
The input message is read as BLOB and parsed in the Compute node using the PARSE clause of a CREATE statement.
If I read the input message with RFHUtil, the CCSID is shown as 437.
Please advise which CCSID I should use.

fjb_saper
Posted: Thu Jun 10, 2010 7:17 pm
Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
I doubt very much that CCSID 437 will support your Chinese characters.
You really need to make the sending application specify the correct CCSID for the data. Then you must make sure that no data conversion happens on the channels.
Have fun
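A quick way to see this, sketched in Python outside the broker (the helper `representable` is hypothetical). Code page 437 has only 256 characters, none of them Chinese, so Chinese text cannot even be encoded in it. The `gbk` codec appears here on the assumption that it corresponds to a Simplified Chinese CCSID such as 1386; the real CCSID still has to come from the sender.

```python
def representable(text: str, codec: str) -> bool:
    """True if every character of text exists in the given character set."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

if __name__ == "__main__":
    chinese = "中文"
    print(representable(chinese, "cp437"))  # False: code page 437 has no Chinese characters
    print(representable(chinese, "gbk"))    # True
    print(representable(chinese, "utf-8"))  # True
```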

SANTYP
Posted: Thu Jun 10, 2010 10:54 pm
Centurion
Joined: 27 Mar 2007 Posts: 142
Typically, when the sender does not specify the CCSID of what they are sending, I think you can try CCSID 0 and ENCODING 0, which will parse the message.
Use the statement below to parse the message:
CREATE LASTCHILD OF OutputRoot DOMAIN('XMLNSC') PARSE(InputRoot.BLOB.BLOB ENCODING 0 CCSID 0);

kimbert
Posted: Fri Jun 11, 2010 1:18 am
Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
Quote: |
I think u can try with CCSID as '0' and ENCODING as '0'.. |
I'll say it again. You should not use the word 'try' when talking about selecting a CCSID. You must either know what the CCSID is, or the sending application must tell you what it is via the header.
Guessing is dangerous; see my previous post in this thread for the reason.

paranoid221
Posted: Sun Jun 13, 2010 10:15 pm
Centurion
Joined: 03 Apr 2006 Posts: 101 Location: USA
In such scenarios, I'd almost always check the "Convert" option on the Advanced properties tab to let MQ do the conversion for me.
That step converts the bitstream from the source character set to the native character set of the platform where your queue manager/broker runs. The XML transcoding (whether UTF-8 or UTF-16) comes into the picture only after that conversion succeeds.
In your case, the first step itself might be failing because the source bitstream contains a character that has no equivalent in the target system's character set. As others have said, a lot depends on how truthful the sending application is about the data it puts on the wire.
You should also look at your data in hex mode to understand more about it (both before and after conversion).
Pay great attention to detail at every step when working with international/special characters.
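For example, a minimal Python sketch of such a hex check (the helpers `hex_preview` and `first_bytes_hint` are hypothetical). The BOM entries are the Unicode byte-order marks; the four-byte entries are the bytes of "<?xm" in each encoding family, as listed in Appendix F of the XML 1.0 specification. The result is only a hint and does not replace a correct CCSID from the sender.

```python
# Leading-byte patterns, checked in order. Note that UTF-32LE data also
# starts with FF FE, so this sketch would report it as UTF-16LE.
SIGNATURES = [
    (bytes.fromhex("EFBBBF"), "UTF-8 with BOM"),
    (bytes.fromhex("FFFE"), "UTF-16LE with BOM"),
    (bytes.fromhex("FEFF"), "UTF-16BE with BOM"),
    (bytes.fromhex("3C003F00"), "UTF-16LE, no BOM"),
    (bytes.fromhex("003C003F"), "UTF-16BE, no BOM"),
    (bytes.fromhex("3C3F786D"), "ASCII-compatible (UTF-8, ISO-8859-x, ...)"),
    (bytes.fromhex("4C6FA794"), "EBCDIC"),
]

def hex_preview(data: bytes, length: int = 16) -> str:
    """Space-separated hex of the first `length` bytes, as a hex viewer shows them."""
    return " ".join(f"{b:02X}" for b in data[:length])

def first_bytes_hint(data: bytes) -> str:
    """A hint only: the sender's declared CCSID is still the authority."""
    for prefix, label in SIGNATURES:
        if data.startswith(prefix):
            return label
    return "unknown: ask the sender"

if __name__ == "__main__":
    msg = "\ufeff<a>".encode("utf-16-le")
    print(hex_preview(msg))       # FF FE 3C 00 61 00 3E 00
    print(first_bytes_hint(msg))  # UTF-16LE with BOM
```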
fjb_saper wrote: |
Also you did not specify the value of InputRoot.Properties.CodedCharSet.
This is really the value you should use for parsing. You might have to verify with header chaining and use the value on the last header...
|
fjb_saper,
For flows triggered by MQ nodes, isn't it better to use the values from the MQMD header than those in the Properties folder? The values of these fields are usually the same in both, but there is a section in the Information Center describing which values take precedence in which cases, and that led me to believe so. Pardon my oversight if I am wrong. _________________ LIFE is a series of complex calculations, somewhere multiplied by ZERO.

kimbert
Posted: Mon Jun 14, 2010 1:41 am
Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
Quote: |
In such scenarios, I'd almost always use the "Convert" option checked in the advanced properties tab to let MQ do the conversion for me. |
Using the 'Convert' option is hardly ever the correct answer.
Quote: |
That step would take care of converting the bitstream from the source charset to the platform's native charset |
What is the 'source char set' in this scenario? What will sumanth84's message flow use as its source char set?
Unless I'm missing something, your suggestion has not solved the problem; it has just moved it from the input node to the (unnecessary) Convert stage.

paranoid221
Posted: Mon Jun 14, 2010 7:17 am
Centurion
Joined: 03 Apr 2006 Posts: 101 Location: USA
kimbert wrote: |
What is the 'source char set' in this scenario? What will sumanth84's message flow use as its source char set?
|
The source character set for his message flow would be dictated by the CCSID of the input message, which is why I mentioned that the sending application has to be truthful about the data it sends. If the sender misdescribes the data, it is hard to build a safety net in the downstream applications.
I use the Convert option when I know my downstream system is on the same platform as my broker. If the source bitstream contains a character that has no equivalent on the destination, the conversion either fails in the broker itself or the character is replaced by a substitution character (typically X'1A'), which renders the XML invalid because X'1A' falls outside the character range allowed by the XML specification. Either way, that points me toward the data being sent into the broker.
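The relevant rule is the Char production of XML 1.0, which allows only #x9, #xA, #xD, [#x20-#xD7FF], [#xE000-#xFFFD] and [#x10000-#x10FFFF]. A minimal Python sketch of that check (the helpers `is_xml10_char` and `first_illegal_char` are hypothetical): X'1A' fails it, while the line feed X'0A' passes.

```python
def is_xml10_char(ch: str) -> bool:
    """XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF] |
    [#xE000-#xFFFD] | [#x10000-#x10FFFF]."""
    cp = ord(ch)
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

def first_illegal_char(text: str):
    """Return (index, code point in hex) of the first character XML 1.0 forbids, or None."""
    for i, ch in enumerate(text):
        if not is_xml10_char(ch):
            return i, hex(ord(ch))
    return None

if __name__ == "__main__":
    print(is_xml10_char("\x1a"))              # False: the substitution character
    print(is_xml10_char("\n"))                # True: X'0A' (line feed) is allowed
    print(first_illegal_char("<a>\x1a</a>"))  # (3, '0x1a')
```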
Last edited by paranoid221 on Mon Jun 14, 2010 3:08 pm; edited 1 time in total

paranoid221
Posted: Mon Jun 14, 2010 8:08 am
Centurion
Joined: 03 Apr 2006 Posts: 101 Location: USA
OP,
Please share your input message if and when possible.

kimbert
Posted: Mon Jun 14, 2010 8:14 am
Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
Sorry, I still don't see what advantage you get from using the Convert option. Why not just let the flow pick up the CCSID from the header in the normal way?