Author |
Message
|
er_pankajgupta84 |
Posted: Thu Jul 23, 2009 8:32 am Post subject: Problem parsing special character thru XMLNSC parser |
|
|
 Master
Joined: 14 Nov 2008 Posts: 203 Location: charlotte,NC, USA
|
Hi,
I have a problem parsing special characters (Latin) in XML. I have simplified my flow to the following:
MQInput -> RCD node -> MQOutput.
The RCD node has the XMLNSC parser set and everything else is default.
I am sending the following message through the MQInput node and it is failing at the RCD node.
<hi><SGTXT>Gratts ch/out May L GILL CAFÃ SPRING SUMMER MENU C</SGTXT></hi>
I have read various posts and tried changing the CCSID property of the Properties folder in debug mode, but nothing works.
I have tried the following CCSIDs: 437 (default), 273, 1208, 1200, 819.
Can this parsing be done by changing the CCSID only? If so, can anybody suggest some other CCSID, or any other change I can try on the Properties folder or MQMD folder, to get my XML parsed? |
|
WMBDEV1 |
Posted: Thu Jul 23, 2009 9:33 am Post subject: Re: Problem parsing special character thru XMLNSC parser |
|
|
Sentinel
Joined: 05 Mar 2009 Posts: 888 Location: UK
|
er_pankajgupta84 wrote: |
I have a problem |
What's the problem? Do you have extracts from a user trace you can share with us?
Are you sure the message is in the CCSID that you think it is? How did you confirm this? |
|
er_pankajgupta84 |
Posted: Thu Jul 23, 2009 11:55 am Post subject: |
|
|
 Master
Joined: 14 Nov 2008 Posts: 203 Location: charlotte,NC, USA
|
I am getting an XML parser exception, and if I remove that special Latin character from my input then I am able to parse this XML.
Another thing: when you copy
<hi><SGTXT>Gratts ch/out May L GILL CAFÃ SPRING SUMMER MENU C</SGTXT></hi>
into Notepad, save it as XML, and try to open it in IE, it won't display an XML structure.
But if you save this
<?xml version="1.0" encoding="ISO-8859-1" ?>
<hi><SGTXT>Gratts ch/out May L GILL CAFÃ SPRING SUMMER MENU C</SGTXT></hi>
then you can see the XML in IE.
So I am sure it has something to do with encoding or CCSID. Can someone suggest a CCSID that I can try? |
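The difference described above can be reproduced outside the broker. The following is a minimal sketch using Python's ElementTree, not the XMLNSC parser, and it assumes the problem character is É stored as the single ISO-8859-1 byte 0xC9 (the thread does not confirm this). Without a declaration an XML parser assumes UTF-8, where a lone 0xC9 byte is malformed; with the ISO-8859-1 declaration the same bytes parse.

```python
import xml.etree.ElementTree as ET

# Assumption: the payload holds 'É' as the single ISO-8859-1 byte 0xC9.
BODY = b"<hi><SGTXT>CAF\xc9 SPRING SUMMER MENU</SGTXT></hi>"
DECLARATION = b'<?xml version="1.0" encoding="ISO-8859-1"?>'


def try_parse(data):
    """Return the SGTXT text, or None if the bytes are not well-formed XML."""
    try:
        return ET.fromstring(data).findtext("SGTXT")
    except ET.ParseError:
        return None


if __name__ == "__main__":
    # No declaration: the parser defaults to UTF-8 and rejects the 0xC9 byte.
    print("without declaration:", try_parse(BODY))
    # With the declaration: the same bytes are read as ISO-8859-1.
    print("with declaration:   ", try_parse(DECLARATION + BODY))
```

The names `BODY`, `DECLARATION` and `try_parse` are illustrative only.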
|
WMBDEV1 |
Posted: Thu Jul 23, 2009 1:31 pm Post subject: |
|
|
Sentinel
Joined: 05 Mar 2009 Posts: 888 Location: UK
|
er_pankajgupta84 wrote: |
Can someone suggest a CCSID that I can try? |
And the answer that you will get to this request is to find out what CCSID it has been sent in.
The XML is marked as being ISO-8859-1; the correct CCSID for this is 819. I'd confirm that the byte value of the offending character is what it should be for ISO-8859-1. |
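As a rough illustration of that byte-level check, the sketch below lists the bytes one character would have under each CCSID tried earlier in the thread. It assumes the damaged character was meant to be É (U+00C9), which the thread never confirms, and the pairing of IBM CCSIDs with Python codec names is an assumption of this sketch rather than anything taken from the broker.

```python
# Assumption: the intended character is 'É' (U+00C9).
# Assumption: these Python codecs correspond to the CCSIDs tried in the thread.
CCSID_TO_CODEC = {
    437: "cp437",       # PC US (the default mentioned in the thread)
    273: "cp273",       # German EBCDIC
    819: "iso-8859-1",  # Latin-1
    1208: "utf-8",
    1200: "utf-16-be",  # CCSID 1200 is UTF-16; big-endian chosen here for display
}


def bytes_per_ccsid(char):
    """Return {ccsid: hex string of the encoded bytes} for one character."""
    return {
        ccsid: char.encode(codec).hex()
        for ccsid, codec in CCSID_TO_CODEC.items()
    }


if __name__ == "__main__":
    for ccsid, hex_bytes in bytes_per_ccsid("É").items():
        print(f"CCSID {ccsid}: {hex_bytes}")
```

Comparing such a table with a hex dump of the real message shows which code page, if any, the sender actually used.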
|
er_pankajgupta84 |
Posted: Thu Jul 23, 2009 3:03 pm Post subject: |
|
|
 Master
Joined: 14 Nov 2008 Posts: 203 Location: charlotte,NC, USA
|
Thanks for your reply.
I have already tried CCSID 819, but it did not work. Is there anything else I need to specify, like encoding or some other property? |
|
kimbert |
Posted: Fri Jul 24, 2009 1:42 pm Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
The RCD node will use InputRoot.Properties.CodedCharSetId when it re-parses the message. Please either
a) take a user trace, which will reveal the exact code page that the XMLNSC parser is using
or
b) insert a Trace node just before the RCD node with pattern set to '${Root}'.
As I keep saying, you cannot solve problems like this by fiddling about with settings - you need to properly understand the problem before you can fix it *reliably*.
One more thing. The debugger is not the best tool for trying to diagnose a parsing problem - it gets in the way. User Trace and trace nodes are the best way to go. |
|
rekarm01 |
Posted: Mon Jul 27, 2009 12:29 am Post subject: Re: Problem parsing special character thru XMLNSC parser |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
er_pankajgupta84 wrote: |
I am sending the following message through the MQInput node and it is failing at the RCD node.
<hi><SGTXT>Gratts ch/out May L GILL CAFÃ SPRING SUMMER MENU C</SGTXT></hi> |
The message appears to be corrupted; it contains non-valid XML characters. It's impossible to tell where the message was corrupted, or whether it was corrupted further by posting it.
er_pankajgupta84 wrote: |
I have read various posts and tried changing the CCSID property of the Properties folder in debug mode, but nothing works. |
This topic may provide useful suggestions. |
|
er_pankajgupta84 |
Posted: Mon Jul 27, 2009 5:24 am Post subject: |
|
|
 Master
Joined: 14 Nov 2008 Posts: 203 Location: charlotte,NC, USA
|
Thanks for your replies.
I know the input message is not valid XML in UTF-8 encoding. But when you specify ISO-8859-1 encoding, it appears to be valid XML.
I agree with kimbert's comment that you should know the cause of the problem. The problem here is that our message flow may receive data in Latin or French in XML format. So I am trying to parse it with the appropriate CCSID so that we don't need to write additional Java code to deal with similar problems.
I will post the user trace. |
|
er_pankajgupta84 |
Posted: Mon Jul 27, 2009 8:05 am Post subject: |
|
|
 Master
Joined: 14 Nov 2008 Posts: 203 Location: charlotte,NC, USA
|
I have found an alternative solution to my problem: setting the element as opaque and then mapping it by invoking a Java function in the mapper. But that is not the optimal solution.
We should be able to parse such characters using a different CCSID or encoding. I have read a few articles on encoding but haven't found a proper CCSID or encoding value for parsing that element.
I would appreciate it if someone could try to parse that input as XML using some CCSID.
Thanks |
|
smdavies99 |
Posted: Mon Jul 27, 2009 8:44 am Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
Did you follow Kimbert's suggestions?
They will really help you solve the problem.
Many of us face this sort of problem almost daily, and the information that following Kimbert's suggestions would give us will really help you solve the problem properly, not by some workaround/kludge/bodge.
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions. |
|
er_pankajgupta84 |
Posted: Mon Jul 27, 2009 4:32 pm Post subject: |
|
|
 Master
Joined: 14 Nov 2008 Posts: 203 Location: charlotte,NC, USA
|
Thanks to Kimbert.
I found the origin of the problem.
The problem is with the end system, which is not able to send proper characters. Now I need some more tutorials on how this encoding thing works.
How are different code sets related to one another? Is UTF-8 or UTF-16 the superset of all the code sets? I have many similar questions.
I am researching it.
If anybody can point out a good article that might help, that would be great.
http://www.joelonsoftware.com/articles/Unicode.html
was quite useful. |
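On the superset question, a short Python sketch may help. It assumes É as the stand-in character, and the helper name `round_trips` is made up for illustration. Unicode defines the character repertoire, UTF-8 and UTF-16 are two byte encodings of that whole repertoire, and a legacy code page such as ISO-8859-1 covers only a small subset of it. The last line shows one way text like the sample in this thread can arise: UTF-8 bytes read as if they were ISO-8859-1.

```python
# Assumption: 'É' stands in for the problem character in this thread.
CHAR = "É"


def round_trips(char, codec):
    """True if the codec can encode the character and decode it back unchanged."""
    try:
        return char.encode(codec).decode(codec) == char
    except UnicodeError:
        return False


# UTF-8 bytes for 'É' are C3 89; read as ISO-8859-1 they become 'Ã' plus
# an invisible control character (U+0089).
MOJIBAKE = CHAR.encode("utf-8").decode("iso-8859-1")

if __name__ == "__main__":
    for char in ("É", "€", "中"):
        for codec in ("iso-8859-1", "utf-8", "utf-16"):
            print(repr(char), codec, round_trips(char, codec))
    print("UTF-8 read as ISO-8859-1:", repr(MOJIBAKE))
```

Every character round-trips through UTF-8 and UTF-16, while ISO-8859-1 handles only the characters in its own 256-entry table.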
|
kimbert |
Posted: Tue Jul 28, 2009 8:33 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Read Wikipedia's entry about Unicode...it's pretty good. |
|
jbanoop |
Posted: Tue Jul 28, 2009 9:25 am Post subject: |
|
|
Chevalier
Joined: 17 Sep 2005 Posts: 401 Location: SC
|
|
nagarjun_vv |
Posted: Tue Jul 28, 2009 7:04 pm Post subject: |
|
|
Apprentice
Joined: 24 Jun 2008 Posts: 33
|
Use CCSID 937 and try once; I think this should work. |
|
rekarm01 |
Posted: Tue Jul 28, 2009 11:42 pm Post subject: Re: Problem parsing special character thru XMLNSC parser |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
er_pankajgupta84 wrote: |
I have found an alternative solution to my problem: setting the element as opaque and then mapping it by invoking a Java function in the mapper. But that is not the optimal solution. |
"Solution" is not at all the right word here.
nagarjun_vv wrote: |
Use CCSID 937 and try once; I think this should work. |
Traditional Chinese Mixed-Byte EBCDIC? That's probably a step in the wrong direction. |
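To see why an EBCDIC code page is unlikely to help, here is a small Python sketch. Python ships no codec for CCSID 937, so German EBCDIC (`cp273`, one of the CCSIDs tried earlier in the thread) is used as a stand-in, and the sample is assumed to be an ASCII-based message. Read as EBCDIC, the bytes no longer even begin with `<`, so an XML parser has nothing to work with.

```python
# Assumption: the message on the queue is ASCII-based, like this sample.
SAMPLE = b"<hi><SGTXT>MENU</SGTXT></hi>"


def first_char(data, codec):
    """Decode the bytes with the given codec and return the first character."""
    return data.decode(codec, errors="replace")[0]


if __name__ == "__main__":
    # ISO-8859-1 keeps the ASCII range, so the document still starts with '<'.
    print("iso-8859-1:", repr(first_char(SAMPLE, "iso-8859-1")))
    # EBCDIC places '<' at a different byte value, so the start tag is lost.
    print("cp273:     ", repr(first_char(SAMPLE, "cp273")))
```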
|