Author |
Message
|
satya2481 |
Posted: Thu Dec 08, 2011 3:41 am Post subject: WMB V6.1 - Invalid Character in XML Message Issue |
|
|
Disciple
Joined: 26 Apr 2007 Posts: 170 Location: Bengaluru
|
Hi All,
I am back again with another issue.
Background: A flow running on a V5 broker in the Production environment has now been migrated and deployed to a V6.1 broker.
Issue: An XML message sent to the V5 flow works fine. The same message sent to the V6.1 flow throws "Invalid character (Unicode: 0x1A)".
After checking which character causes the problem, it turns out to be a "->" mark in one of the field values in the XML message. I think it is a bullet.
Is there any difference between the way the V5 broker and the V6.1 broker parse a message in the XML domain?
How can we fix this issue? We have to upgrade the flow, and it must work the same way it did on the V5 broker in the Production environment.
Thanks
Satya |
|
mqjeff |
Posted: Thu Dec 08, 2011 3:45 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
The parser in v5 was not as strictly correct as the parser in v6.1 and later versions.
Are you actually using the XMLNSC parser in v6.1?
What I'm saying is that the likely reason it "worked just fine" in v5 is that v5 incorrectly accepted your incorrect XML documents. |
|
kimbert |
Posted: Thu Dec 08, 2011 4:07 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
mqjeff is correct - this looks very much like a defect that has been fixed. According to the XML specification http://www.w3.org/TR/2006/REC-xml-20060816/#charsets the allowed characters are:
Code: |
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */ |
So 0x1A is not allowed. v5 was non-compliant. |
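The Char production quoted above can be turned into a quick check. What follows is a minimal Python sketch rather than broker code; the function names `is_xml10_char` and `first_invalid` are made up for illustration, and the ranges are copied directly from production [2].

```python
def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production [2]."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def first_invalid(text: str):
    """Return (index, code point) of the first disallowed character, or None."""
    for i, ch in enumerate(text):
        if not is_xml10_char(ord(ch)):
            return i, ord(ch)
    return None
```

Running `first_invalid` over a decoded copy of the failing message should point at the same 0x1A that the V6.1 parser rejects.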
|
satya2481 |
Posted: Thu Dec 08, 2011 10:04 pm Post subject: |
|
|
Disciple
Joined: 26 Apr 2007 Posts: 170 Location: Bengaluru
|
Thank you very much for the information.
So is there any solution for this kind of issue?
Should we replace such characters in the code, or is there another alternative?
Thank You
Satya |
|
fjb_saper |
Posted: Thu Dec 08, 2011 10:09 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
satya2481 wrote: |
Thank you very much for the information.
So is there any solution for this kind of issue?
Should we replace such characters in the code, or is there another alternative?
Thank You
Satya |
You need to find the producer of the message and determine the CCSID in which it is sent. This CCSID needs to support all the characters the producer is likely to send. (Hopefully you'll find the CCSID to be 1208 (UTF-8).)
Then make sure no conversion happens until the broker reads the message. (DO NOT set the conversion flag on the MQInput node.)
Also verify that no CCSID is specified on the outbound side that would force a substitution character for an unmapped character. (To make it easy, set the output CCSID to 1208.)
Have fun
_________________ MQ & Broker admin
Last edited by fjb_saper on Thu Dec 08, 2011 10:11 pm; edited 1 time in total |
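One way to test whether a CCSID "supports all the characters the producer is likely to send" is to try encoding a representative sample. The Python sketch below is only an illustration: the `CCSID_TO_CODEC` table and the function `unmappable_chars` are hypothetical helpers, and only the 1208 = UTF-8 pairing comes from this thread (819 and 1252 are included as familiar single-byte examples).

```python
# Illustrative CCSID-to-codec table; only 1208 (UTF-8) is taken from the thread.
CCSID_TO_CODEC = {
    1208: "utf-8",
    819: "iso-8859-1",
    1252: "cp1252",
}


def unmappable_chars(sample: str, ccsid: int) -> list:
    """List the distinct characters in sample that the CCSID cannot encode."""
    codec = CCSID_TO_CODEC[ccsid]
    missing = []
    for ch in sample:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            # No mapping in this code page: a converter would have to
            # substitute something for this character.
            if ch not in missing:
                missing.append(ch)
    return missing
```

An empty result for a sample of real field values suggests the CCSID is safe for that data; any character returned is a candidate for substitution somewhere along the path.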
|
smdavies99 |
Posted: Thu Dec 08, 2011 10:10 pm Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
I've seen this character appear when people have cut and pasted from a Windows screen.
As you say, the character seems to represent a bullet point.
The only way (apart from stopping this from appearing in the first place) is to scan the message before it is parsed and replace the offending character with something that does not fall foul of the XMLNSC parser (other parsers are available).
_________________ WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions. |
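The scan-and-replace idea described in this post could look roughly like the Python sketch below. It is not broker code: in a flow the equivalent step would have to run on the raw message before the XMLNSC parser sees it. The name `sanitize_xml_text` and the default replacement `?` are assumptions chosen for the example.

```python
import re

# Matches any character outside the XML 1.0 Char production.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def sanitize_xml_text(text: str, replacement: str = "?") -> str:
    """Replace every character that XML 1.0 forbids with the replacement."""
    return _INVALID_XML_CHARS.sub(replacement, text)
```

For example, `sanitize_xml_text("<f>a\x1ab</f>")` yields `<f>a?b</f>`. The replacement only makes the document parseable; it cannot restore whatever character the 0x1A originally stood for.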
|
kimbert |
Posted: Fri Dec 09, 2011 2:03 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
0x1A is often used as a 'substitution character' by character encoders (ICU being the most common one). In other words, 0x1A is the character that is output when the source string contains a Unicode character for which the output CCSID does not have a mapping.
If that guess is correct, the ideal fix would be to change the upstream (sending) application to use UTF-8 instead of whatever it is currently using. UTF-8 has a mapping for every Unicode character, so it will never get into this hole. It depends on whether the upstream application can be changed, of course. |
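The substitution behaviour described here can be reproduced in a few lines of Python. This is a sketch under stated assumptions: Python's built-in `replace` handler emits `?`, so a custom error handler (registered under the made-up name `sub_1a`) is used to mimic a converter that emits 0x1A, and the arrow character U+2192 is only a stand-in for whatever the real message contained.

```python
import codecs


def _sub_1a(err):
    # Mimic a converter that emits SUB (0x1A) for each unmappable character.
    if isinstance(err, UnicodeEncodeError):
        return ("\x1a" * (err.end - err.start), err.end)
    raise err


codecs.register_error("sub_1a", _sub_1a)

text = "Step 1 \u2192 Step 2"

# ISO-8859-1 has no mapping for U+2192, so the arrow is lost.
lossy = text.encode("iso-8859-1", errors="sub_1a")

# UTF-8 maps every Unicode character, so the text round-trips intact.
safe = text.encode("utf-8")
```

Once the arrow has been replaced in `lossy`, the receiving side only ever sees the byte 0x1A, which is why the fix belongs upstream rather than in the parser.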
|