MQSeries.net :: View topic - Invalid XML characters

Invalid XML characters


murdeep

Posted: Thu Feb 11, 2010 8:04 am Post subject: Invalid XML characters

Master

Joined: 03 Nov 2004
Posts: 211

Hello, this is probably a lame question but here goes.

Running WMB 6.1.0.5 and WMQ V7 on W2K.

We have a message flow that receives XML. Every once in a while we receive a message that gets rejected by the XMLNSC parser due to an invalid XML character, in one case a 0x7F.

I informed the sending application team, and they claim that it is valid and that our parser is not configured correctly. I then asked them if they validate the XML before putting it on the wire, and they said yes. So I said great, send me your schemas and I will use them to validate what you send me when I receive the message.

I imported the schemas, created a message set, and built a simple flow that uses the MRM parser and does a Trace ${Root} to force parsing of the entire message. As I expected, the message is rejected for an invalid XML character.

The app team is positive they validate.

Is it possible to have a 0x7F character in an XML document and be a valid XML document? Is there some option when configuring WMB to have it not reject XML that has these odd characters?

Thanks in advance.

mqjeff

Posted: Thu Feb 11, 2010 8:27 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

Try validating with XMLNSC instead of MRM-XML...

Vitor

Posted: Thu Feb 11, 2010 8:30 am Post subject: Re: Invalid XML characters

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

murdeep wrote:

Is it possible to have a 0x7F character in an XML document and be a valid XML document? Is there some option when configuring WMB to have it not reject XML that has these odd characters?

Isn't that a DEL character? Hardly the sort of thing you'd have in XML, I think!

The XML spec is rather nicely ambiguous on this:

Quote:

Document authors are encouraged to avoid "compatibility characters"

but that value is in the range of allowable XML characters.

2 questions occur to me:

1) Why do they have a DEL in the XML?
2) What are they using to validate the XML? Is it a commercial tool or homebrew? Does it open in IE?

How WMB can be convinced to handle this I'll leave to better minds than mine.

_________________
Honesty is the best policy.
Insanity is the best defence.

murdeep

Posted: Thu Feb 11, 2010 9:38 am Post subject:

Master

Joined: 03 Nov 2004
Posts: 211

mqjeff wrote:

Try validating with XMLNSC instead of MRM-XML...

Ok, tried this and it fails with a slightly different error message but result is same - message rejected for invalid xml character.

Vitor wrote:

2 questions occur to me:

1) Why do they have a DEL in the XML?
2) What are they using to validate the XML? Is it a commercial tool or homebrew? Does it open in IE?

1) Good question. I haven't got a good explanation of why they are sending us this char.
2) They are an ASP and are using the VB XMLReader class, I believe, to validate. We are looking into whether they are perhaps only validating structure and not content.

Vitor

Posted: Thu Feb 11, 2010 10:34 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

murdeep wrote:

2) They are an ASP and are using the VB XMLReader class, I believe, to validate. We are looking into whether they are perhaps only validating structure and not content.

Ok, not my strong suit but can you validate the structure of an XML document without parsing the content? Even in VB?

Weird, but I've seen weirder.

Have you tried opening one of these documents with IE or XMLSpy? IIRC WMB uses (or used to use) a variation of the Xerces parser which is fairly standard. Again, someone who knows more about this will be along in a minute.

murdeep

Posted: Thu Feb 11, 2010 10:41 am Post subject:

Master

Joined: 03 Nov 2004
Posts: 211

Vitor wrote:

Have you tried opening one of these documents with IE or XMLSpy? IIRC WMB uses (or used to use) a variation of the Xerces parser which is fairly standard. Again, someone who knows more about this will be along in a minute.

Saved the message to a .xml file and then opened with IE and the 0x7F appears as a little box. ---> hadnt <---

Not sure if IE is as strict as the WMB parsers.

Vitor

Posted: Thu Feb 11, 2010 12:00 pm Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

murdeep wrote:

Not sure if IE is as strict as the WMB parsers.

It's still odd that that value would appear in that position - you would expect a ' given the context.

Be that as it may, the answer will only come from someone with a detailed knowledge of the WMB parser. Who isn't me.

murdeep

Posted: Thu Feb 11, 2010 2:40 pm Post subject:

Master

Joined: 03 Nov 2004
Posts: 211

Ok, we may be getting to the bottom of this. Some more info.

The sending app is a VB client on windows connecting to a MQServer on windows.

The MQServer qmgr CCSID is 437.

The client VB app sets its Message.CCSID = 437 and then builds its payload. In the payload they set a character to ASCII 180, which is an apostrophe. I'm thinking that ASCII 180 is not a valid character in CCSID 437. I'm thinking that the client should set the Message.CCSID to 1208 for UTF-8, so that it supports the ASCII 180 character, before sending the message.

Can anyone comment on this? Thanks

kimbert

Posted: Thu Feb 11, 2010 3:15 pm Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

If you have not yet referred to this document, then you should do so: http://www.w3.org/TR/2006/REC-xml-20060816/#charsets
I would advise you to quote it in all conversations with other departments about the well-formedness of XML

0x7F is a legal Unicode code point in an XML document.
In your XML documents, the byte value '0x7F' might or might not represent a legal XML code point - that will depend on the encoding ( the CCSID ) of your XML document.
The issue below seems different from the 0x7F question, btw.
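
For anyone who wants to check the first point for themselves, here is a minimal Python sketch of the Char production from the XML 1.0 spec linked above. The function names are made up for illustration, and this is only a rough check of code points after decoding, not what the WMB parsers actually do internally.

```python
def is_xml10_char(code_point: int) -> bool:
    """Return True if code_point matches the XML 1.0 'Char' production."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def first_illegal_char(text: str):
    """Return (index, character) of the first non-XML character, or None."""
    for index, ch in enumerate(text):
        if not is_xml10_char(ord(ch)):
            return index, ch
    return None


# U+007F (DEL) falls inside the #x20-#xD7FF range, so it is allowed.
print(is_xml10_char(0x7F))
# U+001A (SUB) is a C0 control other than tab, LF, or CR, so it is not.
print(is_xml10_char(0x1A))
```

Note that the check runs on code points, so the same byte can pass or fail depending on which code point the CCSID maps it to.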

Quote:

The client VB app sets its Message.CCSID = 437 and then builds its payload. In the payload they set a character to ASCII 180, which is an apostrophe. I'm thinking that ASCII 180 is not a valid character in CCSID 437. I'm thinking that the client should set the Message.CCSID to 1208 for UTF-8, so that it supports the ASCII 180 character, before sending the message.

Apostrophe is not represented by a byte value of 180. It has a byte value of 039 ( 0x27 ) in ASCII. You're right about the mapping to code page 437, though. There's no apostrophe in 437.

And yes, I think UTF-8 would be a great choice of encoding.

murdeep

Posted: Thu Feb 11, 2010 3:42 pm Post subject:

Master

Joined: 03 Nov 2004
Posts: 211

kimbert wrote:

Isn't 0x27 technically a single quotation mark? and ASCII 180 an apostrophe?

No apostrophe in 437.

I think I get this now. The developer claims his XML is valid because he validates it, and that validation succeeds. They then serialize it to a string, which is written to the message buffer with a WriteString method where the CCSID is set to 437. The ASCII 180 somehow becomes a 0x7F and fails parsing at WMB.

I'll get the developer to set his CCSID to 1208 and then see what happens when the message arrives at WMB.

One last point: this message ultimately ends up at a z/OS node. Will there be any issues converting from 1208 to the z/OS CCSID, whatever that may be?

kimbert

Posted: Thu Feb 11, 2010 4:43 pm Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Quote:

One last point: this message ultimately ends up at a z/OS node. Will there be any issues converting from 1208 to the z/OS CCSID, whatever that may be?

z/OS can handle UTF-8 data ( 1208 ). If 'whatever that may be' is not a Unicode code page, then yes, there may well be problems. But that's always the problem when you use non-Unicode code pages.

rekarm01

Posted: Thu Feb 11, 2010 6:48 pm Post subject: Re: Invalid XML characters

Grand Master

Joined: 25 Jun 2008
Posts: 1415

murdeep wrote:

we receive a message that gets rejected by the XMLNSC parser due to invalid XML character, in one case a 0x7F.

What was the actual error message?

0x7F is a byte, not a character. Whether it maps to a valid XML character or not depends on the ccsid that maps it.

murdeep wrote:

Isn't 0x27 technically a single quotation mark? and ASCII 180 an apostrophe?

ASCII is a 7-bit character set; there is no ASCII 180. 0x27 is a byte, not a character; without a ccsid, it means nothing:

ccsid=437 (MS-DOS):
ccsid=819 (ISO 8859-1):
ccsid=1208 (UTF-8):

For ccsid=437, 0x7F is ambiguous; it can either be interpreted as a control character or a graphic character, depending on the context. This, among other reasons, makes it a poor choice for representing non-ASCII character data.
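
A quick way to see the "a byte means nothing without a ccsid" point is to decode the same bytes under several code pages. The sketch below uses Python's built-in codecs as stand-ins for the ccsids (cp437 for 437, latin-1 for 819, utf-8 for 1208); that pairing is an assumption, and Python's tables are not guaranteed to match the conversion tables MQ or the broker use. In particular, Python's cp437 decodes 0x7F as the control character, which is only one of the two interpretations mentioned above.

```python
# Assumption: these Python codecs approximate the ccsids discussed in the thread.
CODECS = {437: "cp437", 819: "latin-1", 1208: "utf-8"}


def decode_byte(byte_value: int, ccsid: int) -> str:
    """Describe how a single byte decodes under the codec for the given ccsid."""
    try:
        ch = bytes([byte_value]).decode(CODECS[ccsid])
    except UnicodeDecodeError:
        # In UTF-8, a lone byte >= 0x80 is never a whole character.
        return "not a complete character on its own"
    return f"U+{ord(ch):04X}"


for byte_value in (0x27, 0x7F, 0xB4):
    for ccsid in CODECS:
        print(f"0x{byte_value:02X} ccsid={ccsid}: {decode_byte(byte_value, ccsid)}")
```

Byte 180 (0xB4) is the interesting one: it is the acute accent only under 819, a box-drawing character under 437, and not a character at all on its own under 1208.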

Most likely, the sender used a non-ASCII acute accent instead of an ASCII apostrophe in the XML element "...hadn´t..."; the sender then converted the message to ccsid=437, which does not have an acute accent, so the conversion replaced the acute accent with a substitution character. The substitution character is not a valid XML character.
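
The substitution can be reproduced roughly in Python. This is only a sketch: the "sub7f" error handler is a hypothetical name registered here to imitate a converter that writes byte 0x7F for anything it cannot map, which is what the thread suggests happened. Python's own built-in "replace" handler would write "?" instead, and the sender's real conversion code is not shown anywhere in this thread.

```python
import codecs


def substitute_with_0x7f(error):
    """Error handler: replace each unencodable character with U+007F."""
    if not isinstance(error, UnicodeEncodeError):
        raise error
    # U+007F encodes to the single byte 0x7F in Python's cp437 table.
    return ("\x7f" * (error.end - error.start), error.end)


# Hypothetical handler name, used only for this illustration.
codecs.register_error("sub7f", substitute_with_0x7f)

original = "hadn\u00b4t"  # acute accent (U+00B4), not an ASCII apostrophe

try:
    original.encode("cp437")
except UnicodeEncodeError as exc:
    # A strict conversion refuses: cp437 has no acute accent.
    print("strict conversion fails:", exc.reason)

converted = original.encode("cp437", errors="sub7f")
print(converted)
```

Once the accent has been replaced there is no way to recover it downstream, which is why the fix has to happen at the sender.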

The sender corrupted the message before it sent it; even if the broker were to accept it, the receiver would still get a bad message.

murdeep wrote:

I'll get the developer to set his CCSID to 1208 and then see what happens when the message arrives at WMB.

Unicode is a good choice, but it won't fix Garbage In, Garbage Out.

murdeep wrote:

One last point: this message ultimately ends up at a z/OS node. Will there be any issues converting from 1208 to the z/OS CCSID, whatever that may be?

Unicode is best used end-to-end. If the source message contains characters not supported by the target ccsid, there will be issues.

murdeep

Posted: Fri Feb 12, 2010 8:03 am Post subject:

Master

Joined: 03 Nov 2004
Posts: 211

Ok, had the developer change the CCSID specified prior to the WriteString method call to CCSID 852, which supports the ASCII 180, and WMB validates the XML no problem. But as I suspected, when the message arrives at the z/OS qmgr, which is CCSID 37, we run into issues.

So we will try CCSID 1208. My question is: when the z/OS app does a get with convert, do they have to do anything prior to doing the MQGet? I ask because, by default, won't the conversion be 1208->37? Won't this fail as well?

rekarm01

Posted: Wed Feb 17, 2010 2:17 am Post subject: Re: Invalid XML characters

Grand Master

Joined: 25 Jun 2008
Posts: 1415

Using an acute accent character in "...hadn´t..." is a typographical error. It might be worthwhile to have the sender fix that.

murdeep wrote:

Ok, had the developer change the CCSID specified prior to the WriteString method call to CCSID 852 which supports the ASCII 180 ...

ASCII is still a 7-bit character set; there is still no ASCII 180. (For ccsid=852, the acute accent actually maps to #239, not #180.)

MS-DOS ccsids are useful for backwards compatibility with DOS applications, but they don't always convert well to/from non-DOS ccsids. Try to avoid them, unless the source or target application is actually running under DOS:

437 MS-DOS Latin US
737 MS-DOS Greek
775 MS-DOS Baltic Rim
850 MS-DOS Latin 1
851 MS-DOS Greek 1
852 MS-DOS Latin 2
855 MS-DOS Cyrillic
857 MS-DOS Turkish
860 MS-DOS Portuguese
861 MS-DOS Icelandic
862 MS-DOS Hebrew
863 MS-DOS French Canada
865 MS-DOS Nordic
866 MS-DOS Cyrillic CIS 1
869 MS-DOS Greek 2

murdeep wrote:

But as I suspected when the message arrives at the z/OS qmgr which is CCSID 37 we run into issues.

So we will try CCSID 1208.

Converting to Unicode is much less useful, when it's undone by a subsequent conversion from Unicode. If the target ccsid is 37, why not have the developer set ccsid=37 directly? That way, the broker and target application don't need to convert at all. Any conversion issues can be detected at the source, and dealt with there.
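
As a rough illustration of detecting conversion issues at the source, the sender could test whether the payload is representable in the target code page before putting the message. The Python sketch below is an assumption-laden stand-in: the function name is made up, and Python's cp437, cp852, and cp037 codecs are used in place of the MQ conversion tables for ccsids 437, 852, and 37, which may not agree in every position.

```python
def representable(text: str, codec: str) -> bool:
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# Assumption: these Python codecs approximate the ccsids from the thread.
CODECS = {437: "cp437", 852: "cp852", 37: "cp037", 1208: "utf-8"}

payload = "hadn\u00b4t"  # contains the acute accent (U+00B4)

for ccsid, codec in CODECS.items():
    print(f"ccsid={ccsid}: representable={representable(payload, codec)}")
```

A check like this only says whether a character exists in the target code page. Whether the queue manager's own 1208->37 or 852->37 conversion handles it the same way still has to be verified on the real system.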

murdeep wrote:

My question is when the z/OS app does a get with convert do they have to do anything prior to doing the MQGet.

What could any application possibly do to a message prior to the MQGet? Some issues can only be fixed by the sender.
