Is the ccsid/encoding combination 1208/273 invalid ? |
Herbert |
Posted: Thu Sep 01, 2011 5:41 am Post subject: Is the ccsid/encoding combination 1208/273 invalid ? |
|
|
 Centurion
Joined: 05 Dec 2006 Posts: 146 Location: Leersum, The Netherlands
|
Hi,
A WMB flow of ours receives XML messages with a CCSID/encoding of 1208/273; the flow then puts an XML message on the queue with a CCSID/encoding of 1208/546.
The problem is that this WMB flow changes the value of some xml elements, UTF encoding ( &#nnn; ) is replaced by the 1 byte actual value ( Ñ ).
The given CCSID/encoding values are a strange combination of Unicode 1208/546 and AIX 819/273. If the flow gets the same message in the 'normal' Unicode combination 1208/546 then it works fine. And that is also my request to the delivering party.
However I don't understand this. Is encoding relevant for XML messages with MQSTR as format ?
I know it's very relevant when working with decimal values in the MRM domain with COBOL copybooks as input/output, things like big endian vs little endian and so on. In a UTF-8 XML message in MQSTR format with CCSID 1208 there are no decimal fields, so the encoding attribute is not relevant. What am I missing?
Kind Regards, Herbert |
fjb_saper |
Posted: Thu Sep 01, 2011 7:27 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
You're probably missing that you go from a multibyte CCSID 1208 to a single-byte CCSID 819, where not all characters can be translated. In that case most conversion programs will attempt to replace the unknown char with a substitution char (hex 1A?) and you will end up with malformed XML... _________________ MQ & Broker admin |
mqjeff |
Posted: Thu Sep 01, 2011 10:23 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
fjb_saper wrote: |
you're probably missing that you go from a multibyte CCSID 1208 to a single byte CCSID 819 |
 |
rekarm01 |
Posted: Thu Sep 01, 2011 11:05 am Post subject: Re: Is the ccsid/encoding combination 1208/273 invalid ? |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
Herbert wrote: |
However I don't understand this. Is encoding relevant for XML messages with MQSTR as format ?
I know it's very relevant when working with decimal values in the MRM domain with COBOL copybooks as input/output, things like big endian vs little endian and so on. In a UTF-8 XML message in MQSTR format with CCSID 1208 there are no decimal fields, so the encoding attribute is not relevant. |
That's correct. The MQ Header Encoding field (not to be confused with the XML encoding declaration) is not relevant in this case.
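To see why the Encoding field says nothing about character data, here is a minimal sketch in Python (used purely for illustration) that splits the two values from this thread into their parts. The masks and values mirror the MQENC_* constants from the MQ headers; `describe_encoding` is a hypothetical helper and not part of any MQ API.

```python
# Masks and values mirroring the MQENC_* constants of the MQMD Encoding field.
MQENC_INTEGER_MASK = 0x00F
MQENC_DECIMAL_MASK = 0x0F0
MQENC_FLOAT_MASK = 0xF00

# Only the "normal" and "reversed" values are listed; anything else maps to "other".
INTEGER_KINDS = {0x001: "normal", 0x002: "reversed"}
DECIMAL_KINDS = {0x010: "normal", 0x020: "reversed"}
FLOAT_KINDS = {0x100: "IEEE normal", 0x200: "IEEE reversed"}


def describe_encoding(encoding: int) -> dict:
    """Hypothetical helper: split an MQMD Encoding value into its three parts."""
    return {
        "integer": INTEGER_KINDS.get(encoding & MQENC_INTEGER_MASK, "other"),
        "decimal": DECIMAL_KINDS.get(encoding & MQENC_DECIMAL_MASK, "other"),
        "float": FLOAT_KINDS.get(encoding & MQENC_FLOAT_MASK, "other"),
    }


print(describe_encoding(273))  # 273 == 0x111
print(describe_encoding(546))  # 546 == 0x222
```

All three parts describe how binary integers, packed decimals and floating point numbers are laid out. None of them describes text, so for a pure character message the difference between 273 and 546 carries no information about the payload.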
Herbert wrote: |
The problem is that this WMB flow changes the value of some xml elements, UTF encoding ( &#nnn; ) is replaced by the 1 byte actual value ( Ñ ). |
The XML standard requires XML processors to replace character references with literal characters, when parsing. It does not require XML processors to put them back when writing.
Herbert wrote: |
If the flow gets the same message in the 'normal' Unicode combination 1208/546 then it works fine. |
If the message flow does not modify the message, then it can pass the original, unparsed input message directly to the output, without invoking the parser to re-write it. |
Herbert |
Posted: Fri Sep 02, 2011 1:58 am Post subject: |
|
|
 Centurion
Joined: 05 Dec 2006 Posts: 146 Location: Leersum, The Netherlands
|
fjb_saper wrote: |
you're probably missing that you go from a multibyte CCSID 1208 to a single byte CCSID 819 where not all characters can be translated. In that case most conversion programs will attempt to replace the unknown char with a substitution char (hex 1A?) and you will end up with a malformed XML... |
Yes and no.
Yes, multibyte to 1-byte translation, that was indeed what was happening.
No, this was not our intention, and the output is still CCSID 1208; the message went through WMB from 1208/273 to 1208/546.
My simple answer was: the sending MQMD, 1208/273, is wrong. Either use Unicode 1208/546 or use the platform native values (AIX 819/273 in this case), not this strange combination. And because the content of the message is UTF-8, I advised using 1208/546.
And they acted on this statement: the JMS setup in the WAS server that originates those messages has been changed, the messages are now put on the queue as 1208/546, and everything works now.
However, I think that my statement is wrong. It should not matter, and rekarm01 confirms this:
rekarm01 wrote: |
That's correct. The MQ Header Encoding field (not to be confused with the XML encoding declaration) is not relevant in this case. |
So, still some RTFM to do ... |
Herbert |
Posted: Fri Sep 02, 2011 5:59 am Post subject: |
|
|
 Centurion
Joined: 05 Dec 2006 Posts: 146 Location: Leersum, The Netherlands
|
OK, I think I understand it.
The char Ñ
When given as HTML/XML escaping it is 6 chars: &#209;
When given as UTF-8 encoding it is 2 bytes: C3 91
Technically nothing goes wrong; WMB works correctly. This is what is happening.
When we go from 1208/273 to 1208/546 (both with a changed and an unchanged message in WMB)
A1) Input, XML/HTML escaping, 6 chars, &#209;
A2) WMB processing
A3) Output, UTF-8 encoding, 2 bytes, C3 91
When the message is not modified and both input/output are 1208/546
B1) Input, XML/HTML escaping, 6 chars, &#209;
B2) WMB processing
B3) Output, XML/HTML escaping, 6 chars, &#209;
When the message is modified and both input/output are 1208/546
C1) Input, XML/HTML escaping, 6 chars, &#209;
C2) WMB processing
C3) Output, UTF-8 encoding, 2 bytes, C3 91
The fact that we sometimes saw one char, Ñ, when looking at the output of A3) & C3) means that the viewing tool understands UTF-8 encoding.
So when WMB must parse the incoming XML, because the input/output CCSID/Encoding is not the same or because the message is changed, then the HTML/XML escaping is converted to UTF-8 encoding. |
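The parse-and-rewrite behaviour in cases A and C is not specific to WMB. A minimal sketch with Python's standard `xml.etree.ElementTree` parser (an assumption chosen only for illustration; it says nothing about how WMB is implemented) shows the same effect:

```python
import xml.etree.ElementTree as ET

char_ref = "&#209;"                        # numeric character reference, 6 chars
utf8_bytes = "\u00d1".encode("utf-8")      # U+00D1 (N with tilde) as UTF-8

print(len(char_ref))                       # number of chars in the reference
print(utf8_bytes.hex(" "))                 # the UTF-8 bytes in hex

# Input as in A1/C1: the character arrives as a numeric character reference.
raw_in = b'<?xml version="1.0" encoding="UTF-8"?><name>&#209;</name>'

# Parsing replaces the reference with the literal character.
root = ET.fromstring(raw_in)
parsed_text = root.text

# Writing the tree back as UTF-8 emits the literal character as bytes C3 91;
# the serializer has no record that the input used a reference.
raw_out = ET.tostring(root, encoding="utf-8")
print(raw_out)
```

Case B corresponds to never calling the parser at all and copying `raw_in` to the output unchanged, which is why the reference survives there.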
fjb_saper |
Posted: Fri Sep 02, 2011 7:22 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Excellent analysis, Herbert. Thanks for sharing it! |
Herbert |
Posted: Mon Sep 05, 2011 6:19 am Post subject: |
|
|
 Centurion
Joined: 05 Dec 2006 Posts: 146 Location: Leersum, The Netherlands
|
That I understand it now is one thing. Telling the external party and our test department that converting the HTML encoding &#209; to 2 bytes (C3 91) UTF-8 encoding is correct is something else ...
I tried to generate a single-byte encoded XML file, with CCSID 819 and ISO-8859-1 as the encoding in the XML declaration, in the hope that WMB would use XML/HTML escaping for chars that are not in this target CCSID.
This is not happening; it does its best to do a correct translation to 1 byte. For example, if it looks like a U (ŨũŪūŬŭŮůŰű) then it is translated to a U; if no good match is found then it is translated to x'BF' (an upside-down question mark).
Why does WMB not use HTML/XML encoding for values that are not in the target CCSID ?
Things like ">" are correctly escaped to "&gt;" in the output, then why not escape chars that are not in the target CCSID to &#nnn; ? |
rekarm01 |
Posted: Tue Sep 06, 2011 11:54 pm Post subject: Re: Is the ccsid/encoding combination 1208/273 invalid ? |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
Herbert wrote: |
That I understand it now is one thing. Telling the external party and our test department that converting the HTML encoding &#209; to 2 bytes (C3 91) UTF-8 encoding is correct is something else ... |
Perhaps referring them to the XML standard would help?
Herbert wrote: |
Why does WMB not use HTML/XML encoding for values that are not in the target CCSID ?
Things like ">" are correctly escaped to "&gt;" in the output, then why not escape chars that are not in the target CCSID to &#nnn; ? |
The XML standard requires the use of predefined entities ('&lt;', '&amp;', etc.) to escape certain literal characters; it does not require the use of numeric character references.
The routines that perform character conversion are not specific to the XML parsers. They do not necessarily know whether the characters they are converting come from XML messages or not. |
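The difference between a generic character conversion and an XML-aware one can be sketched in Python (again only an illustration, not a description of WMB internals). Python's built-in `replace` handler substitutes `?` where WMB was reported to use a best-fit letter or x'BF', but the principle is the same: only a converter that knows it is writing XML can fall back to a numeric character reference.

```python
# 'A', U+00D1 (exists in ISO-8859-1) and U+0168 (U with tilde, not in ISO-8859-1).
text = "A\u00d1\u0168"

# Generic conversion: a character missing from the target code page is
# replaced by a substitution character and the original value is lost.
generic = text.encode("iso-8859-1", errors="replace")

# XML-aware conversion: the missing character is written as a numeric
# character reference, so an XML parser can still recover it.
xml_aware = text.encode("iso-8859-1", errors="xmlcharrefreplace")

print(generic)
print(xml_aware)
```

Where the output has to stay in a single-byte CCSID and no character may be lost, the references would have to be produced by something that knows the payload is XML, not by the generic code page conversion.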
smdavies99 |
Posted: Wed Sep 07, 2011 2:16 am Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
I find the w3schools site very useful and easy to understand.
This page, http://www.w3schools.com/tags/ref_entities.asp, has a list of the HTML ISO-8859-1 reserved characters. These apply to XML as well.
The & , < and > ( &amp;, &lt;, &gt; ) representations will quickly become familiar to you if you are dealing with XML, HTML and any of the other *ML markup languages for any length of time. _________________ WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions. |