MQSeries.net :: View topic - Is the ccsid/encoding combination 1208/273 invalid ?

Is the ccsid/encoding combination 1208/273 invalid ?

Herbert

Posted: Thu Sep 01, 2011 5:41 am Post subject: Is the ccsid/encoding combination 1208/273 invalid ?

Centurion

Joined: 05 Dec 2006
Posts: 146
Location: Leersum, The Netherlands

Hi,

A WMB flow of ours is getting XML messages with a ccsid/encoding of 1208/273. This flow puts an XML message on the queue with a ccsid/encoding of 1208/546.

The problem is that this WMB flow changes the value of some XML elements: UTF encoding ( &#nnn; ) is replaced by the 1 byte actual value ( Ñ ).

The given ccsid/encoding values are a strange combination of Unicode 1208/546 and AIX 819/273. If the flow gets the same message in the 'normal' Unicode combination 1208/546 then it works fine, and that is also what I have requested from the delivering party.

However I don't understand this. Is encoding relevant for XML messages with MQSTR as format ?

I know it's very relevant when working with decimal values in the MRM domain and you have COBOL copybooks as input/output, things like big endian vs little endian and so on. In a UTF-8 XML message in MQSTR format with CCSID 1208 there are no decimal fields, so the encoding attribute is not relevant. What am I missing?

Kind Regards, Herbert

fjb_saper

Posted: Thu Sep 01, 2011 7:27 am Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20763
Location: LI,NY

You're probably missing that you go from a multibyte CCSID 1208 to a single byte CCSID 819 where not all characters can be translated. In that case most conversion programs will attempt to replace the unknown char with a substitution char (hex 1A?) and you will end up with a malformed XML...

_________________
MQ & Broker admin

mqjeff

Posted: Thu Sep 01, 2011 10:23 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

fjb_saper wrote:

You're probably missing that you go from a multibyte CCSID 1208 to a single byte CCSID 819

rekarm01

Posted: Thu Sep 01, 2011 11:05 am Post subject: Re: Is the ccsid/encoding combination 1208/273 invalid ?

Grand Master

Joined: 25 Jun 2008
Posts: 1415

Herbert wrote:

However I don't understand this. Is encoding relevant for XML messages with MQSTR as format ?

I know it's very relevant when working with decimal values in the MRM domain and you have COBOL copybooks as input/output, things like big endian vs little endian and so on. In a UTF-8 XML message in MQSTR format with CCSID 1208 there are no decimal fields, so the encoding attribute is not relevant.

That's correct. The MQ Header Encoding field (not to be confused with the XML encoding declaration) is not relevant in this case.
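To make that concrete, here is a minimal Python sketch that splits the two Encoding values from this thread into their MQENC_* sub-fields and shows that UTF-8 data has no byte order for those sub-fields to describe. The numeric constants are the MQENC_* values from the MQ header files; the helper name describe_encoding is made up for this example and is not part of any MQ API.

```python
# MQENC_* masks and values as defined in the MQ header files (cmqc.h).
MQENC_INTEGER_MASK = 0x00F
MQENC_DECIMAL_MASK = 0x0F0
MQENC_FLOAT_MASK = 0xF00

INTEGER = {0x001: "normal (big-endian)", 0x002: "reversed (little-endian)"}
DECIMAL = {0x010: "normal (big-endian)", 0x020: "reversed (little-endian)"}
FLOAT = {0x100: "IEEE normal", 0x200: "IEEE reversed", 0x300: "S/390"}


def describe_encoding(encoding: int) -> dict:
    """Hypothetical helper: split an MQMD Encoding value into its sub-fields."""
    return {
        "integer": INTEGER.get(encoding & MQENC_INTEGER_MASK, "undefined"),
        "decimal": DECIMAL.get(encoding & MQENC_DECIMAL_MASK, "undefined"),
        "float": FLOAT.get(encoding & MQENC_FLOAT_MASK, "undefined"),
    }


print(273, describe_encoding(273))  # 273 == 0x111
print(546, describe_encoding(546))  # 546 == 0x222

# Encoding describes binary integers, packed decimals and floats only.
# UTF-8 text is a plain byte sequence with a single defined byte order,
# unlike UTF-16, which exists in a big-endian and a little-endian form.
text = "<name>Ñ</name>"
utf8_bytes = text.encode("utf-8")
utf16_differs = text.encode("utf-16-be") != text.encode("utf-16-le")
print(utf8_bytes.hex(), utf16_differs)
```

All three sub-fields describe numeric data, so none of them applies to character data in CCSID 1208.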

Herbert wrote:

The problem is that this WMB flow changes the value of some XML elements: UTF encoding ( &#nnn; ) is replaced by the 1 byte actual value ( Ñ ).

The XML standard requires XML processors to replace character references with literal characters, when parsing. It does not require XML processors to put them back when writing.
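The same behaviour can be reproduced with any conforming XML parser. The sketch below uses Python's xml.etree.ElementTree as a stand-in for the broker's parser (that substitution is an assumption made for illustration; the element name and sample text are made up too).

```python
import xml.etree.ElementTree as ET

# Input as the sender wrote it: Ñ given as a numeric character reference.
raw = b"<name>Espa&#209;a</name>"

# Parsing replaces the reference with the literal character.
root = ET.fromstring(raw)
parsed_text = root.text

# Writing the tree back out as UTF-8 emits the literal character as
# bytes C3 91; the serializer does not restore the original reference.
out = ET.tostring(root, encoding="utf-8")

print(repr(parsed_text))
print(out)
```

Both documents are equivalent as far as XML is concerned; only the byte representation differs.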

Herbert wrote:

If the flow gets the same message in the 'normal' Unicode combination 1208/546 then it works fine.

If the message flow does not modify the message, then it can pass the original, unparsed input message directly to the output, without invoking the parser to re-write it.

Herbert

Posted: Fri Sep 02, 2011 1:58 am Post subject:

Centurion

Joined: 05 Dec 2006
Posts: 146
Location: Leersum, The Netherlands

fjb_saper wrote:

yes and no

Yes, multibyte to 1 byte translation, that was indeed what was happening.

No, This was not our intention, and the output is still CCSID 1208, the message did go in WMB from 1208/273 to 1208/546.

My simple answer was: the sending MQMD, 1208/273, is wrong. Either use Unicode 1208/546 or use the platform native values (AIX 819/273 in this case), not this strange combination. And because the content of the message is UTF-8, I advised them to use 1208/546.

They acted on this statement from me: the JMS setup in the WAS server that originated those messages has been changed, the messages are now put on the queue as 1208/546, and all works now.

However I think that my statement is wrong

It should not matter, and rekarm01 confirms this:

rekarm01 wrote:

That's correct. The MQ Header Encoding field (not to be confused with the XML encoding declaration) is not relevant in this case.

So, still some RTFM to do ...

Herbert

Posted: Fri Sep 02, 2011 5:59 am Post subject:

Centurion

Joined: 05 Dec 2006
Posts: 146
Location: Leersum, The Netherlands

ok, I think I understand it

The char Ñ
When given as HTML/XML escaping it is 6 chars: &#209;
When given as UTF-8 encoding it is 2 bytes: C3 91

Technically nothing goes wrong, WMB does work

This is what is happening.

When we go from 1208/273 to 1208/546 (both with a changed and unchanged message in WMB)
A1) Input, XML/HTML escaping, 6 chars, &#209;
A2) WMB processing
A3) Output, UTF-8 Encoding, 2 bytes, C3 91

When the message is not modified and both input/output are 1208/546
B1) Input, XML/HTML escaping, 6 chars, &#209;
B2) WMB processing
B3) Output, XML/HTML escaping, 6 chars, &#209;

When the message is modified and both input/output are 1208/546
C1) Input, XML/HTML escaping, 6 chars, &#209;
C2) WMB processing
C3) Output, UTF-8 Encoding, 2 bytes, C3 91

The fact that sometimes we did see one char, Ñ, when looking at the output of A3) and C3) means that the viewing tool understands UTF-8 encoding.

So when WMB must parse the incoming XML, because the input/output CCSID/Encoding is not the same or because the message is changed, then the HTML/XML escaping is converted to UTF-8 encoding.
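The byte counts above, and the effect of the viewing tool, can be checked with a few lines of Python. This is only an illustration of the two representations; the choice of Windows-1252 for the "wrong" viewer is an assumption.

```python
import html

reference = "&#209;"               # the escaped form, 6 characters
char = html.unescape(reference)    # the character the reference stands for
utf8 = char.encode("utf-8")        # the literal form in CCSID 1208

print(len(reference), repr(char), utf8.hex(), len(utf8))

# The same two bytes, as shown by two different viewers:
as_utf8 = utf8.decode("utf-8")      # viewer that understands UTF-8
as_cp1252 = utf8.decode("cp1252")   # viewer that assumes Windows-1252
print(repr(as_utf8), repr(as_cp1252))
```

A viewer that understands UTF-8 shows one character; a viewer that assumes a single byte code page shows two.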

fjb_saper

Posted: Fri Sep 02, 2011 7:22 am Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20763
Location: LI,NY

Excellent analysis Herbert. Thanks for sharing it!

_________________
MQ & Broker admin

Herbert

Posted: Mon Sep 05, 2011 6:19 am Post subject:

Centurion

Joined: 05 Dec 2006
Posts: 146
Location: Leersum, The Netherlands

That I understand it now is one thing. Telling the external party and our test department that converting the HTML encoding &#209; to the 2 byte (C3 91) UTF-8 encoding is correct is something else ...

I tried to generate a single byte encoded XML file, with CCSID 819 and ISO-8859-1 as encoding in the XML declaration, in the hope that WMB would use XML/HTML escaping for chars that are not in this target CCSID.

This is not happening; it does its best to do a correct translation to 1 byte. For example, if a char looks like a U (ŨũŪūŬŭŮůŰű) then it is translated to a U; if no good match is found then it is translated to x'BF' (an upside down question mark).
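A rough Python imitation of that observed behaviour, for anyone who wants to reproduce it outside the broker. This is a sketch only: it is not the conversion table WMB actually uses, the function name best_fit_latin1 is made up, and stripping accents via Unicode decomposition is an assumption about what "looks like a U" means.

```python
import unicodedata


def best_fit_latin1(text: str, substitute: bytes = b"\xbf") -> bytes:
    """Hypothetical best-fit conversion to ISO-8859-1 (CCSID 819)."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("latin-1")   # char exists in the target CCSID
            continue
        except UnicodeEncodeError:
            pass
        # Fall back to the base letter of the decomposed form, e.g. Ũ -> U.
        base = unicodedata.normalize("NFKD", ch)[0]
        try:
            out += base.encode("latin-1")
        except UnicodeEncodeError:
            out += substitute             # no good match found
    return bytes(out)


print(best_fit_latin1("ŨũŪūŬŭŮůŰű"))
print(best_fit_latin1("Ñ"), best_fit_latin1("€"))
```

Either way the original character is lost in the output, which is why the result differs from escaping.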

Why does WMB not use HTML/XML encoding for values that are not in the target CCSID ?

Things like ">" are correctly escaped to "&gt;" in the output, so why not escape chars that are not in the target CCSID to &#nnn; ?

rekarm01

Posted: Tue Sep 06, 2011 11:54 pm Post subject: Re: Is the ccsid/encoding combination 1208/273 invalid ?

Grand Master

Joined: 25 Jun 2008
Posts: 1415

Herbert wrote:

That I understand it now is one thing. Telling the external party and our test department that converting the HTML encoding &#209; to the 2 byte (C3 91) UTF-8 encoding is correct is something else ...

Perhaps referring them to the XML standard would help?

Herbert wrote:

Why does WMB not use HTML/XML encoding for values that are not in the target CCSID ?

Things like ">" are correctly escaped to "&gt;" in the output, so why not escape chars that are not in the target CCSID to &#nnn; ?

The XML standard requires the use of predefined entities ('&lt;', '&amp;', etc.) to escape certain literal characters; it does not require the use of numeric character references.

The routines that perform character conversion are not specific to the XML parsers. They do not necessarily know whether the characters they are converting come from XML messages or not.
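For comparison, this is what the two separate steps look like in Python, where the character conversion routine can optionally be told that its output is XML. It is a sketch of the idea only; nothing here is a WMB feature, and the sample text is made up.

```python
from xml.sax.saxutils import escape

text = "a > b & Ũ"

# Step 1, XML level: markup characters become predefined entities.
escaped = escape(text)

# Step 2, character conversion to ISO-8859-1 (CCSID 819). Python's
# "xmlcharrefreplace" error handler writes characters that are missing
# from the target code page as numeric character references.
latin1 = escaped.encode("iso-8859-1", errors="xmlcharrefreplace")

print(escaped)
print(latin1)
```

A generic converter that does not know it is handling XML has no such handler to fall back on, so it can only substitute.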

smdavies99

Posted: Wed Sep 07, 2011 2:16 am Post subject:

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

I find the w3schools site very useful and easy to understand.

This page, http://www.w3schools.com/tags/ref_entities.asp, has a
list of the HTML ISO-8859-1 Reserved Characters. These apply to XML as well.

The & , < and > ( &amp;, &lt;, &gt; )
representations will quickly become
familiar to you if you are dealing with XML, HTML and any of the other *ML markup languages for any length of time.
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995

Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.
