Author |
Message
|
Bartez75 |
Posted: Thu Nov 05, 2009 8:44 am Post subject: CCSID and Format issue |
|
|
 Voyager
Joined: 26 Oct 2006 Posts: 80 Location: Poland, Wroclaw
|
Hello
I have a message on the queue (the broker is stopped). It was sent by a system I know nothing about, except that it is a mainframe that is able to send XML messages.
So yes, it is an XML message.
This message has MQMD.Format="" (yes, binary) and MQMD.CCSID=850 (which is IBM PC DOS Latin-1).
How can I find out the actual code page of the data part?
It contains the national character é, and its hex value is E9 (checked in rfhutil).
So for sure it is not in 850, because in 850 that byte is Ú. (found here:
http://publib.boulder.ibm.com/cgi-bin/bookmgr/BOOKS/QB3AQ501/F.26?SHELF=&DT=19971201194621)
It could be in 819 (http://msdn.microsoft.com/en-us/goglobal/cc305167.aspx)
They claim it is UTF-8. Shouldn't it be double-byte then?
My other question: how can I convert this message to a different CCSID when it has MQMD.Format=""?
Is it possible?
I tried the Convert option on the MQInput node, but this gives an error from MQGet. |
|
|
 |
fjb_saper |
Posted: Thu Nov 05, 2009 9:02 am Post subject: Re: CCSID and Format issue |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
If they did indeed send the message content in UTF-8, tell them they need to set CCSID 1208 in the MQMD of the message.
You will not be able to have the MQInput node parse correctly until this is fixed. Alternatively, you could receive the message as a BLOB and do an ESQL parse, setting the CCSID to 1208. However, my preferred resolution would be for the sender to fix the CCSID on the message (MQMD).
Have fun |
|
|
 |
kimbert |
Posted: Thu Nov 05, 2009 12:28 pm Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
|
|
 |
smdavies99 |
Posted: Thu Nov 05, 2009 12:57 pm Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
I totally agree with kimbert on this.
This whole topic has been covered extensively in this forum, especially over the past month or so.
In my experience, many of the messages I've seen are the result of the sending application not understanding the type of data that is going to be sent. In your case, the MQMD is incorrectly set up, so you know who to push this issue back onto.
UTF-8 is a single byte format. However, it can contain Double byte characters if they are escaped correctly (escaped might be the wrong word?)
Slightly off topic
There are a number of weighty tomes that cover this whole issue. They are gathering dust on my bookshelf. I got them when I worked on a Kazak credit card system. However, they have proved really useful when trying to decipher random messages over the years. |
|
|
 |
kimbert |
Posted: Thu Nov 05, 2009 2:45 pm Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
smdavies99 said:
Quote: |
UTF-8 is a single byte format |
Hmmm. I don't like to be picky, but that's a little misleading. The first para of the Wikipedia entry for UTF-8 says:
Quote: |
UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode. It is able to represent any character in the Unicode standard, yet is backwards compatible with ASCII. For these reasons, it is steadily becoming the preferred encoding for e-mail, web pages,[1][2] and other places where characters are stored or streamed.
UTF-8 encodes each character (code point) in 1 to 4 octets (8-bit bytes), with the single octet encoding used only for the 128 US-ASCII characters. See the Description section below for details. |
So it's only single-byte when your characters are US-ASCII characters. |
|
|
 |
rekarm01 |
Posted: Thu Nov 05, 2009 8:21 pm Post subject: Re: CCSID and Format issue |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
Bartez75 wrote: |
How can I find out the actual code page of the data part. |
Bytes are not self-defining. To avoid corrupting data:
- either the sender must provide the correct Format, CodedCharSetID, and Encoding, along with the bytes
- or the receiver must guess the correct Format, CodedCharSetID, and Encoding
"Guessing" might also include an implied agreement between sender and receiver, or some sort of heuristic examination of the bytes themselves, but ultimately the receiver depends on the sender to encode the bytes as advertised, in order to decode them properly.
Bartez75 wrote: |
There is this national character é and hex value for it is E9 (checked in rfhutil). |
If the ccsid is in doubt, or unknown, using a tool like rfhutil to compare hex bytes against known code pages is about the only approach left, even if only to rule out certain ccsids.
Bartez75 wrote: |
They claim it is UTF-8. Shouldn't it be double bytes then? |
X'E9' is probably not UTF-8, unless it's part of an Asian character (CJK Unified Han Ideograph) ... in which case, it would also be three bytes.
For UTF-8, 'é' looks like X'C3 A9'.
Bartez75 wrote: |
Other question is how can I convert this message to different CCSID when it has MQMD.Format=""? Is it possible? |
It is possible with the ESQL CAST function. It is not possible with the MQInput Convert option. |
|
|
 |
rekarm01 |
Posted: Thu Nov 05, 2009 8:52 pm Post subject: Re: CCSID and Format issue |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
smdavies99 wrote: |
UTF-8 is a single byte format. However, it can contain Double byte characters if they are escaped correctly (escaped might be the wrong word?) |
To describe fixed-width character set encodings, IBM uses the terms "single-byte" (SBCS), "double-byte" (DBCS), "triple-byte" (TBCS), etc.
To describe variable-width character set encodings, IBM uses the terms "mixed-byte" for ccsids that use shift characters or escape sequences to switch between fixed-width code pages, and "multibyte" (MBCS) for ccsids that use leading/trailing bytes to represent multibyte characters.
UTF-8 is a multibyte encoding, using from one to four bytes to represent a character. |
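To make the "one to four bytes" point concrete, here is a small Python sketch. The sample characters are my own choice for illustration, one from each length class, and are not taken from the message under discussion.

```python
# Sketch: UTF-8 byte length depends on the code point.
# The sample characters are illustrative picks, one per length class.
samples = {
    "latin-capital-a": "\u0041",    # US-ASCII
    "e-acute": "\u00e9",            # Latin-1 Supplement
    "euro-sign": "\u20ac",          # Currency Symbols
    "g-clef": "\U0001d11e",         # Musical Symbols, beyond the 16-bit range
}

# Number of bytes each character needs in UTF-8.
utf8_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}

for name, length in utf8_lengths.items():
    print(f"{name}: {length} byte(s) in UTF-8")

# By contrast, a single-byte code page such as ISO 8859-1 always uses one byte
# for every character it can represent at all.
latin1_length = len(samples["e-acute"].encode("latin-1"))
print(f"e-acute: {latin1_length} byte(s) in ISO 8859-1")
```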
|
|
 |
Bartez75 |
Posted: Fri Nov 06, 2009 1:31 am Post subject: |
|
|
 Voyager
Joined: 26 Oct 2006 Posts: 80 Location: Poland, Wroclaw
|
I ran one more test with this message (before I try to explain to the sender that they must be sure how they send this message and must provide the correct CCSID).
In the broker flow I added the following lines:
Code: |
SET OutputRoot.MQMD.Format = 'MQSTR';
SET OutputRoot.MQMD.CodedCharSetId = 1208; |
I sent this message, and the message on the output queue has the following hex value for the character that is supposed to be é.
It is C3 9A, which according to this UTF-8 decoder http://software.hixie.ch/utilities/cgi/unicode-decoder/utf8-decoder is the letter Ú.
Does this tell you anything more than what was already said in this topic? |
|
|
 |
Bartez75 |
Posted: Fri Nov 06, 2009 8:02 am Post subject: |
|
|
 Voyager
Joined: 26 Oct 2006 Posts: 80 Location: Poland, Wroclaw
|
I think that the message (data) is in iso-latin1 CCSID=819.
I ran a test by setting the MQInput node to BLOB, then added this code:
Code: |
SET OutputRoot.Properties.CodedCharSetId = 819; |
After that I added a ResetContentDescriptor node set to the XML format, and when the message was put on the output queue the é was still E9. So at least for that letter it is correct (the hex value is correct).
Now I think the sender application will have to investigate what they really want to send, because it is not UTF-8. |
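Both tests can be replayed outside the broker. This is only a sketch in Python: it assumes that a conversion amounts to decoding the bytes with the declared code page and re-encoding them in the target one, that the first test read the payload as 850 (the CCSID in the incoming MQMD), and that the Python codecs `cp850`, `latin-1` and `utf-8` stand in for CCSIDs 850, 819 and 1208. The function name `transcode` is made up for illustration.

```python
# Sketch: replay the two tests on the single byte E9.
# Assumption: conversion = decode with the declared code page, encode with the target.
def transcode(payload: bytes, declared: str, target: str) -> bytes:
    """Hypothetical helper: reinterpret payload from one code page into another."""
    return payload.decode(declared).encode(target)


raw = b"\xe9"  # the byte seen in rfhutil

# Test 1: payload taken as 850 (per the MQMD), output written as 1208.
first_test = transcode(raw, "cp850", "utf-8")

# Test 2: payload declared as 819 and left in 819.
second_test = transcode(raw, "latin-1", "latin-1")

# What a conversion to 1208 would give if the input were declared as 819.
declared_819_to_1208 = transcode(raw, "latin-1", "utf-8")

print(first_test.hex())            # c39a, the UTF-8 encoding of Ú
print(second_test.hex())           # e9, unchanged
print(declared_819_to_1208.hex())  # c3a9, the UTF-8 encoding of é
```

Under those assumptions the C3 9A result is what you get when a byte that is really 819 is converted as if it were 850, which is consistent with the data being 819 and the MQMD being wrong.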
|
|
 |
rekarm01 |
Posted: Sat Nov 07, 2009 4:23 pm Post subject: Re: CCSID and Format issue |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
Bartez75 wrote: |
I think that the message (data) is in iso-latin1 CCSID=819. |
... or possibly any one of several dozen other ccsids based on the ISO 8859 Latin-1 through Latin-10 character sets.
Only the sender knows for sure. |
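A quick way to see why one byte cannot settle the question, sketched in Python. The candidate list is an arbitrary sample of Python codec names that I picked for illustration, not a list of CCSIDs from the message, and `decodes_to` is a made-up helper name.

```python
# Sketch: several code pages agree that E9 is 'é', so one byte cannot identify the ccsid.
# Assumption: CANDIDATES is an arbitrary sample of Python codecs, not an exhaustive list.
CANDIDATES = [
    "latin-1",     # ISO 8859-1
    "iso8859-2",   # Latin-2
    "iso8859-9",   # Latin-5
    "iso8859-15",  # Latin-9
    "cp1252",      # Windows Western European
    "iso8859-5",   # Cyrillic
    "cp850",       # IBM PC code page 850
]


def decodes_to(byte_value: bytes, codec: str, expected: str) -> bool:
    """Hypothetical helper: True if byte_value decodes to expected under codec."""
    try:
        return byte_value.decode(codec) == expected
    except UnicodeDecodeError:
        # Some code pages leave certain byte values undefined.
        return False


# Code pages in which the byte E9 is the character é (U+00E9).
matching = [c for c in CANDIDATES if decodes_to(b"\xe9", c, "\u00e9")]
print(matching)
```

Comparing more bytes from the message narrows the list, but it can only rule code pages out.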
|
|
 |
|