MQSeries.net :: View topic - CCSID and Format issue

CCSID and Format issue

Bartez75

Posted: Thu Nov 05, 2009 8:44 am Post subject: CCSID and Format issue

Voyager

Joined: 26 Oct 2006
Posts: 80
Location: Poland, Wroclaw

Hello

I have a message on the queue (the broker is stopped). It was sent by some system about which I know nothing, except that it is a mainframe which is able to send XML messages.
So, yes, it is an XML message.
This message has MQMD.Format="" (yes, binary) and MQMD.CCSID=850 (which is IBM PC DOS Latin-1).

How can I find out the actual code page of the data part?

There is this national character é and the hex value for it is E9 (checked in rfhutil).

So for sure it is not in 850, because in 850 it is Ú. (found here:
http://publib.boulder.ibm.com/cgi-bin/bookmgr/BOOKS/QB3AQ501/F.26?SHELF=&DT=19971201194621)

It could be in 819 (http://msdn.microsoft.com/en-us/goglobal/cc305167.aspx)

They claim it is UTF-8. Shouldn't it be double bytes then?

Another question: how can I convert this message to a different CCSID when it has MQMD.Format=""?
Is it possible?
I tried the Convert option on the MQInput node, but this gives an error from MQGET.

fjb_saper

Posted: Thu Nov 05, 2009 9:02 am Post subject: Re: CCSID and Format issue

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20763
Location: LI,NY

If they did indeed send the message content in UTF-8, tell them they need to set CCSID 1208 in the MQMD of the message.

You will not be able to have the MQInput node parse correctly until this is fixed. You could alternatively receive the message as a BLOB and do an ESQL parse setting the CCSID to 1208... However my preferred resolution would be for the sender to fix the CCSID on the message (MQMD).
Have fun

_________________
MQ & Broker admin

kimbert

Posted: Thu Nov 05, 2009 12:28 pm Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Quote:

They claim it is UTF-8. Shouldn't it be double bytes then?

http://www.joelonsoftware.com/articles/Unicode.html
and this
http://en.wikipedia.org/wiki/Unicode

Not sure why you couldn't find and read these articles yourself...I agree with the title of the first link.

smdavies99

Posted: Thu Nov 05, 2009 12:57 pm Post subject:

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

I totally agree with kimbert on this.

This whole topic has been covered extensively in this forum, especially over the past month or so.

In my experience, many of the messages I've seen are the result of the sending application not understanding the type of data that is going to be sent. In your case, the MQMD is set up incorrectly, so you know who to push this issue back onto.

UTF-8 is a single byte format. However, it can contain Double byte characters if they are escaped correctly (escaped might be the wrong word?)

Slightly off topic
There are a number of weighty tomes that cover this whole issue. They are gathering dust on my bookshelf. I got them when I worked on a Kazak Credit card system. However they have proved really useful when trying to decipher random messages over the years.
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995

Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.

kimbert

Posted: Thu Nov 05, 2009 2:45 pm Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

smdavies99 said:

Quote:

UTF-8 is a single byte format

Hmmm. I don't like to be picky, but that's a little misleading. The first para of the Wikipedia entry for UTF-8 says:

Quote:

UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode. It is able to represent any character in the Unicode standard, yet is backwards compatible with ASCII. For these reasons, it is steadily becoming the preferred encoding for e-mail, web pages, and other places where characters are stored or streamed.

UTF-8 encodes each character (code point) in 1 to 4 octets (8-bit bytes), with the single octet encoding used only for the 128 US-ASCII characters. See the Description section below for details.

So it's only single-byte when your characters are US-ASCII characters.
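
The variable-length point is easy to check by hand. Here is a minimal Python sketch (Python is not used anywhere in this thread, and the sample characters are picked only for illustration) that reports how many bytes UTF-8 needs for each character:

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses to encode a single character."""
    return len(ch.encode("utf-8"))


# Illustrative samples: an ASCII letter, e-acute, the euro sign, and a
# character outside the Basic Multilingual Plane.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in SAMPLES:
    encoded = ch.encode("utf-8")
    hex_bytes = " ".join(f"{b:02X}" for b in encoded)
    print(f"U+{ord(ch):04X} -> {hex_bytes} ({utf8_length(ch)} byte(s))")
```

Only the first sample, which is US-ASCII, fits in one byte; e-acute already needs two.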

rekarm01

Posted: Thu Nov 05, 2009 8:21 pm Post subject: Re: CCSID and Format issue

Grand Master

Joined: 25 Jun 2008
Posts: 1415

Bartez75 wrote:

How can I find out the actual code page of the data part?

Bytes are not self-defining. To avoid corrupting data:

either the sender must provide the correct Format, CodedCharSetID, and Encoding, along with the bytes
or the receiver must guess the correct Format, CodedCharSetID, and Encoding

"Guessing" might also include an implied agreement between sender and receiver, or some sort of heuristic examination of the bytes themselves, but ultimately the receiver depends on the sender to encode the bytes as advertised, in order to decode them properly.

Bartez75 wrote:

There is this national character é and the hex value for it is E9 (checked in rfhutil).

If the ccsid is in doubt, or unknown, using a tool like rfhutil to compare hex bytes against known code pages is about the only approach left, even if only to rule out certain ccsids.
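
One way to rule a ccsid out mechanically is a strict decode. The sketch below uses Python rather than rfhutil, and the XML fragment is a hypothetical payload, not the actual message from this thread. It reports the offset of the first byte that cannot be UTF-8. A clean result does not prove the data is UTF-8 (pure US-ASCII passes too), but a failure does rule UTF-8 out.

```python
def utf8_error_offset(data: bytes):
    """Return None if data is well-formed UTF-8, else the offset of the first bad byte."""
    try:
        data.decode("utf-8", errors="strict")
        return None
    except UnicodeDecodeError as exc:
        return exc.start


# Hypothetical payload: the same XML fragment encoded two different ways.
FRAGMENT = "<name>caf\u00e9</name>"
as_latin1 = FRAGMENT.encode("iso8859_1")  # e-acute becomes the single byte E9
as_utf8 = FRAGMENT.encode("utf-8")        # e-acute becomes the two bytes C3 A9

print("latin-1 bytes:", utf8_error_offset(as_latin1))
print("utf-8 bytes:  ", utf8_error_offset(as_utf8))
```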

Bartez75 wrote:

They claim it is UTF-8. Shouldn't it be double bytes then?

X'E9' is probably not UTF-8, unless it's part of an Asian character (CJK Unified Han Ideograph) ... in which case, it would also be three bytes.

For UTF-8, 'é' looks like X'C3 A9'.
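
Both statements can be reproduced with a short Python sketch. The helper name is made up for illustration, and it is only meant for lead bytes such as E9 where every continuation byte from 80 to BF is legal.

```python
def three_byte_range(lead: int):
    """Code point range covered by three-byte UTF-8 sequences starting with `lead`.

    Illustration only: valid for lead bytes like 0xE9, where any continuation
    bytes in 0x80..0xBF form a well-formed sequence.
    """
    low = bytes([lead, 0x80, 0x80]).decode("utf-8")
    high = bytes([lead, 0xBF, 0xBF]).decode("utf-8")
    return ord(low), ord(high)


# e-acute (U+00E9) is a two-byte sequence in UTF-8.
print(" ".join(f"{b:02X}" for b in "\u00e9".encode("utf-8")))

# E9 as a lead byte always starts a three-byte sequence.
low, high = three_byte_range(0xE9)
print(f"E9 xx xx covers U+{low:04X} .. U+{high:04X}")
```

The range that comes back lies inside the CJK Unified Ideographs block (U+4E00 to U+9FFF). So a lone E9 sitting between Latin letters is not UTF-8.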

Bartez75 wrote:

Another question: how can I convert this message to a different CCSID when it has MQMD.Format=""? Is it possible?

It is possible with ESQL CAST function. It is not possible with MQInput Convert option.

rekarm01

Posted: Thu Nov 05, 2009 8:52 pm Post subject: Re: CCSID and Format issue

Grand Master

Joined: 25 Jun 2008
Posts: 1415

smdavies99 wrote:

UTF-8 is a single byte format. However, it can contain Double byte characters if they are escaped correctly (escaped might be the wrong word?)

To describe fixed-width character set encodings, IBM uses the terms "single-byte" (SBCS), "double-byte" (DBCS), "triple-byte" (TBCS), etc.

To describe variable-width character set encodings, IBM uses the terms "mixed-byte" for ccsids that use shift characters or escape sequences to switch between fixed-width code pages, and "multibyte" (MBCS) for ccsids that use leading/trailing bytes to represent multibyte characters.

UTF-8 is a multibyte encoding, using from one to four bytes to represent a character.

Bartez75

Posted: Fri Nov 06, 2009 1:31 am Post subject:

Voyager

Joined: 26 Oct 2006
Posts: 80
Location: Poland, Wroclaw

I made one more test with this message (before I try to explain to the sender that they must be sure how they send this message and must provide the correct CCSID).

In the broker flow I have added the following lines:

Code:

SET OutputRoot.MQMD.Format = 'MQSTR';
SET OutputRoot.MQMD.CodedCharSetId = 1208;

I sent this message, and the message on the output queue has the following hex value for the character that is supposed to be é.

It is C3 9A, which according to this UTF-8 decoder http://software.hixie.ch/utilities/cgi/unicode-decoder/utf8-decoder is the letter Ú.

Does this tell you anything more than what was said before in this topic?
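
A likely reading of this result is that the conversion trusted the CCSID 850 in the incoming MQMD. The Python sketch below is not broker code. It assumes Python's cp850, iso8859_1 and utf-8 codecs behave like CCSIDs 850, 819 and 1208 for this one byte, and it shows what happens to E9 under each assumption about the source code page:

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes using the `source` codec and re-encode them using `target`."""
    return data.decode(source).encode(target)


def hex_dump(data: bytes) -> str:
    """Format bytes as space-separated upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


original = bytes([0xE9])

# Source assumed to be code page 850: E9 is read as U-acute.
print("from 850:", hex_dump(transcode(original, "cp850", "utf-8")))

# Source assumed to be ISO 8859-1 (CCSID 819): E9 is read as e-acute.
print("from 819:", hex_dump(transcode(original, "iso8859_1", "utf-8")))
```

Under the 850 assumption the output is the same C3 9A seen on the output queue, which suggests the byte was converted faithfully from the wrong starting code page.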

Bartez75

Posted: Fri Nov 06, 2009 8:02 am Post subject:

Voyager

Joined: 26 Oct 2006
Posts: 80
Location: Poland, Wroclaw

I think that the message (data) is in iso-latin1 CCSID=819.
I did a test by setting the BLOB domain on the MQInput node, then I put this in the code:

Code:

SET OutputRoot.Properties.CodedCharSetId = 819;

After that I added a ResetContentDescriptor node to switch to the XML domain, and when the message was put on the output queue the é was still E9. So at least for that letter the hex value is correct.

Now I think the sender will have to investigate what they really want to send, because it is not UTF-8.

rekarm01

Posted: Sat Nov 07, 2009 4:23 pm Post subject: Re: CCSID and Format issue

Grand Master

Joined: 25 Jun 2008
Posts: 1415

Bartez75 wrote:

I think that the message (data) is in iso-latin1 CCSID=819.

... or possibly any one of several dozen other ccsids based on ISO 8859 Latin-1 through Latin-10 character sets.
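
A quick Python sketch shows why a single byte cannot settle it. The candidate list is an arbitrary sample of Python codec names, not a list of IBM ccsids. The sketch collects every candidate that maps E9 to e-acute:

```python
# Arbitrary sample of Python codec names to test (not an exhaustive ccsid list).
CANDIDATES = ["iso8859_1", "iso8859_2", "iso8859_9", "iso8859_15", "cp1252", "cp850"]


def codecs_mapping(byte_value: int, expected: str, candidates=CANDIDATES):
    """Return the candidate codecs that decode the single byte to `expected`."""
    matches = []
    for name in candidates:
        try:
            if bytes([byte_value]).decode(name) == expected:
                matches.append(name)
        except UnicodeDecodeError:
            # The byte is unassigned in this codec, so it cannot match.
            pass
    return matches


print(codecs_mapping(0xE9, "\u00e9"))
```

Several of the Latin code pages agree on this byte, so telling them apart would need other characters from the payload, or an answer from the sender.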

Only the sender knows for sure.
