MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » CCSID and Format issue

CCSID and Format issue
Bartez75
Posted: Thu Nov 05, 2009 8:44 am    Post subject: CCSID and Format issue

Voyager

Joined: 26 Oct 2006
Posts: 80
Location: Poland, Wroclaw

Hello

I have a message on the queue (the broker is stopped). It was sent by a system I know nothing about, except that it is a mainframe that is able to send XML messages.
So, yes, it is an XML message.
This message has MQMD.Format="" (yes, binary) and MQMD.CCSID=850 (which is IBM PC DOS Latin-1).

How can I find out the actual code page of the data part?

The message contains the national character é, and its hex value is E9 (checked in rfhutil).

So it is certainly not in 850, because in 850 that byte is Ú. (found here:
http://publib.boulder.ibm.com/cgi-bin/bookmgr/BOOKS/QB3AQ501/F.26?SHELF=&DT=19971201194621)

It could be in 819 (http://msdn.microsoft.com/en-us/goglobal/cc305167.aspx)

They claim it is UTF-8. Shouldn't it be double bytes then?


My other question: how can I convert this message to a different CCSID when it has MQMD.Format=""?
Is it possible?
I tried the Convert option on the MQInput node, but this gives an error from MQGET.
fjb_saper
Posted: Thu Nov 05, 2009 9:02 am    Post subject: Re: CCSID and Format issue

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

If they did indeed send the message content in UTF-8, tell them they need to set CCSID 1208 in the MQMD of the message.

You will not be able to have the MQInput node parse the message correctly until this is fixed. Alternatively, you could receive the message as a BLOB and do an ESQL parse, setting the CCSID to 1208. However, my preferred resolution would be for the sender to fix the CCSID on the message (MQMD).
Have fun
_________________
MQ & Broker admin
kimbert
Posted: Thu Nov 05, 2009 12:28 pm

Jedi Council

Joined: 29 Jul 2003
Posts: 5542
Location: Southampton

Quote:
They claim it is UTF-8. Shouldn't it be double bytes then?

http://www.joelonsoftware.com/articles/Unicode.html
and this
http://en.wikipedia.org/wiki/Unicode

Not sure why you couldn't find and read these articles yourself...I agree with the title of the first link.
smdavies99
Posted: Thu Nov 05, 2009 12:57 pm

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

I totally agree with kimbert on this.

This whole topic has been covered extensively in this forum, especially over the past month or so.

In my experience, many of the messages I've seen are the result of the sending application not understanding the type of data that is going to be sent. In your case, the MQMD is incorrectly set up, so you know who to push this issue back onto.

UTF-8 is a single byte format. However, it can contain double-byte characters if they are escaped correctly (escaped might be the wrong word?)

Slightly off topic
There are a number of weighty tomes that cover this whole issue. They are gathering dust on my bookshelf. I got them when I worked on a Kazakh credit card system. However, they have proved really useful when trying to decipher random messages over the years.
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995

Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.
kimbert
Posted: Thu Nov 05, 2009 2:45 pm

Jedi Council

Joined: 29 Jul 2003
Posts: 5542
Location: Southampton

smdavies99 said:
Quote:
UTF-8 is a single byte format
Hmmm. I don't like to be picky, but that's a little misleading. The first para of the Wikipedia entry for UTF-8 says:
Quote:
UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode. It is able to represent any character in the Unicode standard, yet is backwards compatible with ASCII. For these reasons, it is steadily becoming the preferred encoding for e-mail, web pages, and other places where characters are stored or streamed.

UTF-8 encodes each character (code point) in 1 to 4 octets (8-bit bytes), with the single octet encoding used only for the 128 US-ASCII characters. See the Description section below for details.
So it's only single-byte when your characters are US-ASCII characters.
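
To make the 1-to-4-octet range concrete, here is a minimal Python sketch. Python is not used anywhere in this thread, and the sample characters are chosen purely for illustration; only é relates to the message under discussion.

```python
# Illustrative sample characters (assumption: not taken from the actual message).
samples = ["A", "é", "€", "😀"]


def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 needs to encode one character."""
    return len(ch.encode("utf-8"))


lengths = {ch: utf8_length(ch) for ch in samples}

for ch, n in lengths.items():
    encoded = ch.encode("utf-8").hex(" ").upper()
    print(f"U+{ord(ch):04X} takes {n} byte(s): {encoded}")
```

Only the US-ASCII character comes out as a single byte; the others need two, three, and four bytes respectively.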
rekarm01
Posted: Thu Nov 05, 2009 8:21 pm    Post subject: Re: CCSID and Format issue

Grand Master

Joined: 25 Jun 2008
Posts: 1415

Bartez75 wrote:
How can I find out the actual code page of the data part.

Bytes are not self-defining. To avoid corrupting data:
  • either the sender must provide the correct Format, CodedCharSetID, and Encoding, along with the bytes
  • or the receiver must guess the correct Format, CodedCharSetID, and Encoding
"Guessing" might also include an implied agreement between sender and receiver, or some sort of heuristic examination of the bytes themselves, but ultimately the receiver depends on the sender to encode the bytes as advertised, in order to decode them properly.

Bartez75 wrote:
There is this national character é and hex value for it is E9 (checked in rfhutil).

If the ccsid is in doubt, or unknown, using a tool like rfhutil to compare hex bytes against known code pages is about the only approach left, even if only to rule out certain ccsids.
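
The same comparison can be scripted outside rfhutil. The following is only a sketch in Python (not a broker or MQ tool): it decodes a byte string under several candidate code pages and reports what each one yields. The codec names are Python's, assumed here to correspond to the CCSIDs discussed: cp850 for 850, iso-8859-1 for 819, and utf-8 for 1208. The helper name `decode_candidates` is hypothetical.

```python
from typing import Dict, List, Optional


def decode_candidates(raw: bytes, codec_names: List[str]) -> Dict[str, Optional[str]]:
    """Decode raw under each codec; None marks a codec that rejects the bytes."""
    results: Dict[str, Optional[str]] = {}
    for name in codec_names:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# The single byte observed in the message, tried against three candidates.
table = decode_candidates(b"\xe9", ["cp850", "iso-8859-1", "utf-8"])

for name, text in table.items():
    print(f"{name:12} -> {text if text is not None else 'invalid byte sequence'}")
```

A result of None rules a code page out; a plausible character only keeps it on the list of candidates.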

Bartez75 wrote:
They claim it is UTF-8. Shouldn't it be double bytes then?

X'E9' is probably not UTF-8, unless it's part of an Asian character (CJK Unified Han Ideograph) ... in which case, it would also be three bytes.

For UTF-8, 'é' looks like X'C3 A9'.
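
A quick way to check both statements is the Python sketch below. It is illustrative only; the three-byte sequence E9 80 80 is an arbitrary example chosen here, not data from the message.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw is a complete, well-formed UTF-8 byte sequence."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 'é' encoded as UTF-8 is two bytes.
e_acute = "é".encode("utf-8")

# A lone X'E9' is an incomplete three-byte sequence, so it is not valid UTF-8.
lone_e9 = is_valid_utf8(b"\xe9")

# X'E9' followed by two continuation bytes decodes to a single CJK ideograph.
ideograph = b"\xe9\x80\x80".decode("utf-8")

print(e_acute.hex(" ").upper(), lone_e9, f"U+{ord(ideograph):04X}")
```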

Bartez75 wrote:
Other question is how can I convert this message to different CCSID when it has MQMD.Format=""? Is it possible?

It is possible with the ESQL CAST function. It is not possible with the MQInput Convert option.
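
Conceptually, that kind of conversion means decoding the raw bytes with an explicitly chosen source code page and re-encoding them in the target one, ignoring what the MQMD advertises. The Python sketch below illustrates the idea only; it is not ESQL, the `transcode` helper is hypothetical, the payload is an invented sample, and ISO 8859-1 as the source is an assumption.

```python
def transcode(raw: bytes, source: str, target: str) -> bytes:
    """Decode raw with the source codec, then re-encode with the target codec."""
    return raw.decode(source).encode(target)


# Invented sample payload containing X'E9' (assumption: source is ISO 8859-1).
payload = b"<name>Ren\xe9</name>"
converted = transcode(payload, "iso-8859-1", "utf-8")

print(converted.hex(" ").upper())
```

If the assumed source code page is wrong, the conversion still succeeds but silently produces the wrong characters, which is why the sender has to confirm it.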
rekarm01
Posted: Thu Nov 05, 2009 8:52 pm    Post subject: Re: CCSID and Format issue

Grand Master

Joined: 25 Jun 2008
Posts: 1415

smdavies99 wrote:
UTF-8 is a single byte format. However, it can contain Double byte characters if they are escaped correctly (escaped might be the wrong word?)

To describe fixed-width character set encodings, IBM uses the terms "single-byte" (SBCS), "double-byte" (DBCS), "triple-byte" (TBCS), etc.

To describe variable-width character set encodings, IBM uses the terms "mixed-byte" for ccsids that use shift characters or escape sequences to switch between fixed-width code pages, and "multibyte" (MBCS) for ccsids that use leading/trailing bytes to represent multibyte characters.

UTF-8 is a multibyte encoding, using from one to four bytes to represent a character.
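
In UTF-8 the leading byte alone announces how long the sequence is. Here is a small Python sketch of that classification, based on the standard UTF-8 lead-byte ranges; the function name is made up for this example.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return the sequence length implied by a lead byte, or 0 if it cannot start a sequence."""
    if lead < 0x80:
        return 1  # US-ASCII, single byte
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    # 0x80-0xBF are trailing (continuation) bytes; 0xC0, 0xC1 and 0xF5-0xFF never occur.
    return 0


for byte in (0x41, 0xC3, 0xE9, 0xA9):
    print(f"X'{byte:02X}' -> {utf8_sequence_length(byte)}")
```

By this rule X'E9' would have to be followed by two trailing bytes, while X'A9' can only appear after a leading byte.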
Bartez75
Posted: Fri Nov 06, 2009 1:31 am

Voyager

Joined: 26 Oct 2006
Posts: 80
Location: Poland, Wroclaw

I ran one more test with this message (before I try to explain to the sender that they must be sure how they send this message and must provide the correct CCSID).

In the broker flow I added the following lines:
Code:
SET OutputRoot.MQMD.Format = 'MQSTR';
SET OutputRoot.MQMD.CodedCharSetId = 1208;


I sent this message, and the message on the output queue has the following hex value for the character that is supposed to be é.

It is C3 9A, which according to this UTF-8 decoder (http://software.hixie.ch/utilities/cgi/unicode-decoder/utf8-decoder) is the letter Ú.
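
One plausible reading of this result, offered as an assumption rather than a confirmed trace of the flow, is that the input byte was interpreted using the advertised CCSID 850 and then written out in 1208. A Python sketch reproduces the same bytes:

```python
# The byte found in the incoming message.
original = b"\xe9"

# Step 1: interpret it using the advertised code page (CCSID 850 ~ Python's cp850).
as_850 = original.decode("cp850")

# Step 2: write the resulting character out as UTF-8 (CCSID 1208).
reencoded = as_850.encode("utf-8")

print(as_850, reencoded.hex(" ").upper())
```

If that is what happened, the conversion itself worked as designed and the wrong character comes from the wrong input CCSID.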

Does this tell you anything more than what has already been said in this topic?
Bartez75
Posted: Fri Nov 06, 2009 8:02 am

Voyager

Joined: 26 Oct 2006
Posts: 80
Location: Poland, Wroclaw

I think that the message data is in ISO Latin-1, CCSID=819.
I tested this by setting the BLOB domain on the MQInput node, then adding this code:
Code:
SET OutputRoot.Properties.CodedCharSetId = 819;


After that I added a ResetContentDescriptor node to switch to the XML domain, and when the message was put on the output queue the é was still E9. So at least for that letter the hex value is correct.

So I think the sender application team will have to investigate what they really want to send, because it is not UTF-8.
rekarm01
Posted: Sat Nov 07, 2009 4:23 pm    Post subject: Re: CCSID and Format issue

Grand Master

Joined: 25 Jun 2008
Posts: 1415

Bartez75 wrote:
I think that the message (data) is in iso-latin1 CCSID=819.

... or possibly any one of several dozen other ccsids based on the ISO 8859 Latin-1 through Latin-10 character sets.

Only the sender knows for sure.
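
To illustrate why a single é at X'E9' cannot settle the question, the Python sketch below scans the ISO 8859 codecs that ship with Python and lists those that map X'E9' to é. This covers only part of the picture: Python codec names are not CCSIDs, EBCDIC and Windows code pages are left out, and the helper name is hypothetical.

```python
from typing import List

# ISO 8859 parts available as Python codecs (part 12 was never published).
ISO_8859_PARTS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16]


def codecs_mapping(raw: bytes, expected: str) -> List[str]:
    """Return the ISO 8859 codec names under which raw decodes to expected."""
    matches: List[str] = []
    for part in ISO_8859_PARTS:
        name = f"iso8859_{part}"
        try:
            if raw.decode(name) == expected:
                matches.append(name)
        except (UnicodeDecodeError, LookupError):
            continue  # byte undefined in this part, or codec not installed
    return matches


matches = codecs_mapping(b"\xe9", "é")
print(len(matches), "candidate codecs:", ", ".join(matches))
```

Telling these candidates apart would need other bytes from the message whose meaning differs between the code pages, or confirmation from the sender.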