MQSeries.net Forum Index » IBM MQ API Support
Character set 1200 doesn't work whereas 1202 works (page 2 of 2)

rekarm01 (Grand Master; joined 25 Jun 2008)
Posted: Sat Aug 27, 2016 6:39 pm    Subject: Re: Character set 1200 doesn't work whereas 1202 works

tczielke wrote:
The interpretation of CCSID 1200 in MQ is somewhat confusing, from what I have observed.

There are some differences between how the IBM Character Data Representation Architecture (CDRA) specifies UTF-16 endianness, and how either IBM MQ or the WMB/IIB broker implements it.

The IBM CDRA defines the following ccsids for UTF-16: ("BOM" refers to byte order mark. The difference between the even/odd ccsids is not relevant here, but has to do with Unicode code points set aside for private use.)
  • ccsid=1200,1201: UTF-16BE, uses big-endian, no BOM
  • ccsid=1202,1203: UTF-16LE, uses little-endian, no BOM
  • ccsid=1204,1205: UTF-16, uses BOM if present, otherwise uses big-endian
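To make the three CDRA rules concrete, here is a minimal Python sketch. It assumes Python's codec names are a fair stand-in for the IBM CCSIDs, and `CDRA_UTF16` / `cdra_decode` are hypothetical names used only for illustration, not part of any IBM API. Note that for 1200-1203 no BOM processing happens at all, so a leading U+FEFF just comes back as a character.

```python
import codecs

# Hypothetical table: how IBM CDRA says each UTF-16 CCSID is byte-ordered.
CDRA_UTF16 = {
    1200: "utf-16-be", 1201: "utf-16-be",  # big-endian, no BOM handling
    1202: "utf-16-le", 1203: "utf-16-le",  # little-endian, no BOM handling
    1204: "bom-or-be", 1205: "bom-or-be",  # BOM if present, else big-endian
}


def cdra_decode(data: bytes, ccsid: int) -> str:
    """Decode UTF-16 bytes following the CDRA rules listed above (a sketch)."""
    rule = CDRA_UTF16[ccsid]
    if rule != "bom-or-be":
        # Fixed byte order: Python's utf-16-be/-le codecs do not consume a BOM,
        # so a leading FEFF is returned as the character U+FEFF.
        return data.decode(rule)
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    return data.decode("utf-16-be")  # no BOM: default to big-endian
```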

For MQ MQGET call with MQGMO_CONVERT/MQFMT_STRING:
  • conversion from source ccsid=1200: uses BOM if present, otherwise uses MQMD.Encoding
  • conversion to target ccsid=1200: uses MsgDesc.Encoding, retains BOM if present
  • conversion from/to ccsid=1201-1205: not supported
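The MQGET rules can be sketched as two separate steps, one for "from ccsid=1200" and one for "to ccsid=1200". This is a sketch under stated assumptions: the `MQENC_*` values are the standard MQ integer-encoding constants, but treating the integer bits of `Encoding` as the UTF-16 byte-order selector is an assumption, as is re-emitting a retained BOM in the target byte order. `convert_from_1200` and `convert_to_1200` are hypothetical names, not MQ functions.

```python
import codecs

# MQ Encoding integer-field constants. Using these bits to pick the UTF-16
# byte order for CCSID 1200 is an assumption of this sketch.
MQENC_INTEGER_MASK = 0x0000000F
MQENC_INTEGER_NORMAL = 0x00000001    # big-endian integers
MQENC_INTEGER_REVERSED = 0x00000002  # little-endian integers


def _codec_for_encoding(encoding: int) -> str:
    if (encoding & MQENC_INTEGER_MASK) == MQENC_INTEGER_REVERSED:
        return "utf-16-le"
    return "utf-16-be"


def convert_from_1200(data: bytes, mqmd_encoding: int):
    """Source side: use the BOM if present, otherwise MQMD.Encoding.
    Returns (text, had_bom)."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be"), True
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le"), True
    return data.decode(_codec_for_encoding(mqmd_encoding)), False


def convert_to_1200(text: str, msgdesc_encoding: int, had_bom: bool) -> bytes:
    """Target side: byte order from the MsgDesc.Encoding passed to MQGET;
    a source BOM is kept (written in the target byte order: an assumption)."""
    codec = _codec_for_encoding(msgdesc_encoding)
    body = text.encode(codec)
    if had_bom:
        bom = codecs.BOM_UTF16_LE if codec == "utf-16-le" else codecs.BOM_UTF16_BE
        body = bom + body
    return body
```

Because the source side checks the BOM first, a little-endian message with a little-endian BOM still decodes correctly even if MQMD.Encoding claims big-endian (for example 0x111).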

Windows/.NET users should note that Microsoft numbers its Unicode code pages differently from IBM ccsids:
  • windows-1200: UTF-16LE, uses little-endian, no BOM
  • windows-1201: UTF-16BE, uses big-endian, no BOM
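A short Python illustration of the numbering clash: in CDRA terms the same number 1200 means opposite byte orders on the two sides. The dictionaries and `ibm_ccsid_for_windows_codepage` are hypothetical helpers written for this example; only the even IBM CCSIDs are listed so the reverse lookup is unique.

```python
# Byte order behind each number, written as Python codec names.
# IBM side follows CDRA; Windows side follows Microsoft's code page numbering.
IBM_CCSID = {1200: "utf-16-be", 1202: "utf-16-le"}
WINDOWS_CODEPAGE = {1200: "utf-16-le", 1201: "utf-16-be"}


def ibm_ccsid_for_windows_codepage(codepage: int) -> int:
    """Hypothetical helper: the IBM CCSID whose byte order matches a Windows code page."""
    wanted = WINDOWS_CODEPAGE[codepage]
    return next(ccsid for ccsid, codec in IBM_CCSID.items() if codec == wanted)


print("Hi".encode(IBM_CCSID[1200]))          # b'\x00H\x00i'
print("Hi".encode(WINDOWS_CODEPAGE[1200]))   # b'H\x00i\x00'
print(ibm_ccsid_for_windows_codepage(1200))  # 1202
```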

For MQ .NET MQMessage ReadLine(), ReadString(), and WriteString() methods:
  • conversion from/to CharacterSet=1200: uses little-endian, no BOM
  • conversion from/to CharacterSet=1201-1205: not supported
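Read alongside the XMLNSC input rules further down (ccsid=1200 without a BOM is taken as big-endian), the .NET behaviour suggests one plausible way "1200 doesn't work whereas 1202 works" can happen. The Python sketch below only models the byte orders described in this post; `dotnet_write_string_1200` and `cdra_read_1200_no_bom` are hypothetical stand-ins, not the real .NET or broker APIs.

```python
def dotnet_write_string_1200(text: str) -> bytes:
    # Models WriteString() with CharacterSet=1200 as described: little-endian, no BOM.
    return text.encode("utf-16-le")


def cdra_read_1200_no_bom(data: bytes) -> str:
    # Models a reader that treats BOM-less ccsid=1200 as big-endian.
    return data.decode("utf-16-be")


payload = dotnet_write_string_1200("<a/>")
misread = cdra_read_1200_no_bom(payload)  # "<" (U+003C) comes back as U+3C00, etc.
correct = payload.decode("utf-16-le")     # "<a/>", as a reader told ccsid=1202 sees it
```

Labelling the same bytes as 1202 (little-endian) makes both sides agree, which is consistent with the thread's title.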

The broker's implementation started out more like MQ's, but is now closer to the IBM CDRA. Early versions supported only ccsid=1200; support for the rest was added later. Parsers that handle purely character data (like XMLNSC and JSON) tend to recognize a BOM, while parsers that handle mixed data (like MRM and DFDL) tend not to recognize a BOM unless it is explicitly modeled. In the absence of a BOM, parsers in past versions of the broker either used QMgr.Encoding or Properties.Encoding, or assumed big-endian, for ccsid=1200.

For the currently supported versions of WMB/IIB, using the XMLNSC parser:
  • conversion from input ccsid=1200,1201: uses BOM if present, otherwise uses big-endian
  • conversion from input ccsid=1202,1203: uses BOM if present, otherwise uses little-endian
  • conversion from input ccsid=1204,1205: uses BOM if present, otherwise uses big-endian
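The XMLNSC input rules reduce to "BOM first, then a per-CCSID default", which differs from strict CDRA for 1200-1203 because the BOM is honored there too. A minimal Python sketch, with `XMLNSC_INPUT_DEFAULT` and `xmlnsc_input_decode` as hypothetical names rather than anything the broker exposes:

```python
import codecs

# Byte order used when no BOM is present, per the XMLNSC input rules above.
XMLNSC_INPUT_DEFAULT = {
    1200: "utf-16-be", 1201: "utf-16-be",
    1202: "utf-16-le", 1203: "utf-16-le",
    1204: "utf-16-be", 1205: "utf-16-be",
}


def xmlnsc_input_decode(data: bytes, ccsid: int) -> str:
    """Sketch of XMLNSC input: a BOM always wins, otherwise the CCSID default."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    return data.decode(XMLNSC_INPUT_DEFAULT[ccsid])
```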

  • conversion to output ccsid=1200: uses Properties.Encoding, adds BOM
  • conversion to output ccsid=1201: uses big-endian, no BOM
  • conversion to output ccsid=1202,1203: uses little-endian, no BOM
  • conversion to output ccsid=1204,1205: uses big-endian, adds BOM
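The output side can be sketched the same way. As in the MQGET sketch, reading the byte order from the integer bits of `Properties.Encoding` (0x2 meaning reversed, that is little-endian) is an assumption, and `xmlnsc_output_encode` is a hypothetical name:

```python
import codecs

MQENC_INTEGER_MASK = 0x0000000F
MQENC_INTEGER_REVERSED = 0x00000002  # assumed to mean little-endian UTF-16 here


def xmlnsc_output_encode(text: str, ccsid: int, properties_encoding: int) -> bytes:
    """Sketch of the XMLNSC output rules listed above."""
    if ccsid == 1200:
        # Byte order from Properties.Encoding, BOM added.
        if (properties_encoding & MQENC_INTEGER_MASK) == MQENC_INTEGER_REVERSED:
            return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
        return codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    if ccsid == 1201:
        return text.encode("utf-16-be")  # big-endian, no BOM
    if ccsid in (1202, 1203):
        return text.encode("utf-16-le")  # little-endian, no BOM
    if ccsid in (1204, 1205):
        return codecs.BOM_UTF16_BE + text.encode("utf-16-be")  # big-endian, BOM added
    raise ValueError(f"not a UTF-16 CCSID: {ccsid}")
```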

If any of that is still confusing, then there's always UTF-8.


[Edit: Separated the "conversion from ..." and "conversion to ..." clauses, clarified MQMD/MsgDesc Encoding for MQGET, and added explicit "no BOM" clauses where BOM is not used.]


Last edited by rekarm01 on Sat Jun 03, 2017 8:13 pm; edited 4 times in total
tczielke (Guardian; joined 08 Jul 2010; Illinois, USA)
Posted: Sun Aug 28, 2016 5:01 am

Thanks for the detailed information there. That was helpful.

Quote:

For MQ MQGET call with MQGMO_CONVERT/MQFMT_STRING:
  • ccsid=1200: source uses BOM if present, otherwise uses MQMD.Encoding; target uses MQMD.Encoding, retains BOM if present
  • ccsid=1201-1205: not supported

From the experimenting that I have done with MQ data conversion and a 1200 CCSID message with a BOM, I found that MQ would use the BOM over the CCSID at the target. For example, if I PUT a 1200 CCSID message that was little endian with a BOM that was little endian but also lied and said the MQMD.Encoding was big endian, MQ would still convert the data correctly to something like 819 at the target.

Quote:

If any of that is still confusing, then there's always UTF-8.


If applicable, I would recommend using UTF-8 (1208) over UTF-16 (1200). 1208 is more straightforward to work with, and it does not have the byte-order (endianness) issue that 1200 has.
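One way to see why UTF-8 avoids the problem: a BOM-less UTF-16 payload usually decodes without error in both byte orders, so a wrong endianness label is not detected, whereas UTF-8 has a single byte order. A small Python sketch; `utf16_readings` is a hypothetical helper for this example.

```python
def utf16_readings(data: bytes) -> set:
    """Every string a BOM-less UTF-16 payload could mean, by assumed byte order."""
    readings = set()
    for codec in ("utf-16-be", "utf-16-le"):
        try:
            readings.add(data.decode(codec))
        except UnicodeDecodeError:
            pass
    return readings


text = "CCSID"
print(len(utf16_readings(text.encode("utf-16-le"))))  # 2: both byte orders decode
print(text.encode("utf-8").decode("utf-8") == text)   # True: one UTF-8 byte order
```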
rekarm01 (Grand Master)
Posted: Sun Aug 28, 2016 2:02 pm

tczielke wrote:
Quote:
For MQ MQGET call with MQGMO_CONVERT/MQFMT_STRING:
  • ccsid=1200: source uses BOM if present, otherwise uses MQMD.Encoding; target uses MQMD.Encoding, retains BOM if present

Maybe that should be "target uses MsgDesc.Encoding", to more clearly indicate the MsgDesc parameter on the MQGET call.

tczielke wrote:
From the experimenting that I have done with MQ data conversion and a 1200 CCSID message with a BOM, I found that MQ would use the BOM over the CCSID at the target. For example, if I PUT a 1200 CCSID message that was little endian with a BOM that was little endian but also lied and said the MQMD.Encoding was big endian, MQ would still convert the data correctly to something like 819 at the target.

I meant to describe "conversion from ..." and "conversion to ..." as two separate steps. The source clause above only describes conversion from ccsid=1200, and the target clause only describes conversion to ccsid=1200; neither describes conversion to ccsid=819. Let's see if I can make that clearer ...
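Treated as two steps, tczielke's experiment looks like this in a Python sketch: step 1 applies the "from ccsid=1200" rule (BOM first, then MQMD.Encoding), and step 2 encodes to ccsid=819 (ISO 8859-1, Python's `latin-1`), where byte order plays no part. `from_1200` and `to_819` are hypothetical names, and using the integer bits of Encoding for byte order is an assumption.

```python
import codecs

MQENC_INTEGER_MASK = 0x0000000F
MQENC_INTEGER_REVERSED = 0x00000002  # assumed to select little-endian UTF-16


def from_1200(data: bytes, mqmd_encoding: int) -> str:
    """Step 1, conversion from ccsid=1200: BOM if present, otherwise MQMD.Encoding."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    little = (mqmd_encoding & MQENC_INTEGER_MASK) == MQENC_INTEGER_REVERSED
    return data.decode("utf-16-le" if little else "utf-16-be")


def to_819(text: str) -> bytes:
    """Step 2, conversion to ccsid=819 (ISO 8859-1)."""
    return text.encode("latin-1")


# Little-endian data with a little-endian BOM, while MQMD.Encoding claims big-endian.
message = codecs.BOM_UTF16_LE + "Hello".encode("utf-16-le")
print(to_819(from_1200(message, 0x00000111)))  # b'Hello'
```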