MQSeries.net :: View topic - Character set 1200 doesn't work whereas 1202 works

Character set 1200 doesn't work whereas 1202 works

rekarm01

Posted: Sat Aug 27, 2016 6:39 pm Post subject: Re: Character set 1200 doesn't work whereas 1202 works

Grand Master

Joined: 25 Jun 2008
Posts: 1415

tczielke wrote:

The interpretation of CCSID 1200 in MQ is somewhat confusing in what I have observed.

There are some differences between how the IBM Character Data Representation Architecture (CDRA) specifies UTF-16 endianness, and how either IBM MQ or the WMB/IIB broker implements it.

The IBM CDRA defines the following ccsids for UTF-16: ("BOM" refers to byte order mark. The difference between the even/odd ccsids is not relevant here, but has to do with Unicode code points set aside for private use.)

ccsid=1200,1201: UTF-16BE, uses big-endian, no BOM
ccsid=1202,1203: UTF-16LE, uses little-endian, no BOM
ccsid=1204,1205: UTF-16, uses BOM if present, otherwise uses big-endian
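To make the three CDRA rules above concrete, here is a minimal Python sketch. The function name `decode_cdra_utf16` is made up for this example, it relies on Python's built-in codecs rather than any IBM library, and it ignores the even/odd private-use distinction.

```python
import codecs


def decode_cdra_utf16(data: bytes, ccsid: int) -> str:
    """Decode UTF-16 bytes following the CDRA rules listed above (illustration only)."""
    if ccsid in (1200, 1201):
        # UTF-16BE: always big-endian; a leading FE FF is not interpreted as a BOM
        return data.decode("utf-16-be")
    if ccsid in (1202, 1203):
        # UTF-16LE: always little-endian; a leading FF FE is not interpreted as a BOM
        return data.decode("utf-16-le")
    if ccsid in (1204, 1205):
        # UTF-16: honor a BOM if present, otherwise fall back to big-endian
        if data.startswith(codecs.BOM_UTF16_BE):
            return data[2:].decode("utf-16-be")
        if data.startswith(codecs.BOM_UTF16_LE):
            return data[2:].decode("utf-16-le")
        return data.decode("utf-16-be")
    raise ValueError(f"not a UTF-16 ccsid: {ccsid}")


print(decode_cdra_utf16(b"\x00A\x00B", 1200))      # big-endian bytes
print(decode_cdra_utf16(b"A\x00B\x00", 1202))      # little-endian bytes
print(decode_cdra_utf16(b"\xff\xfeA\x00B\x00", 1204))  # little-endian with BOM
```

Note that Python's own "utf-16" codec falls back to the platform's native byte order when there is no BOM, which is why the 1204/1205 branch handles the BOM by hand.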

For MQ MQGET call with MQGMO_CONVERT/MQFMT_STRING:

conversion from source ccsid=1200: uses BOM if present, otherwise uses MQMD.Encoding
conversion to target ccsid=1200: uses MsgDesc.Encoding, retains BOM if present
conversion from/to ccsid=1201-1205: not supported
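The "conversion from source ccsid=1200" rule can be sketched roughly like this in Python. This is not MQ code: `decode_source_1200` is a hypothetical helper, the constants mirror the MQENC_INTEGER_* values from the MQ headers, and it assumes that the integer part of the Encoding field is what selects the byte order, that the BOM is dropped once it has been read, and that an undefined integer encoding is treated as big-endian.

```python
import codecs

# Integer-encoding constants, mirroring the MQ header definitions
MQENC_INTEGER_NORMAL = 0x00000001    # big-endian integers
MQENC_INTEGER_REVERSED = 0x00000002  # little-endian integers
MQENC_INTEGER_MASK = 0x0000000F


def decode_source_1200(data: bytes, mqmd_encoding: int) -> str:
    """Decode a ccsid=1200 payload: BOM if present, otherwise MQMD.Encoding."""
    # A BOM, when present, takes priority over the Encoding field
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    # No BOM: use the integer part of MQMD.Encoding (assumption, see above)
    if mqmd_encoding & MQENC_INTEGER_MASK == MQENC_INTEGER_REVERSED:
        return data.decode("utf-16-le")
    # Assumption: anything else, including undefined, is treated as big-endian
    return data.decode("utf-16-be")


# 546 (0x222) is the usual native Encoding on little-endian platforms,
# 273 (0x111) on big-endian platforms.
print(decode_source_1200(b"A\x00B\x00", 546))
print(decode_source_1200(b"\x00A\x00B", 273))
```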

Windows/.NET users should note that Microsoft numbers its Unicode code pages differently from IBM ccsids:

windows-1200: UTF-16LE, uses little-endian, no BOM
windows-1201: UTF-16BE, uses big-endian, no BOM

For MQ .NET MQMessage ReadLine(), ReadString(), and WriteString() methods:

conversion from/to CharacterSet=1200: uses little-endian, no BOM
conversion from/to CharacterSet=1201-1205: not supported
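A quick way to see what the numbering mismatch does in practice is to write bytes the Windows way and read them the CDRA way. This is only a Python illustration using the standard codecs; it does not call the MQ .NET classes, and the variable names are made up.

```python
text = "AB"

# Windows code page 1200 / .NET CharacterSet=1200: little-endian, no BOM
windows_1200_bytes = text.encode("utf-16-le")

# The same bytes read under the CDRA definition of ccsid=1200: big-endian, no BOM
as_cdra_1200 = windows_1200_bytes.decode("utf-16-be")

print(windows_1200_bytes.hex())             # 41004200
print([hex(ord(c)) for c in as_cdra_1200])  # ['0x4100', '0x4200']
```

The decode does not fail; it just yields two unrelated CJK code points instead of "AB", which is why this kind of mismatch tends to show up as garbage text rather than as an error.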

The broker's implementation started out more like MQ, but is now more like the IBM CDRA. Early versions supported only ccsid=1200; support for the rest was added later. Parsers that handle intrinsically character data (like XMLNSC, JSON) tend to recognize a BOM, while parsers that handle mixed data (like MRM, DFDL) tend not to recognize a BOM unless it is explicitly modeled. In the absence of a BOM, parsers in past versions of the broker either used the QMgr.Encoding or Properties.Encoding, or assumed big-endian, for ccsid=1200.

For the currently supported versions of WMB/IIB, using the XMLNSC parser:

conversion from input ccsid=1200,1201: uses BOM if present, otherwise uses big-endian
conversion from input ccsid=1202,1203: uses BOM if present, otherwise uses little-endian
conversion from input ccsid=1204,1205: uses BOM if present, otherwise uses big-endian
conversion to output ccsid=1200: uses Properties.Encoding, adds BOM
conversion to output ccsid=1201: uses big-endian, no BOM
conversion to output ccsid=1202,1203: uses little-endian, no BOM
conversion to output ccsid=1204,1205: uses big-endian, adds BOM
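The XMLNSC rules above amount to a small lookup table. Here is a rough Python rendering of them; it is not broker code, `parse_input` and `serialize_output` are hypothetical names, and the `properties_little_endian` flag is a stand-in for whatever Properties.Encoding resolves to.

```python
import codecs

# Byte order assumed for each input ccsid when no BOM is present
INPUT_DEFAULT = {1200: "be", 1201: "be", 1202: "le", 1203: "le", 1204: "be", 1205: "be"}


def parse_input(data: bytes, ccsid: int) -> str:
    """Input side: a BOM always wins, otherwise use the per-ccsid default."""
    default = INPUT_DEFAULT[ccsid]  # KeyError for anything outside 1200-1205
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    return data.decode("utf-16-" + default)


def serialize_output(text: str, ccsid: int, properties_little_endian: bool = False) -> bytes:
    """Output side: byte order and BOM depend on the output ccsid."""
    if ccsid == 1200:
        order, bom = ("le" if properties_little_endian else "be"), True
    elif ccsid == 1201:
        order, bom = "be", False
    elif ccsid in (1202, 1203):
        order, bom = "le", False
    elif ccsid in (1204, 1205):
        order, bom = "be", True
    else:
        raise ValueError(f"not a UTF-16 ccsid: {ccsid}")
    body = text.encode("utf-16-" + order)
    if not bom:
        return body
    prefix = codecs.BOM_UTF16_LE if order == "le" else codecs.BOM_UTF16_BE
    return prefix + body


print(serialize_output("A", 1200, properties_little_endian=True).hex())  # fffe4100
print(serialize_output("A", 1204).hex())                                 # feff0041
print(parse_input(b"\xfe\xff\x00A", 1202))  # BOM overrides the little-endian default
```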

If any of that is still confusing, then there's always UTF-8.

[Edit: Separated the "conversion from ..." and "conversion to ..." clauses, clarified MQMD/MsgDesc Encoding for MQGET, and added explicit "no BOM" clauses where BOM is not used.]

Last edited by rekarm01 on Sat Jun 03, 2017 8:13 pm; edited 4 times in total

tczielke

Posted: Sun Aug 28, 2016 5:01 am Post subject:

Guardian

Joined: 08 Jul 2010
Posts: 941
Location: Illinois, USA

Thanks for the detailed information there. That was helpful.

Quote:

For MQ MQGET call with MQGMO_CONVERT/MQFMT_STRING:
• ccsid=1200: source uses BOM if present, otherwise uses MQMD.Encoding; target uses MQMD.Encoding, retains BOM if present
• ccsid=1201-1205: not supported

From the experimenting that I have done with MQ data conversion and a 1200 CCSID message with a BOM, I found that MQ would use the BOM over the MQMD.Encoding when converting. For example, if I PUT a little-endian 1200 CCSID message with a little-endian BOM, but lied and set the MQMD.Encoding to big endian, MQ would still convert the data correctly to something like 819 at the target.

Quote:

If any of that is still confusing, then there's always UTF-8.

If applicable, I would recommend using UTF-8 (1208) over UTF-16 (1200). 1208 is more straightforward to work with, and does not have the endianness property to deal with like 1200 does.
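A small Python illustration of the point, using only the standard codecs (the sample string is arbitrary): UTF-8 has a single byte sequence for a given string, while UTF-16 has two, and reading one as the other does not necessarily raise an error.

```python
text = "caf\u00e9"  # "café"

utf8 = text.encode("utf-8")
utf16_be = text.encode("utf-16-be")
utf16_le = text.encode("utf-16-le")

print(utf8.hex())      # 636166c3a9
print(utf16_be.hex())  # 00630061006600e9
print(utf16_le.hex())  # 630061006600e900

# UTF-8 round-trips without any byte-order information
print(utf8.decode("utf-8") == text)          # True

# UTF-16 read with the wrong byte order decodes "successfully" to different text
print(utf16_le.decode("utf-16-be") == text)  # False
```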
_________________
Working with MQ since 2010.

rekarm01

Posted: Sun Aug 28, 2016 2:02 pm Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 1415

tczielke wrote:

Quote:

For MQ MQGET call with MQGMO_CONVERT/MQFMT_STRING:

ccsid=1200: source uses BOM if present, otherwise uses MQMD.Encoding; target uses MQMD.Encoding, retains BOM if present

Maybe that should be "target uses MsgDesc.Encoding", to more clearly indicate the MsgDesc parameter on the MQGET call.

tczielke wrote:

I meant to describe the "conversion from ..." and "conversion to ..." as if it were two separate steps. The source clause above only describes conversion from ccsid=1200, and the target clause above only describes conversion to ccsid=1200. It doesn't describe conversion to ccsid=819. Let's see if I can make that more clear ...
