Character set 1200 doesn't work whereas 1202 works
rekarm01 |
Posted: Sat Aug 27, 2016 6:39 pm Post subject: Re: Character set 1200 doesn't work whereas 1202 works |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
tczielke wrote: |
The interpretation of CCSID 1200 in MQ is somewhat confusing in what I have observed. |
There are some differences between how the IBM Character Data Representation Architecture (CDRA) specifies UTF-16 endianness, and how either IBM MQ or the WMB/IIB broker implements it.
The IBM CDRA defines the following ccsids for UTF-16. ("BOM" refers to the byte order mark. The difference between the even and odd ccsids is not relevant here; it concerns Unicode code points set aside for private use.)
- ccsid=1200,1201: UTF-16BE, uses big-endian, no BOM
- ccsid=1202,1203: UTF-16LE, uses little-endian, no BOM
- ccsid=1204,1205: UTF-16, uses BOM if present, otherwise uses big-endian
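To make the three CDRA rules above concrete, here is a minimal Python sketch. The table and the function name `decode_cdra_utf16` are made up for illustration; this only models the list as written in this post, and is not taken from any IBM code.

```python
import codecs

# ccsid -> (byte order used when no BOM is honored, whether a leading BOM is honored)
# This table just restates the list above; it is an illustration, not IBM code.
CDRA_UTF16 = {
    1200: ("big", False), 1201: ("big", False),
    1202: ("little", False), 1203: ("little", False),
    1204: ("big", True), 1205: ("big", True),
}

def decode_cdra_utf16(ccsid: int, data: bytes) -> str:
    """Decode UTF-16 bytes following the rules listed above (hypothetical helper)."""
    order, honors_bom = CDRA_UTF16[ccsid]
    if honors_bom:
        # Only 1204/1205 let a BOM choose the byte order; the BOM is then consumed.
        if data.startswith(codecs.BOM_UTF16_BE):
            order, data = "big", data[2:]
        elif data.startswith(codecs.BOM_UTF16_LE):
            order, data = "little", data[2:]
    return data.decode("utf-16-be" if order == "big" else "utf-16-le")
```

Note that for ccsid=1200-1203 a leading FE FF or FF FE is not treated as a BOM in this sketch; it is simply decoded as an ordinary character in the fixed byte order.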
For MQ MQGET call with MQGMO_CONVERT/MQFMT_STRING:
- conversion from source ccsid=1200: uses BOM if present, otherwise uses MQMD.Encoding
- conversion to target ccsid=1200: uses MsgDesc.Encoding, retains BOM if present
- conversion from/to ccsid=1201-1205: not supported
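The two ccsid=1200 rules above can be modeled as two separate steps. This is only a sketch of the behavior as described in this post, not MQ's actual conversion code. The `MQENC_INTEGER_*` values are the ones defined in the MQ headers; the helper names `from_1200` and `to_1200` are hypothetical, and the sketch assumes that the integer part of the Encoding field is what selects the UTF-16 byte order.

```python
import codecs

# Integer part of the MQ Encoding field (values as defined in cmqc.h).
MQENC_INTEGER_MASK = 0x0000000F
MQENC_INTEGER_NORMAL = 0x00000001    # big-endian
MQENC_INTEGER_REVERSED = 0x00000002  # little-endian

def _codec(encoding: int) -> str:
    """Pick a Python codec from the integer part of an Encoding value."""
    is_reversed = (encoding & MQENC_INTEGER_MASK) == MQENC_INTEGER_REVERSED
    return "utf-16-le" if is_reversed else "utf-16-be"

def from_1200(data: bytes, mqmd_encoding: int) -> str:
    """Source step: a BOM decides the byte order; otherwise MQMD.Encoding does."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return data.decode("utf-16-be")  # BOM stays in the text as U+FEFF
    if data.startswith(codecs.BOM_UTF16_LE):
        return data.decode("utf-16-le")  # BOM stays in the text as U+FEFF
    return data.decode(_codec(mqmd_encoding))

def to_1200(text: str, msgdesc_encoding: int) -> bytes:
    """Target step: byte order comes from MsgDesc.Encoding; a BOM in the text is kept."""
    return text.encode(_codec(msgdesc_encoding))
```

For example, `to_1200(from_1200(data, 546), 273)` would turn little-endian input into big-endian output, where 546 and 273 are the usual MQENC_NATIVE values on little-endian and big-endian platforms respectively.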
Windows/.NET users should note that Microsoft numbers its Unicode code pages differently from IBM ccsids:
- windows-1200: UTF-16LE, uses little-endian, no BOM
- windows-1201: UTF-16BE, uses big-endian, no BOM
For MQ .NET MQMessage ReadLine(), ReadString(), and WriteString() methods:
- conversion from/to CharacterSet=1200: uses little-endian, no BOM
- conversion from/to CharacterSet=1201-1205: not supported
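A quick way to see why the numbering mismatch matters: the same text produces different bytes under the two meanings of "1200", and reading one as the other yields garbage. This is a plain Python illustration using the standard codecs, not MQ or .NET code.

```python
text = "MQ"

le = text.encode("utf-16-le")  # what windows-1200 (and .NET CharacterSet=1200) means
be = text.encode("utf-16-be")  # what IBM CDRA ccsid=1200 means

print(le.hex(" "))  # 4d 00 51 00
print(be.hex(" "))  # 00 4d 00 51

# Reading little-endian bytes as big-endian does not fail here;
# it silently produces two unrelated CJK characters instead of "MQ".
print(le.decode("utf-16-be"))
```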
The broker's implementation started out more like MQ, but is now more like the IBM CDRA. Early versions supported only ccsid=1200; support for the rest was added later. Parsers that handle intrinsically character data (like XMLNSC and JSON) tend to recognize a BOM, while parsers that handle mixed data (like MRM and DFDL) tend not to recognize a BOM unless it is explicitly modeled. In the absence of a BOM, parsers in past versions of the broker either used the QMgr.Encoding or the Properties.Encoding, or assumed big-endian, for ccsid=1200.
For the currently supported versions of WMB/IIB, using the XMLNSC parser:
- conversion from input ccsid=1200,1201: uses BOM if present, otherwise uses big-endian
- conversion from input ccsid=1202,1203: uses BOM if present, otherwise uses little-endian
- conversion from input ccsid=1204,1205: uses BOM if present, otherwise uses big-endian
- conversion to output ccsid=1200: uses Properties.Encoding, adds BOM
- conversion to output ccsid=1201: uses big-endian, no BOM
- conversion to output ccsid=1202,1203: uses little-endian, no BOM
- conversion to output ccsid=1204,1205: uses big-endian, adds BOM
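The XMLNSC rules above are easier to compare when written as a lookup table. The following Python sketch only restates that list; the function names and the `properties_little_endian` flag (standing in for Properties.Encoding) are hypothetical, and none of this is broker code.

```python
import codecs

BE, LE = "utf-16-be", "utf-16-le"

# Byte order assumed on input when no BOM is present, per the list above.
INPUT_DEFAULT = {1200: BE, 1201: BE, 1202: LE, 1203: LE, 1204: BE, 1205: BE}

def parse_input(ccsid: int, data: bytes) -> str:
    """Input side: every listed ccsid honors a BOM, else uses its default order."""
    codec = INPUT_DEFAULT[ccsid]
    if data.startswith(codecs.BOM_UTF16_BE):
        codec, data = BE, data[2:]
    elif data.startswith(codecs.BOM_UTF16_LE):
        codec, data = LE, data[2:]
    return data.decode(codec)

def serialize_output(ccsid: int, text: str, properties_little_endian: bool = False) -> bytes:
    """Output side: byte order and BOM depend on the ccsid, per the list above."""
    if ccsid == 1200:
        codec, add_bom = (LE if properties_little_endian else BE), True
    elif ccsid == 1201:
        codec, add_bom = BE, False
    elif ccsid in (1202, 1203):
        codec, add_bom = LE, False
    elif ccsid in (1204, 1205):
        codec, add_bom = BE, True
    else:
        raise ValueError(f"ccsid {ccsid} is not covered by the list above")
    bom = "\ufeff".encode(codec) if add_bom else b""
    return bom + text.encode(codec)
```

Written this way, the asymmetry stands out: on input, 1200 and 1201 behave the same, but on output only 1200 consults Properties.Encoding and adds a BOM.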
If any of that is still confusing, then there's always UTF-8.
[Edit: Separated the "conversion from ..." and "conversion to ..." clauses, clarified MQMD/MsgDesc Encoding for MQGET, and added explicit "no BOM" clauses where BOM is not used.]
Last edited by rekarm01 on Sat Jun 03, 2017 8:13 pm; edited 4 times in total |
|
tczielke |
Posted: Sun Aug 28, 2016 5:01 am Post subject: |
|
|
Guardian
Joined: 08 Jul 2010 Posts: 941 Location: Illinois, USA
|
Thanks for the detailed information there. That was helpful.
Quote: |
For MQ MQGET call with MQGMO_CONVERT/MQFMT_STRING:
•ccsid=1200: source uses BOM if present, otherwise uses MQMD.Encoding; target uses MQMD.Encoding, retains BOM if present
•ccsid=1201-1205: not supported
|
From the experimenting that I have done with MQ data conversion and a 1200 CCSID message with a BOM, I found that MQ would use the BOM over the MQMD.Encoding at the target. For example, if I PUT a little-endian 1200 CCSID message with a little-endian BOM, but lied and set MQMD.Encoding to big-endian, MQ would still convert the data correctly to something like 819 at the target.
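The experiment described above can be mimicked in a few lines of Python. This is only a model of the observed outcome, not MQ code: the function name and the `claimed_big_endian` flag (standing in for MQMD.Encoding) are made up, and the sketch assumes the BOM is dropped when converting to a single-byte ccsid such as 819 (ISO 8859-1).

```python
import codecs

def convert_1200_to_819(data: bytes, claimed_big_endian: bool) -> bytes:
    """Convert UTF-16 to ISO 8859-1, letting a BOM override the claimed byte order."""
    if data.startswith(codecs.BOM_UTF16_LE):
        text = data[2:].decode("utf-16-le")
    elif data.startswith(codecs.BOM_UTF16_BE):
        text = data[2:].decode("utf-16-be")
    else:
        # No BOM: the claimed encoding is all there is to go on.
        text = data.decode("utf-16-be" if claimed_big_endian else "utf-16-le")
    return text.encode("latin-1")  # ccsid 819 is ISO 8859-1

# Little-endian data with a little-endian BOM, but a "lying" big-endian claim.
msg = codecs.BOM_UTF16_LE + "hello".encode("utf-16-le")
print(convert_1200_to_819(msg, claimed_big_endian=True))  # b'hello'
```

Without the BOM, the same little-endian bytes with the same big-endian claim would be misread, which is the case where the Encoding field actually matters.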
Quote: |
If any of that is still confusing, then there's always UTF-8.
|
If applicable, I would recommend using UTF-8 (1208) over UTF-16 (1200). 1208 is more straightforward to work with, and does not have the endianness property to deal with that 1200 does.
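As a small illustration of that point (plain Python, nothing MQ-specific): UTF-8 has a single byte sequence for a given string, while UTF-16 has two, depending on byte order.

```python
text = "Grüße"

utf8 = text.encode("utf-8")
utf16_be = text.encode("utf-16-be")
utf16_le = text.encode("utf-16-le")

print(utf8.hex(" "))         # 47 72 c3 bc c3 9f 65
print(utf16_be == utf16_le)  # False: same text, two different byte sequences
```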
|
rekarm01 |
Posted: Sun Aug 28, 2016 2:02 pm Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
tczielke wrote: |
Quote: |
For MQ MQGET call with MQGMO_CONVERT/MQFMT_STRING:
- ccsid=1200: source uses BOM if present, otherwise uses MQMD.Encoding; target uses MQMD.Encoding, retains BOM if present
|
|
Maybe that should be "target uses MsgDesc.Encoding", to more clearly indicate the MsgDesc parameter on the MQGET call.
tczielke wrote: |
From the experimenting that I have done with MQ data conversion and a 1200 CCSID message with a BOM, I found that MQ would use the BOM over the MQMD.Encoding at the target. For example, if I PUT a little-endian 1200 CCSID message with a little-endian BOM, but lied and set MQMD.Encoding to big-endian, MQ would still convert the data correctly to something like 819 at the target. |
I meant to describe the "conversion from ..." and "conversion to ..." as if it were two separate steps. The source clause above only describes conversion from ccsid=1200, and the target clause above only describes conversion to ccsid=1200. It doesn't describe conversion to ccsid=819. Let's see if I can make that more clear ... |
|