MQSeries.net :: View topic - Characters modified between two Queue Managers


Characters modified between two Queue Managers


tczielke

Posted: Tue Nov 28, 2017 5:49 am Post subject:

Guardian

Joined: 08 Jul 2010
Posts: 941
Location: Illinois, USA

fjb_saper wrote:

tczielke wrote:

MQ seems to have a much looser interpretation of the IBM CCSID (e.g. MQ CCSID 1200, which collapses several IBM CCSIDs into it). You almost need two different terms here, IBM CCSID and MQ CCSID. It makes me question how well you can go to the IBM CCSID definition and then expect the MQ CCSID equivalent to completely follow it. You can't, at least for 1200.

When talking about CCSID 1200 here are you talking loosely (including CCSID 1201 and 1202) or strictly (CCSID 1200 only)?

Of the IBM CCSIDs 1200 (UTF-16 BE with IBM PUA), 1201 (UTF-16 BE), 1202 (UTF-16 LE with IBM PUA), 1203 (UTF-16 LE), 1204 (UTF-16 with IBM PUA), 1205 (UTF-16), MQ only accepts 1200. MQ also violates some of the rules of the IBM 1200 CCSID, by also interpreting the md.CCSID=1200 as little endian based on a BOM in the message data or a md.Encoding setting of little endian.
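As a rough illustration of the behaviour described above (not MQ's actual code), the following Python sketch picks a byte order for data tagged with CCSID 1200. The function name, the little_endian_encoding flag standing in for md.Encoding, and the choice to let a BOM take precedence over that flag are all assumptions made for the example.

```python
import codecs


def decode_ccsid_1200(data: bytes, little_endian_encoding: bool = False) -> str:
    """Decode bytes tagged as CCSID 1200, honouring a BOM or an encoding flag.

    Hypothetical helper: the BOM-before-flag precedence is an assumption,
    not documented MQ behaviour.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    # No BOM present: fall back to the flag that stands in for md.Encoding.
    return data.decode("utf-16-le" if little_endian_encoding else "utf-16-be")
```

With no BOM and no flag the sketch treats the data as big endian, which is what the IBM definition of CCSID 1200 would suggest on its own.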

I also found out in a PMR that IBM MQ does not use the same tools for data conversion. For example, iconv is used on Linux and another underlying tool was used on Solaris. Does iconv completely follow the IBM CCSID definitions?

For the common practice use of an MQ programmer or administrator, you can think of the MQ md.CCSID as being similar to the IBM CCSID, but it is not completely accurate to define the MQ md.CCSID against the IBM CCSID definition, in my opinion.
_________________
Working with MQ since 2010.

gbaddeley

Posted: Tue Nov 28, 2017 2:57 pm Post subject:

Jedi Knight

Joined: 25 Mar 2003
Posts: 2538
Location: Melbourne, Australia

tczielke wrote:

I also found out in a PMR that IBM MQ does not use the same tools for data conversion. For example, iconv is used on Linux and another underlying tool was used on Solaris. Does iconv completely follow the IBM CCSID definitions?

On AIX, IBM MQ uses the AIX OS conversion tables, so I would imagine these follow the IBM definitions of CCSIDs.
_________________
Glenn

bruce2359

Posted: Tue Nov 28, 2017 4:28 pm Post subject:

Poobah

Joined: 05 Jan 2008
Posts: 9475
Location: US: west coast, almost. Otherwise, enroute.

gbaddeley wrote:

tczielke wrote:

On AIX, IBM MQ uses the AIX OS conversion tables, so I would imagine these follow the IBM definitions of CCSIDs.

IBM makes no guarantee as to the software "tool" used for conversion; only that the conversion performed will adhere to published documentation.

I'd imagine that the tool is likely to be specific to the platform hardware and to the MQ version, release and modification level.
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.

rekarm01

Posted: Fri Dec 01, 2017 7:32 pm Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 1415

tczielke wrote:

I would say that the MQ CCSID is not the same thing as the IBM CCSID ... You almost need two different terms here

Except for a handful of CCSIDs here and there, the MQ Knowledge Center doesn't really describe or define what it is that CCSIDs actually identify, implying that something else must define them, somewhere else. And while it doesn't directly refer to the IBM CDRA, it does refer to a separate Data Conversion doc, which in turn refers to the CDRA. Aside from the previously mentioned UTF-16 CCSIDs, there's not much else that MQ implements differently, so suggesting separate terminologies for the two seems a bit much. Just documenting the differences more clearly ought to be enough.

MQ probably has historical reasons for how it handles UTF-16. MQ supports data conversion for numeric data, so it has a separate "Encoding" parameter, independent of the CCSID, that it can use to manage endianness; this allows it to consolidate the different UTF-16 character encoding schemes into a single CCSID, (for better or worse). The IBM CDRA doesn't have that, so it uses separate CCSIDs to describe the different UTF-16 encoding schemes.

tczielke wrote:

Does iconv completely follow the IBM CCSID definitions?

No, not directly. iconv refers to coded character sets by "codeset" names, such as "ISO8859-1", or "UTF-16BE", or "IBM-037". MQ needs to match up supported CCSIDs with codeset names in order to use iconv.
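The matching-up step can be pictured with a small Python sketch. It uses Python's own codecs rather than iconv, and the three-entry table is a hypothetical stand-in for whatever mapping MQ really keeps; only the general idea (look up a name per CCSID, then convert by name) comes from the discussion above.

```python
# Hypothetical CCSID -> codec-name table. MQ's real mapping lives in its own
# conversion tables and is not reproduced here.
CCSID_TO_CODEC = {
    37: "cp037",      # EBCDIC US/Canada (iconv-style name: IBM-037)
    819: "latin-1",   # ISO 8859-1 (iconv-style name: ISO8859-1)
    1208: "utf-8",
}


def convert(data: bytes, source_ccsid: int, target_ccsid: int) -> bytes:
    """Convert bytes between two CCSIDs by looking up a codec name for each."""
    try:
        source = CCSID_TO_CODEC[source_ccsid]
        target = CCSID_TO_CODEC[target_ccsid]
    except KeyError as exc:
        raise ValueError(f"no codec registered for CCSID {exc.args[0]}") from exc
    return data.decode(source).encode(target)
```

For example, convert(b"\xc1\xc2\xc3", 37, 819) turns the EBCDIC bytes for "ABC" into their ISO 8859-1 equivalents, while an unlisted CCSID raises an error instead of guessing.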

tczielke

Posted: Sat Dec 02, 2017 5:40 am Post subject:

Guardian

Joined: 08 Jul 2010
Posts: 941
Location: Illinois, USA

rekarm01 wrote:

tczielke wrote:

I would say that the MQ CCSID is not the same thing as the IBM CCSID ... You almost need two different terms here

tczielke wrote:

Does iconv completely follow the IBM CCSID definitions?

Based on my memory of an IBM PMR, MQ on Linux is using iconv for the data conversion. This means that when you specify md.CCSID=1208 on Linux, the data is being understood by MQ as UTF-8 based on the UTF-8 definitions and data conversion rules of iconv. That is not the same thing as being based on the CDRA CCSID of 1208. So to me, the md.CCSID is not the same thing as the CDRA CCSID, at least on some MQ platforms.

It would be interesting to have IBM give a definition of the md.CCSID. I could not find one in the MQ manual.
_________________
Working with MQ since 2010.

rekarm01

Posted: Sun Dec 03, 2017 2:30 pm Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 1415

tczielke wrote:

MQ on Linux is using iconv for the data conversion. This means that when you specify md.CCSID=1208 on Linux, the data is being understood by MQ as UTF-8 based on the UTF-8 definitions and data conversion rules of iconv.

Don't get sidetracked with iconv; it is just a means to an end. It doesn't really have its own built-in definitions or conversion rules. It provides a set of APIs, configurable codeset tables, conversion tables, etc., and IBM can add its own tables as needed, to suit whatever requirements MQ has. Neither iconv, nor MQ, (nor the CDRA), define the underlying codesets. They defer to external standards for that, such as from ISO, Unicode, OEMs, or other vendors. It is up to the individual implementations to document to what extent they conform (or don't conform) to any external standards.

tczielke wrote:

That is not the same thing as being based on the CDRA CCSID of 1208. So to me, the md.CCSID is not the same thing as the [CDRA] CCSID, at least on some MQ platforms.

Both MQ and the CDRA agree that CCSID=1208 means UTF-8, as defined in the Unicode standard. However, MQ v8 and earlier limited its support for UTF-8 "to those Unicode characters that can be encoded in UCS-2", so that was different. But at least it was documented.

tczielke wrote:

It would be interesting to have IBM give a definition of the md.CCSID. I could not find one in the MQ manual.

More detailed documentation here could certainly help to clear some things up.

gbaddeley

Posted: Sun Dec 03, 2017 3:46 pm Post subject:

Jedi Knight

Joined: 25 Mar 2003
Posts: 2538
Location: Melbourne, Australia

Quote:

tczielke wrote:

It would be interesting to have IBM give a definition of the md.CCSID. I could not find one in the MQ manual.

More detailed documentation here could certainly help to clear some things up.

https://www-01.ibm.com/software/globalization/ccsid/ccsid_registered.html

If apps are going to rely on MQ to perform data conversion of extended characters to/from UTF-8 pages, ISO8859 pages, EBCDIC pages, etc, it is in their best interest to validate the conversion in app system testing, or do the conversion themselves.

I have seen an increasing trend of using XML in MQ message data, where extended characters use &#nnn; encoding, which conveniently avoids MQ conversion issues.
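A minimal Python sketch of that approach, assuming the payload is text destined for an XML document: every non-ASCII character is replaced with a numeric character reference (the &#nnn; form), so only ASCII characters remain to be converted. The helper name is made up for the example.

```python
def to_ascii_xml_text(text: str) -> bytes:
    """Encode text as ASCII, replacing other characters with &#nnn; references."""
    return text.encode("ascii", errors="xmlcharrefreplace")
```

Note that this only handles non-ASCII characters; markup characters such as & and < still need normal XML escaping, which the sketch does not attempt.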
_________________
Glenn

tczielke

Posted: Thu Dec 07, 2017 9:18 am Post subject:

Guardian

Joined: 08 Jul 2010
Posts: 941
Location: Illinois, USA

tczielke wrote:

That is not the same thing as being based on the CDRA CCSID of 1208. So to me, the md.CCSID is not the same thing as the [CDRA] CCSID, at least on some MQ platforms.

rekarm01 wrote:

Documented, but I am pretty sure, incorrect. Here is how it is defined in the /var/mqm/conv/table/ccsid.tbl at v8:

Code:

/var/mqm/conv/table/ccsid.tbl

# CCSID  Base   CodePage  CodePage  Type  Enc  ACRI  Codeset
#        CCSID  DBCS      SBCS                       name
# -----  -----  --------  --------  ----  ---  ----  -------
  1200   13488  0         13488     2     4    0     UCS-2
  1208   13488  0         13488     3     5    0     UTF-8

Only 1200 was being mapped to UCS-2. 1208 was supporting the full UTF-8 (1 to 4 byte encodings) at v8. At least that is what I remember when I was working on the MQTC session for data conversion.
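For anyone who wants to inspect such entries programmatically, here is a small Python sketch that reads rows in the eight-column layout shown in the table above. The field names are guesses derived from the comment header, and nothing here is based on a documented file format; it assumes whitespace-separated fields and # for comment lines.

```python
from typing import List, NamedTuple


class CcsidEntry(NamedTuple):
    # Field names are assumptions taken from the comment header in the file.
    ccsid: int
    base_ccsid: int
    codepage_dbcs: int
    codepage_sbcs: int
    entry_type: int
    enc: int
    acri: int
    codeset: str


def parse_ccsid_table(text: str) -> List[CcsidEntry]:
    """Parse whitespace-separated rows, skipping blank and '#' comment lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 8:
            raise ValueError(f"expected 8 fields, got {len(fields)}: {line!r}")
        *numbers, codeset = fields
        entries.append(CcsidEntry(*map(int, numbers), codeset))
    return entries
```

Feeding it the two rows quoted above yields one entry with codeset "UCS-2" for CCSID 1200 and one with codeset "UTF-8" for CCSID 1208.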

It's a low level bordering on pedantic point, but the md.CCSID != CDRA.CCSID. For example the CDRA.CCSID=1208 is UTF-8 with IBM PUA, and CDRA.CCSID=1209 is UTF-8 without the IBM PUA. So does that mean when I say md.CCSID=1208, it is UTF-8 with the IBM PUA data conversion fully supported? Probably not. If I had to guess, a CCSID of 1209 is probably more accurate to the CDRA for IBM MQ's use of UTF-8.

Most (if not all, except for myself because I am pedantic) MQ admins/programmers do not have to get down to this level of detail for the CCSID. However, if I am looking into an MQ data conversion that involves a certain CCSID, I am not going to go to the CDRA and expect IBM MQ to completely follow that CDRA.CCSID definition, because I have already found several exceptions where they don't.
_________________
Working with MQ since 2010.

rekarm01

Posted: Tue Dec 12, 2017 8:34 pm Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 1415

tczielke wrote:

Here is how it is defined in the /var/mqm/conv/table/ccsid.tbl at v8:

Code:

That looks more like an MQ v9 "ccsid_part2.tbl" file. Does the MQ v8 "ccsid.tbl" file for Linux use the same syntax? Isn't this file for adding new ccsids, rather than listing ccsids that are already supported? The MQ documentation doesn't really explain all the columns, but this file seems to be trying to modify ccsid=1200 (UTF-16) as an alias for ccsid=13488 (Unicode 2.0, UTF-16) for some reason, and then mapping that to a "UCS-2" codeset name. It's also trying to do something similar for ccsid=1208, except that "5304" (Unicode 2.0, UTF-8) might fit better for the "Base CCSID". However, "13488" is neither a code page nor an SBCS, so why is it in the "CodePage SBCS" column? This file maps ccsids to codeset names, but it does not define the underlying codesets; those are "held in conversion tables provided by IBM MQ".

tczielke wrote:

Only 1200 was being mapped to UCS-2. 1208 was supporting the full UTF-8 (1 to 4 byte encodings) at v8.

Does "supporting the full UTF-8" include support for data conversion? When MQ is not converting data, it's just transporting bytes, in which case it could "support" any 16-bit unsigned value as ccsid, or any sequence of bytes as message data, and let the sending and receiving applications worry about what the data means. MQ doesn't interpret or render characters; even when it's converting data, it is just mapping bytes to bytes, using either conversion tables, or conversion routines. Before MQ v9, the MQ documentation limited Unicode conversion support to "those Unicode characters that can be encoded in UCS-2". MQ v9 had to add support for converting supplementary characters:

Quote:

Data conversion to and from Unicode and CCSIDs 1388, 1390, 1399, 4933, 5488, and [16684?] has been extended, on some platforms, to support all the code points currently defined for these CCSIDs, including those that map to code points in Unicode supplementary planes. ... Support has also been added for conversion to and from Unicode and six new CCSIDs (1374 through to 1379).

Even if MQ v8 could convert "the full UTF-8", on some platforms, in some cases, that does not imply that MQ supported it.
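One way for an application to find out whether it depends on the part that was not documented as supported before MQ v9 is to check its message text for characters outside the Basic Multilingual Plane, which are the ones that cannot be encoded in UCS-2. A minimal Python sketch (the function name is invented for the example):

```python
from typing import List


def characters_beyond_ucs2(text: str) -> List[str]:
    """Return the characters that need a supplementary plane (above U+FFFF)."""
    return [ch for ch in text if ord(ch) > 0xFFFF]
```

An empty result means every character fits in UCS-2; anything else is a supplementary character whose conversion is worth testing explicitly on each platform involved.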

tczielke wrote:

It's a low level bordering on pedantic point, but the md.CCSID != CDRA.CCSID. For example the CDRA.CCSID=1208 is UTF-8 with IBM PUA, and CDRA.CCSID=1209 is UTF-8 without the IBM PUA. So does that mean when I say md.CCSID=1208, it is UTF-8 with the IBM PUA data conversion fully supported? Probably not.

IBM PUA ("Private Use Area") refers to a range of code points, from U+F83D to U+F8FF, that IBM reserves for its own private use. In this case, "fully supported" means that IBM intends to use that area "whenever there is a need to maintain the round-trip integrity of IBM characters". Whether or not an MQ platform will use the IBM PUA for this purpose, applications can specify md.CCSID=1208 for UTF-8 anyway, to indicate that they won't use that area for any conflicting purpose. Or else applications can specify md.CCSID=1209 for UTF-8, to indicate that they will use their own PUA instead, to suit whatever private agreement they have between themselves, and then they can provide their own user-defined conversions to implement it.
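A quick Python sketch for checking whether application data touches that range, using the U+F83D to U+F8FF bounds quoted above. The constant and function names are made up for illustration.

```python
# Bounds of the IBM PUA range as quoted in the post above.
IBM_PUA_FIRST = 0xF83D
IBM_PUA_LAST = 0xF8FF


def uses_ibm_pua(text: str) -> bool:
    """Return True if any character falls inside the IBM PUA range."""
    return any(IBM_PUA_FIRST <= ord(ch) <= IBM_PUA_LAST for ch in text)
```

Applications that never produce such code points are unaffected by the 1208 versus 1209 distinction as far as the PUA is concerned.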

tczielke wrote:

However, if I am looking into an MQ data conversion that involves a certain CCSID, I am not going to go to the CDRA and expect IBM MQ to completely follow that CDRA.CCSID definition, because I have already found several exceptions where they don't.

Wouldn't it be enough for IBM MQ to just document any exceptions to the CDRA? What other alternatives are there?

tczielke

Posted: Wed Dec 13, 2017 2:39 pm Post subject:

Guardian

Joined: 08 Jul 2010
Posts: 941
Location: Illinois, USA

It is an assumption on my part that distributed MQ v8 was using a translation of CCSID 1208 to UTF-8/iconv on Linux. Even if it was, the point is well taken that it would have been difficult to leverage that for anything really useful, if CCSID 1200 was still being mapped to UCS-2/iconv.

One thing that I did find in some of my Unicode data conversion testing is that MQ v8 on z/OS did seem to support the full UTF-16 (surrogate pairs) and UTF-8 encodings. You could convert UTF-16 surrogate pairs to UTF-8, and vice versa on that platform at v8.

Again, my underlying point is that if I am working with MQ data conversion, I will leverage the CDRA.CCSID documentation, but not expect MQ to always follow it 100% completely. Instead MQ seems to be leveraging platform specific data conversion tools (like iconv) that could potentially deviate from the CDRA.CCSID documentation. An edge example would be the bytes x'C0' and x'C1' in CCSID 1208 that do not consistently convert the same on all IBM MQ platforms. Definitely an edge scenario, however.
_________________
Working with MQ since 2010.

rekarm01

Posted: Mon Dec 18, 2017 9:11 pm Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 1415

tczielke wrote:

It is an assumption on my part that distributed MQ v8 was using a translation of CCSID 1208 to UTF-8/iconv on Linux. ...

MQ does not define the coded character sets that applications use to write or read messages. Sending applications create the message data, and convert it to bytes before ever putting a message on a queue, and conversely, receiving applications convert the bytes back to meaningful data after getting a message off a queue. Even when MQ is converting message data, the relevant conversion tables or routines need not fully represent their underlying character sets. So questions about which CCSID to use, or how to interpret PUA or other characters, may be better suited for the sending and receiving applications, not for MQ.

tczielke wrote:

One thing that I did find in some of my Unicode data conversion testing is that MQ v8 on z/OS did seem to support the full UTF-16 (surrogate pairs) and UTF-8 encodings. You could convert UTF-16 surrogate pairs to UTF-8, and vice versa on that platform at v8.

Unicode has algorithms for that, so it is easier to support the entire Unicode character set when converting from one Unicode transformation format to another, than when converting to/from a non-Unicode character set.
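The algorithm in question is short enough to show. The Python sketch below spells out the standard surrogate-pair arithmetic and then does a whole-buffer UTF-16BE to UTF-8 conversion with Python's built-in codecs; it illustrates the Unicode rules only and says nothing about how MQ implements them. The function names are invented for the example.

```python
def surrogates_to_code_point(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single Unicode code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf16be_to_utf8(data: bytes) -> bytes:
    """Convert big-endian UTF-16 bytes to UTF-8 without any mapping table."""
    return data.decode("utf-16-be").encode("utf-8")
```

No conversion table is needed at any point, which is why this direction is easier to support completely than a conversion to or from a non-Unicode character set.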

tczielke wrote:

Again, my underlying point is that if I am working with MQ data conversion, I will leverage the CDRA.CCSID documentation, but not expect MQ to always follow it 100% completely. Instead MQ seems to be leveraging platform specific data conversion tools (like iconv) that could potentially deviate from the CDRA.CCSID documentation.

Yes, it's too bad that MQ does not always conform to the CDRA, (for example, as previously discussed, it defines CCSIDs for UTF-16 differently), but that has nothing to do with iconv. iconv doesn't deal with CCSIDs or the CDRA.

tczielke wrote:

An edge example would be the bytes x'C0' and x'C1' in CCSID 1208 that do not consistently convert the same on all IBM MQ platforms. Definitely an edge scenario, however.

Here, MQ conforms to the CDRA, but that doesn't mean that iconv conforms to Unicode. iconv should not try to interpret invalid bytes as part of a valid character, but it is free to handle them some other way. For example, it could signal an error and optionally abort the conversion, or else it could substitute some other character for the invalid bytes, or else it could just filter out the invalid bytes. So, conforming implementations can still behave differently, at least when handling invalid message data.
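The three strategies can be seen side by side in Python, whose UTF-8 decoder offers each of them as an error handler. This is only an analogy for the choices a converter has; it does not show what iconv or MQ does on any particular platform, and the function name is made up for the example.

```python
from typing import Dict


def decode_three_ways(data: bytes) -> Dict[str, str]:
    """Decode possibly invalid UTF-8 with three different error strategies."""
    results = {}
    try:
        # Strategy 1: signal an error and abort the conversion.
        results["strict"] = data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        results["strict"] = f"error at byte offset {exc.start}"
    # Strategy 2: substitute U+FFFD for the invalid bytes.
    results["replace"] = data.decode("utf-8", errors="replace")
    # Strategy 3: silently filter out the invalid bytes.
    results["ignore"] = data.decode("utf-8", errors="ignore")
    return results
```

Calling it on b"A\xc0B" produces three different outcomes for the same input, all from a decoder that never treats x'C0' as part of a valid character.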
