Characters modified between two Queue Managers |
Author |
Message
|
atedone |
Posted: Wed Nov 08, 2017 2:19 am Post subject: Characters modified between two Queue Managers |
|
|
Newbie
Joined: 31 Oct 2017 Posts: 5
|
Hi there!
I tried to search for CCSID/Encoding posts (as I suppose the issue is there), but I didn't find any useful hint.
Scenario: we put a message that starts with the pound symbol "£" on a remote queue definition on a Unix machine. It is then routed to another queue manager (I don't know its OS, but I can find out), which receives a different character: an "É" instead.
What should we do to avoid this happening?
We know that once in production the messages will be transferred in binary mode (like many other flows already in production) with no issues. The problem now is that we are putting these messages manually, as we are in an early test phase.
Thanks in advance for any hint from you gurus out there
Cheers
Testo |
|
|
|
zpat |
Posted: Wed Nov 08, 2017 3:04 am Post subject: |
|
|
Jedi Council
Joined: 19 May 2001 Posts: 5859 Location: UK
|
MQ does not convert characters unless you ask it to with CONVERT(YES) on a sender channel or MQGMO_CONVERT on an MQGET.
However, what you perceive a character to be depends on how you view it.
What is the hex representation?
What is the CCSID of the QMs?
What is the CCSID of the message?
How are you viewing this message?
If you unload the message to a file with MO71 unload (or dmpmqmsg or qload) you can see the original hex. _________________ Well, I don't think there is any question about it. It can only be attributable to human error. This sort of thing has cropped up before, and it has always been due to human error. |
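The difference between "bytes passed through untouched" and "bytes converted on request" can be modelled in a few lines. This is only a rough sketch in plain Python, not MQ API code; the two code pages (ISO 8859-1 for the sender, code page 437 for the viewer) and the helper name `convert` are assumptions chosen for illustration, since the thread does not say which CCSIDs are actually involved.

```python
def convert(data: bytes, from_codec: str, to_codec: str) -> bytes:
    """Re-encode character data from one code page to another."""
    return data.decode(from_codec).encode(to_codec)


# Assumed sender: builds the text in ISO 8859-1, where "£" is the byte 0xA3.
sent = "£10".encode("latin-1")

# No conversion: the receiver gets identical bytes, but a viewer that
# assumes code page 437 (an assumption) shows another character for 0xA3.
unconverted_view = sent.decode("cp437")

# Conversion requested: the bytes change so that the text stays the same.
converted = convert(sent, "latin-1", "cp437")
converted_view = converted.decode("cp437")

print(sent.hex(), repr(unconverted_view))
print(converted.hex(), repr(converted_view))
```

The point of the sketch is that the hex changes only in the converted case, which is why the hex representation is the first thing to check.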
|
|
|
atedone |
Posted: Wed Nov 08, 2017 3:51 am Post subject: I will dig into based on your questions... |
|
|
Newbie
Joined: 31 Oct 2017 Posts: 5
|
Thanks zpat for your reply and your questions, which are triggering some focused investigation.
Cheers
T |
|
|
|
gbaddeley |
Posted: Wed Nov 08, 2017 3:18 pm Post subject: |
|
|
Jedi Knight
Joined: 25 Mar 2003 Posts: 2527 Location: Melbourne, Australia
|
A common technique is to leave a message sitting on a local queue, and then use amqsbcg to browse the message. This will show the CCSID and Format in the MQMD, and the hex code representation of the message data. Check that a character hex code (in the CCSID) is the character that you are expecting. If it's not, it's an issue with the app that put the message. _________________ Glenn |
|
|
|
tczielke |
Posted: Thu Nov 09, 2017 6:20 am Post subject: |
|
|
Guardian
Joined: 08 Jul 2010 Posts: 941 Location: Illinois, USA
|
Data conversion issues can be tricky to debug. It helps to run traces, track the CCSID/Encoding being used, and get to the byte level of the message data in the trace. However, I understand that this is an advanced thing to do. This MQ session below tries to provide some guidance on how to do that.
http://www.mqtechconference.com/sessions_v2016/MQTC_v2016_DataConversion.pdf _________________ Working with MQ since 2010. |
|
|
|
gbaddeley |
Posted: Thu Nov 09, 2017 3:49 pm Post subject: |
|
|
Jedi Knight
Joined: 25 Mar 2003 Posts: 2527 Location: Melbourne, Australia
|
Yes, trace can help. Using high level tools to view message data as characters can be misleading.
1) The tool may or may not be converting the message.
2) The tool / terminal emulator / window might be displaying character glyphs in its own character set, which is different to the MQMD CCSID or the converted data.
Hex is best! _________________ Glenn |
|
|
|
atedone |
Posted: Fri Nov 10, 2017 2:19 am Post subject: Thanks a lot to all of you! |
|
|
Newbie
Joined: 31 Oct 2017 Posts: 5
|
I recently started browsing this forum again (the last time was in 2005), and it's a great pleasure to see that it is still full of kind, collaborative and competent human beings.
Thanks a lot for all your hints, they are proving to be useful!
Have a great day
T |
|
|
|
rekarm01 |
Posted: Wed Nov 15, 2017 8:48 pm Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
tczielke wrote: |
Data conversion issues can be tricky to debug. It helps to run traces, track the CCSID/Encoding being used, and get to the byte level of the message data in the trace... |
I had not thought to use traces for that, (possibly because the applications in question were often running on remote servers I didn't have access to). I have mostly relied on message browsing tools, like "amqsbcg0", and other Unix utilities like "od" or "iconv".
IBM's "Data Conversion Under WebSphere MQ" document has a lot of useful information, but a few parts can be misleading, or inaccurate. The IBM Character Data Representation Architecture (CDRA) is another useful resource.
tczielke wrote: |
The Coded Character Set ID (CCSID) or Code Page is a table of assigning glyphs to a number |
No glyphs, just abstract characters, identified by label, either an IBM-specific "Graphic Character Global Id" (GCGID) with a short description, or a Unicode-based "Graphic Character UCS Id" (GCUID), or both. Glyphs, graphemes, character shapes, physical representations, or implied meanings of graphic characters are either non-normative, or outside the scope of the IBM CDRA.
And a Coded Character Set ID (CCSID) is more than just a code page; it describes, (among other things), one or more character set (CS) / code page (CP) pairs, to map between characters and non-negative integers (code points), and an encoding scheme (ES), to map between code points and physical bytes.
For example, Unicode has multiple transformation formats, so IBM provides multiple CCSIDs, with the same character set / code page pairs, but different encoding schemes. And there are two versions of the windows-1252 character encoding (one with an added Euro character), so IBM provides two different CCSIDs, with the same code page, but different character sets:
Code: |
CCSID=1200 (UTF-16BE)                     CCSID=1208 (UTF-8)
- ES=7200 (UTF-16BE CES)                  - ES=7807 (UTF-8 CES)
- CS=65535 / CP=1400 (Plane 0: BMP)       - CS=65535 / CP=1400 (Plane 0: BMP)
- CS=65535 / CP=1401 (Plane 1: SMP)       - CS=65535 / CP=1401 (Plane 1: SMP)
- CS=65535 / CP=1402 (Plane 2: SIP)       - CS=65535 / CP=1402 (Plane 2: SIP)
- CS=65535 / CP=1414 (Plane 14: SSP)      - CS=65535 / CP=1414 (Plane 14: SSP)
- ...                                     - ...

CCSID=1252 (MS Windows, Latin-1)          CCSID=5348 (MS Windows, Latin-1, Version 2)
- ES=4105                                 - ES=4105
- CS=1402 / CP=1252 (Windows, Latin-1)    - CS=1412 / CP=1252 (Windows, Latin-1 + euro) |
(Side note: IBM MQ handles UTF-16 endianness differently from the example above.)
tczielke wrote: |
Java PUT of String Message: For using IBM MQ Classes for Java, a Java String is encoded in UTF-16. Since the String has an [inherent] CCSID, ... |
Java Strings do not have an inherent CCSID. CCSIDs describe physical bytes, but they don't describe the hidden representation of abstract characters.
tczielke wrote: |
Java GET of String Message: //Unconverted GET and then Java converts from EBCDIC to UTF-8 and then from UTF-8 to UTF-16 |
The given example converts directly from EBCDIC to UTF-16; it does not convert to or from UTF-8. |
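The split between "character to code point" and "code point to bytes" described above can be illustrated with a short sketch. It uses Python's built-in codecs rather than any IBM table, so it is an analogy under that assumption, not a rendering of the CDRA definitions; the helper name `encodings` is hypothetical.

```python
def encodings(ch: str) -> dict[str, str]:
    """Show one character's code point and its bytes under two encoding schemes."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf-16-be": ch.encode("utf-16-be").hex(),
        "utf-8": ch.encode("utf-8").hex(),
    }


# Same character set and code points, different encoding schemes:
for ch in "£€":
    print(ch, encodings(ch))

# Same single-byte code page family, different repertoire: Python's
# cp1252 codec includes the euro sign, while latin-1 has no euro at all.
euro_in_cp1252 = "€".encode("cp1252")
try:
    "€".encode("latin-1")
    euro_in_latin1 = True
except UnicodeEncodeError:
    euro_in_latin1 = False

print(euro_in_cp1252.hex(), euro_in_latin1)
```

The code point stays fixed per character while the byte sequences differ, which is the same distinction the CS/CP and ES fields draw.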
|
|
|
tczielke |
Posted: Thu Nov 16, 2017 6:04 am Post subject: |
|
|
Guardian
Joined: 08 Jul 2010 Posts: 941 Location: Illinois, USA
|
Thank you for the feedback! I will review this and adjust the presentation where appropriate, the next time I give it. For the "Java GET of String Message" issue that you pointed out, I coincidentally caught that earlier this week and it was corrected yesterday on the MQTC website.
Out of curiosity, did you help write the "Data Conversion Under WebSphere MQ" document? _________________ Working with MQ since 2010. |
|
|
|
gbaddeley |
Posted: Thu Nov 16, 2017 2:55 pm Post subject: |
|
|
Jedi Knight
Joined: 25 Mar 2003 Posts: 2527 Location: Melbourne, Australia
|
Another common issue is that the app constructs message data using a particular CCSID (usually the compiler / runtime native CCSID), but the queued message has a different effective value for CCSID in its MQMD.
I encountered this issue regularly when I used to support z/OS MQ. The app was using CCSID 37 internally, but messages were queued as CCSID 500 (the qmgr's default CCSID). There are a number of hex codes that have different character representations in these EBCDIC code sets. _________________ Glenn |
|
|
|
PeterPotkay |
Posted: Thu Nov 16, 2017 5:13 pm Post subject: |
|
|
Poobah
Joined: 15 May 2001 Posts: 7719
|
gbaddeley wrote: |
I encountered this issue regularly when I used to support z/OS MQ. The app was using CCSID 37 internally, but messages were queued as CCSID 500 (the qmgr's default CCSID). There are a number of hex codes that have different character representations in these EBCDIC code sets. |
Like ! and |
And Kia decided to name one of the trim levels for their Kia Soul cars "!".
"Why did you send KiaSoul|?" "We didn't, we sent KiaSoul!." "No you didn't." "Yes we did." "No you didn't." "Yes we did."
Normally I advise letting the CCSID default on the MQPUT, but when dealing with mainframe apps running in an environment where there might be a mix of 037 and 500, I tell 'em to learn what code page their app runs as, and specify that in the MQMD CCSID when putting the message. _________________ Peter Potkay
Keep Calm and MQ On |
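The 037 versus 500 mix-up can be reproduced with Python's built-in `cp037` and `cp500` codecs. This is a sketch that assumes those codecs match the corresponding IBM CCSIDs closely enough for illustration; the helper name `differing_bytes` is hypothetical.

```python
def differing_bytes(codec_a: str, codec_b: str) -> dict[int, tuple[str, str]]:
    """Map each byte value to its two characters where the code pages disagree."""
    diffs = {}
    for value in range(256):
        raw = bytes([value])
        char_a = raw.decode(codec_a)
        char_b = raw.decode(codec_b)
        if char_a != char_b:
            diffs[value] = (char_a, char_b)
    return diffs


diffs = differing_bytes("cp037", "cp500")
for value, (in_037, in_500) in sorted(diffs.items()):
    print(f"0x{value:02X}: cp037={in_037!r} cp500={in_500!r}")

# Text built in code page 500 but read as code page 037:
garbled = "KiaSoul!".encode("cp500").decode("cp037")
print(garbled)
```

Letters and digits sit at the same byte values in both code pages, so only a handful of punctuation characters give the mismatch away.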
|
|
|
rekarm01 |
Posted: Tue Nov 21, 2017 6:42 pm Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
tczielke wrote: |
Out of curiosity, did you help write the "Data Conversion Under WebSphere MQ" document? |
Sorry, no. If I did help, I would have recommended rewriting the misleading or inaccurate bits.
gbaddeley wrote: |
Another common issue is that the app constructs message data using a particular CCSID (usually the compiler / runtime native CCSID), but the queued message has a different effective value for CCSID in its MQMD. |
For example, how many apps set the MQMD.CodedCharSetId to MQCCSI_Q_MGR for an MQPUT, rather than whichever CCSID represents the string data in the message, because that's what the "Data Conversion" doc instructed them to do? |
|
|
|
fjb_saper |
Posted: Wed Nov 22, 2017 5:57 am Post subject: |
|
|
Grand High Poobah
Joined: 18 Nov 2003 Posts: 20729 Location: LI,NY
|
rekarm01 wrote: |
For example, how many apps set the MQMD.CodedCharSetId to MQCCSI_Q_MGR for an MQPUT, rather than whichever CCSID represents the string data in the message, because that's what the "Data Conversion" doc instructed them to do? |
That's because there are a number of assumptions behind that recommendation, any of which may not hold:
- The queue manager runs with a CCSID representing the default CCSID of the platform.
- The compiler uses the default CCSID of the platform.
- The program creates the message in the default CCSID of the platform.
- The program writes the message in the default CCSID of the platform.
If any of those is false, you are probably better off explicitly setting the CCSID of the message instead of using the qmgr default.
Have fun _________________ MQ & Broker admin |
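The effect of tagging a message with the queue manager's CCSID instead of the CCSID the bytes were really built in can be modelled without any MQ client. Everything below is a hypothetical stand-in: the `Message` class, `put_text`, `get_text`, and the small CCSID-to-codec table are assumptions for illustration, not MQ API calls.

```python
from dataclasses import dataclass

# Assumed mapping from a few CCSIDs to Python codec names (illustrative subset).
CCSID_TO_CODEC = {37: "cp037", 500: "cp500", 819: "latin-1", 1208: "utf-8"}


@dataclass
class Message:
    ccsid: int   # stands in for the CCSID field of the message descriptor
    data: bytes  # message payload exactly as the app produced it


def put_text(text: str, app_ccsid: int, tagged_ccsid: int) -> Message:
    """Encode text in the app's real code page, then tag it (rightly or wrongly)."""
    return Message(tagged_ccsid, text.encode(CCSID_TO_CODEC[app_ccsid]))


def get_text(msg: Message) -> str:
    """Interpret the payload according to the tag, as a consumer would."""
    return msg.data.decode(CCSID_TO_CODEC[msg.ccsid])


# App really runs in 37; queue manager default is assumed to be 500.
explicit = put_text("KiaSoul!", app_ccsid=37, tagged_ccsid=37)
defaulted = put_text("KiaSoul!", app_ccsid=37, tagged_ccsid=500)

print(get_text(explicit))
print(get_text(defaulted))
```

Both messages carry identical bytes; only the tag differs, and the consumer can do no better than trust it.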
|
|
|
tczielke |
Posted: Mon Nov 27, 2017 3:51 pm Post subject: |
|
|
Guardian
Joined: 08 Jul 2010 Posts: 941 Location: Illinois, USA
|
rekarm01 wrote: |
IBM's "Data Conversion Under WebSphere MQ" document has a lot of useful information, but a few parts can be misleading, or inaccurate. The IBM Character Data Representation Architecture (CDRA) is another useful resource. |
A lot of very helpful information here, especially pointing me to the CDRA. That was the piece I was missing to better understand IBM CCSIDs. I am still digesting all of this, but I would say that the MQ CCSID (found in the message descriptor) is not the same thing as the IBM CCSID that is pointed to in those links. MQ seems to have a much looser interpretation of the IBM CCSID (e.g. MQ CCSID 1200 which collapses several IBM CCSIDs into it). You almost need two different terms here, IBM CCSID and MQ CCSID. It makes me question how well you can go to the IBM CCSID definition and then expect the MQ CCSID equivalent to completely follow it. You can't for at least 1200. _________________ Working with MQ since 2010. |
|
|
|
fjb_saper |
Posted: Mon Nov 27, 2017 7:49 pm Post subject: |
|
|
Grand High Poobah
Joined: 18 Nov 2003 Posts: 20729 Location: LI,NY
|
tczielke wrote: |
MQ seems to have a much looser interpretation of the IBM CCSID (e.g. MQ CCSID 1200 which collapses several IBM CCSIDs into it). You almost need two different terms here, IBM CCSID and MQ CCSID. It makes me question how well you can go to the IBM CCSID definition and then expect the MQ CCSID equivalent to completely follow it. You can't for at least 1200. |
When talking about CCSID 1200 here, are you talking loosely (including CCSID 1201 and 1202) or strictly (CCSID 1200 only)? _________________ MQ & Broker admin |
|
|
|