IIB 9 Unicode and UTF-8 support clarification needed
marko.pitkanen
Posted: Tue Nov 18, 2014 1:13 am    Post subject: IIB 9 Unicode and UTF-8 support clarification needed

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

Hi All,

I haven't properly checked whether this subject has already been covered here. If it has, please feel free to point me to the appropriate thread.

The question is: which characters / languages are supported by the broker through MQ, the broker-wide HTTP listener and the file nodes? This is not obvious from the documentation. For example:

In the documentation page for IIB9 it says that for example UTF-8 has built-in support (no restrictions mentioned).

In the document for MQ Unicode Conversion support it says that the support for UTF-16 and UTF-8 in WebSphere MQ is limited to those Unicode characters that can be encoded in UCS-2.

Are there any restrictions which characters can be used in the application messages processed by / with IIB 9?

--
Marko
smdavies99
Posted: Tue Nov 18, 2014 1:28 am    Post subject: Re: IIB 9 Unicode and UTF-8 support clarification needed

Jedi Council

Joined: 10 Feb 2003
Posts: 6077
Location: Somewhere over the Rainbow this side of Never-never land.

marko.pitkanen wrote:

Are there any restrictions which characters can be used in the application messages processed by / with IIB 9?


AFAIK, no, there aren't.

There is a big IF, though. The IF is reflected in what I'd guess are several hundred posts here about this sort of thing.

The biggest issue I've seen is that the CCSID of the contents differs from the CCSID in the message header/descriptor.
This causes no end of problems.
For example:

It is no use having
Code:

<?xml version="1.0" encoding="ISO-8859-1"?>

when the rest of the XML data is actually UTF-8 encoded.

It is no use having MQMD.CodedCharSetId=1208 when the message body is encoded as 923.

Finally, there are times when we see CCSID Conversion done by a channel when it need not have been done.

It is surprising how many architects and developers simply don't understand this.
I first started working on multinational character sets around 1981, and there are still times when I get it wrong.
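
To make the header/body mismatch concrete, here is a minimal sketch in Python (used purely for illustration; it is not how IIB or MQ process messages). It assumes that CCSID 1208 corresponds to Python's "utf-8" codec and CCSID 923 to "iso8859_15", and the sample text is made up.

```python
# Illustration only: mimics a declared encoding that does not match the body.
# Assumption: CCSID 1208 ~ "utf-8", CCSID 923 ~ "iso8859_15" (Latin-9).
text = "J\u00e4ms\u00e4"  # hypothetical sample payload containing two a-umlauts

# Case 1: the body is UTF-8, but the declaration claims ISO-8859-1.
utf8_body = text.encode("utf-8")
misread = utf8_body.decode("iso8859_1")  # decodes without error, into mojibake

# Case 2: the header says 1208 (UTF-8), but the body is really 923.
latin9_body = text.encode("iso8859_15")
try:
    latin9_body.decode("utf-8")
    utf8_decode_failed = False
except UnicodeDecodeError:
    utf8_decode_failed = True

print(repr(misread), utf8_decode_failed)
```

The first case is arguably the nastier one: nothing fails, the data is just silently wrong.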
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995

Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.
marko.pitkanen
Posted: Tue Nov 18, 2014 1:46 am

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

Thanks,

In theory, how would, for example, the 4-byte UTF-8 Kanji characters
Code:
Normal Kanji ==> UTF-8 octets: 3 bytes ==> Unicode code point: up to U+FFFF (fits in 2 bytes)
Rare Kanji   ==> UTF-8 octets: 4 bytes ==> Unicode code point: above U+FFFF (needs more than 2 bytes)
work in IIB's HTTP or file interface?

--
Marko
smdavies99
Posted: Tue Nov 18, 2014 2:45 am

Jedi Council

Joined: 10 Feb 2003
Posts: 6077
Location: Somewhere over the Rainbow this side of Never-never land.

The Kanji character are not UTF-8, they are UTF-16 or UTF-32

The UTF stream uses BOMs (Byte Order Marks, see http://en.wikipedia.org/wiki/Byte_order_mark) to switch between the different types. Remember to get the endianness correct, though.
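
As a side note, the byte order marks themselves are easy to inspect. The sketch below uses Python's standard codecs module only as a convenient way to show the bytes; it says nothing about how IIB reads a stream.

```python
import codecs

# The BOM is the code point U+FEFF written in the stream's own encoding form.
boms = {
    "UTF-8": codecs.BOM_UTF8,
    "UTF-16BE": codecs.BOM_UTF16_BE,
    "UTF-16LE": codecs.BOM_UTF16_LE,
    "UTF-32BE": codecs.BOM_UTF32_BE,
    "UTF-32LE": codecs.BOM_UTF32_LE,
}
for name, bom in boms.items():
    print(name, bom.hex())

# Python's generic "utf-16" decoder reads the BOM to pick the byte order.
big = codecs.BOM_UTF16_BE + "A".encode("utf-16-be")
little = codecs.BOM_UTF16_LE + "A".encode("utf-16-le")
decoded_big = big.decode("utf-16")
decoded_little = little.decode("utf-16")
```

Both byte sequences decode to the same character, which is the point of getting the endianness right.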

Why don't you try it for yourself?
Look at the message tree
Vitor
Posted: Tue Nov 18, 2014 6:20 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 25995
Location: Texas, USA

marko.pitkanen wrote:
In theory, how would, for example, the 4-byte UTF-8 Kanji characters
Code:
Normal Kanji ==> UTF-8 octets: 3 bytes ==> Unicode code point: up to U+FFFF (fits in 2 bytes)
Rare Kanji   ==> UTF-8 octets: 4 bytes ==> Unicode code point: above U+FFFF (needs more than 2 bytes)
work in IIB's HTTP or file interface?


They would (like all data) be passed into IIB over HTTP from a system that supported those code points, or read from a file on an OS that supported those code points, and then stored in the IIB message tree as UTF-16.

If you then tried to serialise the data into an HTTP PUT, or write it to a file, using a CCSID that doesn't support all the code points which are in the message tree at the time of serialisation (and note that this CCSID has no connection with the CCSID of the inbound data), then IIB will abend in the traditional manner & roll back.
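
A rough analogy in Python, not IIB code: a Python string stands in for the message tree, and the "iso8859_15" codec stands in for an output CCSID such as 923. The helper name serialise and the sample value are hypothetical.

```python
# Assumption: a Python str plays the role of the message tree, and
# "iso8859_15" plays the role of an output CCSID such as 923.
tree_value = "\u6f22\U00020BB7"  # one common kanji plus one outside the BMP

def serialise(value: str, codec: str) -> bytes:
    """Hypothetical helper: strict encoding, no substitution characters."""
    return value.encode(codec, errors="strict")

# UTF-8 covers all of Unicode, so this succeeds.
utf8_out = serialise(tree_value, "utf-8")

# Latin-9 has no kanji at all, so strict serialisation must fail.
try:
    serialise(tree_value, "iso8859_15")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(f"cannot serialise: {exc.reason} at position {exc.start}")
```

The failure depends only on the output codec and the characters present at serialisation time, not on how the data arrived.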
_________________
Honesty is the best policy.
Insanity is the best defence.
marko.pitkanen
Posted: Tue Nov 18, 2014 6:51 am

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

Thanks,

We will try to produce a problem. The question is rather theoretical: I just want to establish whether the same restriction as for MQ
Code:
The support for UTF-16 and UTF-8 in WebSphere MQ is therefore limited to those Unicode characters that can be encoded in UCS-2.

does or doesn't apply to the broker.

--
Marko
Vitor
Posted: Tue Nov 18, 2014 7:01 am    Post subject: Re: IIB 9 Unicode and UTF-8 support clarification needed

Grand High Poobah

Joined: 11 Nov 2005
Posts: 25995
Location: Texas, USA

marko.pitkanen wrote:
In the documentation page for IIB9 it says that for example UTF-8 has built-in support (no restrictions mentioned).


It also lists all of the UTF-16 and UTF-32 variants, and some ISO code pages I've never heard of.
kimbert
Posted: Tue Nov 18, 2014 8:10 am

Jedi Council

Joined: 29 Jul 2003
Posts: 5542
Location: Southampton

WMB and IIB can handle all of Unicode. No restrictions, no qualifications.
In particular
- it can handle any valid character in the UTF-8 and UTF-16 encodings, regardless of the number of bytes that it occupies
- it can handle any valid character in UTF-32
- it is not limited to the range of characters in UCS-2. That would restrict the product to the Basic Multilingual Plane (BMP) which would be very limiting.

smdavies99 said:
Quote:
The Kanji character are not UTF-8, they are UTF-16 or UTF-32
That statement could be misinterpreted.

UTF-8, UTF-16 and UTF-32 encode *all* of Unicode. So any character that is valid in one of those encodings is also valid in both of the others. But the character will be *encoded* differently (it will be represented by a different byte sequence) in each case.

Rare Kanji characters, and any other characters outside of the BMP, will be decoded from their original code page/encoding and will appear in the message tree as a UTF-16 'surrogate pair'; characters inside the BMP occupy a single 16-bit code unit. Note that the original code page does not need to be UTF-8 or UTF-16; the character could come in as Shift-JIS or some other non-Unicode encoding.
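
For anyone who wants to see the byte sequences, here is a small Python sketch (illustration only, not broker code). The two sample code points, U+6F22 and U+20BB7, are my own choice of one kanji inside the BMP and one outside it; the helper names are hypothetical.

```python
common = "\u6f22"      # U+6F22, inside the BMP
rare = "\U00020BB7"    # U+20BB7, outside the BMP

def describe(ch: str) -> dict:
    """Return the hex bytes of one character in each Unicode encoding form."""
    return {
        "utf-8": ch.encode("utf-8").hex(),
        "utf-16": ch.encode("utf-16-be").hex(),
        "utf-32": ch.encode("utf-32-be").hex(),
    }

def surrogate_pair(ch: str) -> tuple:
    """Compute the UTF-16 surrogate pair for a code point above U+FFFF."""
    offset = ord(ch) - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

for ch in (common, rare):
    print(f"U+{ord(ch):04X}", describe(ch))

high, low = surrogate_pair(rare)
print(f"surrogates: {high:04X} {low:04X}")
```

The BMP character takes 3 bytes in UTF-8 and one 16-bit unit in UTF-16; the other takes 4 bytes in UTF-8 and two 16-bit units (the surrogate pair) in UTF-16. Same character, different byte sequences.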
_________________
Before you criticize someone, walk a mile in their shoes. That way you're a mile away, and you have their shoes too.
marko.pitkanen
Posted: Tue Nov 18, 2014 10:35 am

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

Thanks everyone for your input.

--
Marko
rekarm01
Posted: Wed Nov 19, 2014 5:19 am    Post subject: Re: IIB 9 Unicode and UTF-8 support clarification needed

Grand Master

Joined: 25 Jun 2008
Posts: 1388

marko.pitkanen wrote:
In the documentation page for IIB9 it says that for example UTF-8 has built-in support (no restrictions mentioned).

In the document for MQ Unicode Conversion support it says that the support for UTF-16 and UTF-8 in WebSphere MQ is limited to those Unicode characters that can be encoded in UCS-2.

WMB/IIB supports conversion for the entire Unicode character set, but WMQ does not. WMQ only supports conversion for the UCS-2 subset.

kimbert wrote:
WMB and IIB can handle all of Unicode. No restrictions, no qualifications.

Maybe one little qualification for the IIB's MQRFH2 header parser? Specifically for the NameValueCCSID field, does the value 1200 indicate UCS-2, or UTF-16?

kimbert wrote:
In particular
- ... it is not limited to the range of characters in UCS-2 ...

Then at least some of the WMB/IIB documentation may still be out of date, where it refers to the broker/bus using UCS-2 internally.
PeterPotkay
Posted: Thu Jun 04, 2020 4:17 am

Poobah

Joined: 15 May 2001
Posts: 7612

kimbert wrote:
WMB and IIB can handle all of Unicode. No restrictions, no qualifications.
In particular
- it can handle any valid character in the UTF-8 and UTF-16 encodings, regardless of the number of bytes that it occupies
- it can handle any valid character in UTF-32
- it is not limited to the range of characters in UCS-2. That would restrict the product to the Basic Multilingual Plane (BMP) which would be very limiting.


6 years later the IIB 10.0.0.20 KC says:
Quote:
Integration nodes complete string operations in Universal Character Set coded in 2 octets (UCS-2). If incoming strings are not encoded in UCS-2, they are converted to UCS-2 on arrival.

https://www.ibm.com/support/knowledgecenter/SSMKHH_10.0.0/com.ibm.etools.mft.doc/ac30180_.html

If IIB is not limited to the range of characters in UCS-2, how does it deal with characters outside that range given the above reference?
_________________
Peter Potkay
Keep Calm and MQ On
rekarm01
Posted: Thu Jun 04, 2020 4:14 pm    Post subject: Re: IIB 9 Unicode and UTF-8 support clarification needed

Grand Master

Joined: 25 Jun 2008
Posts: 1388

PeterPotkay wrote:
6 years later the IIB 10.0.0.20 KC says:
Quote:
Integration nodes complete string operations in Universal Character Set coded in 2 octets (UCS-2). If incoming strings are not encoded in UCS-2, they are converted to UCS-2 on arrival.

Some of the IIB/ACE documentation may still be out of date, where it refers to "UCS-2". The term itself has been obsolete for a while now, according to recent versions of the Unicode standard:

Quote:
UCS-2 ... was documented in earlier editions of [ISO/IEC] 10646 ... This documentation has been removed from ISO/IEC 10646:2011 and subsequent editions ... It no longer refers to an encoding form in either 10646 or the Unicode Standard.

The documentation should refer to "UTF-16" instead, and should indicate in some other way where it might only support a subset of the Unicode Character Set.

PeterPotkay wrote:
If IIB is not limited to the range of characters in UCS-2, how does it deal with characters outside that range given the above reference?

If the IIB were still using UCS-2, then it would probably have to throw a conversion Exception when trying to convert characters outside that range.
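
To illustrate the difference, here is a sketch of what a strict UCS-2 encoder would have to do, written in Python. The function encode_ucs2_be is hypothetical (Python has no built-in UCS-2 codec) and is not taken from any IBM product; it simply refuses anything that does not fit in one 16-bit unit.

```python
def encode_ucs2_be(text: str) -> bytes:
    """Hypothetical strict UCS-2 encoder: exactly one 16-bit unit per character."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # UCS-2 has no surrogate mechanism, so this cannot be represented.
            raise ValueError(f"U+{cp:04X} is outside the UCS-2 range")
        out += cp.to_bytes(2, "big")
    return bytes(out)

# A BMP character encodes identically in UCS-2 and UTF-16.
bmp_only = encode_ucs2_be("\u6f22")

# A character above U+FFFF is rejected by UCS-2 ...
try:
    encode_ucs2_be("\U00020BB7")
    rejected = False
except ValueError:
    rejected = True

# ... but UTF-16 represents it as a surrogate pair.
utf16 = "\U00020BB7".encode("utf-16-be")
```

So for BMP-only data the two are indistinguishable on the wire, which may be part of why the old term lingers in documentation.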
timber
Posted: Fri Jun 05, 2020 1:44 am

Grand Master

Joined: 25 Aug 2015
Posts: 1120

Just to confirm (as if there were any doubt): rekarm01 is correct. All character data in the IIB/ACE message tree is in UTF-16, not UCS-2. All Unicode characters can be represented in the message tree, with no restrictions at all.

In an ideal world, IBM would correct that page in the Knowledge Center.
PeterPotkay
Posted: Fri Jun 05, 2020 8:35 am

Poobah

Joined: 15 May 2001
Posts: 7612

Thanks guys. I thought I remembered reading / hearing that the broker uses UTF-16 internally; I just couldn't find anything official in the KC.
rekarm01
Posted: Fri Jun 05, 2020 3:49 pm    Post subject: Re: IIB 9 Unicode and UTF-8 support clarification needed

Grand Master

Joined: 25 Jun 2008
Posts: 1388

marko.pitkanen wrote:
In the document for MQ Unicode Conversion support it says that the support for UTF-16 and UTF-8 in WebSphere MQ is limited to those Unicode characters that can be encoded in UCS-2.

IBM MQ v9.0 or later can now handle all of Unicode too, no restrictions:

Quote:
Before Version 9.0, previous versions of the product did not support conversion of data containing Unicode code points beyond the Basic Multilingual Plane (code points above U+FFFF) ... From Version 9.0, IBM MQ supports all Unicode characters ...