Author |
Message
|
chetu777 |
Posted: Thu Mar 08, 2012 3:29 am Post subject: UTF 8 v/s UTF16 |
|
|
Acolyte
Joined: 07 Sep 2009 Posts: 59
|
Hi All,
My flow contains MQInput node -> Compute node -> MQOutput node. The expected input is a fixed-length message of 66 characters coming from a source app (a .NET app).
The problem I am facing is that the message coming from the source application has a length of 132 (instead of 66) and is in UTF-16 format. The input message contains special characters in between each character.
This message was not even recognised by Message Broker.
After some testing, the source app team changed the input message format from UTF-16 to UTF-8. With this change the CCSID changed to 437 and the special characters disappeared from the input message.
This message was recognised by the broker and was processed successfully with this configuration.
My question: can anyone kindly let me know why the broker was initially unable to recognise the message when it was UTF-16, but recognised it when it was UTF-8?
Is there any change I can make on the broker side so that a message in the previous (UTF-16) format is recognised and processed by the broker? |
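The symptom described above (a length of 132 instead of 66, with "special characters" between each character) is what you would expect when ASCII text is encoded as UTF-16 and then read one byte per character. A minimal Python sketch, not taken from the original flow: the record content is made up, and little-endian UTF-16 without a byte order mark is an assumption about what the sender produced.

```python
# Hypothetical 66-character fixed-length record (ASCII content only).
record = "ABC123".ljust(66)

# Assumption: the sender used little-endian UTF-16 without a BOM.
utf16_bytes = record.encode("utf-16-le")
utf8_bytes = record.encode("utf-8")

print(len(record))       # number of characters
print(len(utf16_bytes))  # two bytes per character for this content
print(len(utf8_bytes))   # one byte per character for pure ASCII
print(utf16_bytes[:6])   # a zero byte follows every ASCII byte

# Reading the UTF-16 bytes with a single-byte code page (here cp437)
# yields twice as many "characters", every second one being NUL.
misread = utf16_bytes.decode("cp437")
print(len(misread))
```

For pure ASCII content the UTF-8 form happens to have the same byte length as the character count, which is probably why the changed message passed.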
|
 |
mqjeff |
Posted: Thu Mar 08, 2012 3:54 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
The only reason Broker would have had any issues processing the original message is that the original message was not actually constructed properly.
The problem had nothing to do with the actual byte content of the message, but rather with how the message content was identified.
Broker internally transforms all character data into CCSID 1200 anyway.
The other possibility is that you constructed your message model to assume that one character == one byte. You can control this, and should indeed look into altering your message definition to treat all character lengths as characters and not as bytes.
The other question is what it means for the message model to assert that characters are one byte long. Does the logical message really *require* that for business or technical reasons? If so, then it was illegal to create the message in UTF. If not, then the message model is wrong. |
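The difference between measuring field lengths in characters and in bytes can be sketched outside the broker. The following Python example is only an illustration under assumptions: the two-field layout (a 10-character name and a 4-character number) and the helper names are hypothetical, and it does not use any Message Broker API.

```python
def split_by_chars(payload, encoding, widths):
    """Decode first, then cut fields by character count."""
    text = payload.decode(encoding)
    fields, pos = [], 0
    for width in widths:
        fields.append(text[pos:pos + width])
        pos += width
    return fields


def split_by_bytes(payload, widths):
    """Cut the raw bytes by byte count, as a one-byte-per-character model would."""
    fields, pos = [], 0
    for width in widths:
        fields.append(payload[pos:pos + width])
        pos += width
    return fields


WIDTHS = [10, 4]  # hypothetical layout: name (10 chars), number (4 chars)
record = "MÜLLER".ljust(10) + "0042"

as_utf8 = record.encode("utf-8")
as_utf16 = record.encode("utf-16-le")

# Character-based lengths give the same fields whatever the encoding.
print(split_by_chars(as_utf8, "utf-8", WIDTHS))
print(split_by_chars(as_utf16, "utf-16-le", WIDTHS))

# Byte-based lengths drift as soon as one character needs more than one byte.
print(split_by_bytes(as_utf8, WIDTHS))
```

With character-based lengths the same model works for any encoding, provided the encoding is declared correctly; with byte-based lengths the second field starts one byte too early in the UTF-8 case.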
|
 |
mapa |
Posted: Thu Mar 08, 2012 4:02 am Post subject: |
|
|
 Master
Joined: 09 Aug 2001 Posts: 257 Location: Malmö, Sweden
|
|
 |
kimbert |
Posted: Thu Mar 08, 2012 4:51 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Message broker is not the problem here. WMB is very, very good at handling character data in just about any encoding that you could imagine.
The possible reasons for the problem are:
a) your message set is designed to handle single-byte characters only
b) your message flow is designed to handle single-byte characters only
c) the sender is breaking the rules by sending multi-byte characters.
You *must* understand what the rules for this data format are. Are the characters single-byte, double-byte or multi-byte? Nobody on this forum can help with that. If you are free to agree some rules with the sender then do that.
Quote: |
the source app team changed the input message format from UTF16 to UTF 8 |
That sounds like a random change that just happened to work with a few test messages. Please read carefully what mqjeff said. UTF-8 characters are not single-byte characters. |
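As a quick illustration of why a UTF-8 test can pass by accident, here is a small Python sketch (the sample characters and the test message are arbitrary choices, not data from this thread). It prints how many bytes each character takes in UTF-8 and in UTF-16.

```python
# Arbitrary sample characters: ASCII, Latin-1 range, BMP symbol, non-BMP emoji.
samples = ["A", "é", "€", "😀"]

utf8_sizes = [len(ch.encode("utf-8")) for ch in samples]
utf16_sizes = [len(ch.encode("utf-16-le")) for ch in samples]

for ch, n8, n16 in zip(samples, utf8_sizes, utf16_sizes):
    print(repr(ch), "UTF-8 bytes:", n8, "UTF-16 bytes:", n16)

# A test message containing only ASCII has one byte per character in UTF-8,
# so a single-byte model appears to work until a non-ASCII character arrives.
ascii_only = "ORDER 0001"
print(len(ascii_only), len(ascii_only.encode("utf-8")))
```

Only the ASCII range is one byte per character in UTF-8; everything else takes two to four bytes.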
|
 |
mqsiuser |
Posted: Thu Mar 08, 2012 5:02 am Post subject: Re: UTF 8 v/s UTF16 |
|
|
 Yatiri
Joined: 15 Apr 2008 Posts: 637 Location: Germany
|
mqjeff wrote: |
Broker internally transforms all character data into CCSID 1200 anyway |
That is UTF-16... anything else wouldn't make sense and: Thank you for clarifying that mqjeff!
chetu777 wrote: |
My question is can anyone kindly let me know why the broker was not initially able to recognise the message when it was UTF 16 but it recognised when it was UTF 8? |
UTF is just a way of encoding Unicode, "all the characters that are possible in the world". And there is (still(?)) empty space in the 65 thousand (16-bit) possibilities.
This all refers to minimizing message size:
UTF-8 is kind of the typical thing you use in the US and Europe.
UTF-8 is "but I expect only the first 128 (characters) to occur often" (and yes, it is not single-byte!) and
UTF-16 is "I expect a lot of different characters", as would be the case in China (or if you are thinking of your code or the things that you do "globally").
@OP: Broker is good at it, but you also need to understand it very well. Probably do not change the encoding on your input, but properly set the CCSID (and Encoding) ... but this may take some time. Also look here.
And very honestly... with .net you might really consider using XML, and likely .net will just put the encoding in the "xml-declaration"; then you get rid of all your problems!
I think that flat-file is for legacy systems that can't do better (there are a lot and they are older than 10 years typically). _________________ Just use REFERENCEs |
|
 |
smdavies99 |
Posted: Thu Mar 08, 2012 8:27 am Post subject: Re: UTF 8 v/s UTF16 |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
mqsiuser wrote: |
And very honestly... with .net you might really consider using XML, and likely .net will just put the encoding in the "xml-declaration"; then you get rid of all your problems!
|
Really? Since when did Microsoft release .Net runtimes for Solaris, AIX, Linux, z/OS etc.? I must have missed the fanfare at announcement time.
Please do not assume that everyone is running on Windows.
From my 30+ years of experience in Systems Integration, it actually pays to develop your code to be as portable as possible and NOT assume anything, especially about the accuracy of a message and its CCSID. They are sometimes very different despite the developer telling you otherwise. _________________ WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions. |
|
 |
zpat |
Posted: Thu Mar 08, 2012 8:31 am Post subject: |
|
|
 Jedi Council
Joined: 19 May 2001 Posts: 5866 Location: UK
|
Remember the old IND$FILE command? PC to mainframe File transfer.
Could never work out if IBM put the currency symbol in the name of the command deliberately or through ignorance of the non-US world! |
|
 |
mqsiuser |
Posted: Thu Mar 08, 2012 8:53 am Post subject: Re: UTF 8 v/s UTF16 |
|
|
 Yatiri
Joined: 15 Apr 2008 Posts: 637 Location: Germany
|
The OP uses Windows. Also, he should invest in basic IT education (or ask an experienced colleague). @OP: With .net: start with XML. It is easier than flat-file! I think that you think it is the other way round... Or at least use ASCII (something where a char has 8 bits).
And if a partnering application includes the "xml declaration" then I will count on that (or broker sends the responsible person an error e-mail).
It's a contract.
If you/they cannot guarantee it (the encoding), they should not send it (the xml declaration).
Well, there are exceptions. There is the real world, I agree with you.
What do your partners do? Just indicate the encoding.
We are heading for a long thread: Encodings  _________________ Just use REFERENCEs
Last edited by mqsiuser on Thu Mar 08, 2012 8:58 am; edited 1 time in total |
|
 |
Vitor |
Posted: Thu Mar 08, 2012 8:57 am Post subject: Re: UTF 8 v/s UTF16 |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
mqsiuser wrote: |
The OP uses Windows. |
Where does he say that? Where does he mention .NET? I agree it's likely that the unexpected arrival of a double byte message indicates both, but I don't see it stated. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
 |
mqsiuser |
Posted: Thu Mar 08, 2012 9:01 am Post subject: Re: UTF 8 v/s UTF16 |
|
|
 Yatiri
Joined: 15 Apr 2008 Posts: 637 Location: Germany
|
Well .NET I don't know what to say. _________________ Just use REFERENCEs |
|
 |
Vitor |
Posted: Thu Mar 08, 2012 9:33 am Post subject: Re: UTF 8 v/s UTF16 |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
mqsiuser wrote: |
Well .NET I don't know what to say. |
I'd have gone with:
Quote: |
I think you missed the part in the original post where the OP talked about a .Net app |
as that's clearly the case here. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
 |
mqsiuser |
Posted: Thu Mar 08, 2012 9:50 am Post subject: Re: UTF 8 v/s UTF16 |
|
|
 Yatiri
Joined: 15 Apr 2008 Posts: 637 Location: Germany
|
@OP: Learn at least enough about IT that you don't use .NET anymore. Also: either quit your job or change department. Install Ubuntu on your home computers and also look into a "Terminal" there  _________________ Just use REFERENCEs |
|
 |
mqsiuser |
Posted: Thu Mar 08, 2012 3:45 pm Post subject: |
|
|
 Yatiri
Joined: 15 Apr 2008 Posts: 637 Location: Germany
|
zpat wrote: |
Remember the old IND$FILE command? PC to mainframe File transfer. |
I know of at least "$", "!", "-", "." and "/" being regular chars on some (operating) systems.
For me, regular chars (for code/coding/commands) are normally only digits (0...9), the alphabet (a...z, A...Z) and underscore "_". I guess if that is not enough (for you), then they start adding these (strange ones).
On code level there is no i18n: Europe uses what America defined (once) ... You can be proud, but also be responsible.
It is also IND$FILE in the Euro (€) zone: it was deliberate and it is fine with me. I am ok with it... really... For me it's the "dollar sign", not just the currency  _________________ Just use REFERENCEs |
|
 |
rekarm01 |
Posted: Thu Mar 08, 2012 7:13 pm Post subject: Re: UTF-8 v/s UTF-16 |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
chetu777 wrote: |
The problem I am facing is that the message coming from source application is having a length of 132 ... (instead of 66) |
That's 132 characters (instead of 66 characters). Right?
chetu777 wrote: |
and the input message has UTF 16 format. input message contains special characters in between each character. |
The most likely cause is that the source app provided the wrong input ccsid. What was the input ccsid?
chetu777 wrote: |
the source app team changed the input message format from UTF16 to UTF 8 and due to this change the CCSID changed to 437 |
Unfortunately, this ccsid is still wrong. It won't convert non-ASCII characters correctly.
chetu777 wrote: |
My question is can anyone kindly let me know why the broker was not initially able to recognise the message when it was UTF 16 but it recognised when it was UTF 8? Is there any change I can make from the broker end so that the previous configuration message gets recognised and processed by broker? |
The source app provides both the input ccsid and the input data; it needs to make sure they match. The broker can't fix bad input.
mqsiuser wrote: |
And if a partnering application includes the "xml declaration" then I will count on that ... It's a contract. |
Don't count on that. It's not a contract. The broker looks for the input ccsid to determine the correct character encoding. It ignores any encoding information in the xml declaration.
mqjeff wrote: |
Broker internally transforms all character data into CCSID 1200 anyway. |
More precisely, the broker converts ccsid-encoded bytes to UCS-2 characters. |
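Why the declared ccsid has to match the actual bytes can be shown with a small Python sketch. This is only an illustration under assumptions: Python's "cp437" codec stands in for CCSID 437, the sample text is arbitrary, and nothing here calls the broker or MQ.

```python
text = "Malmö"                      # arbitrary sample with one non-ASCII character
payload = text.encode("utf-8")      # what the sender actually wrote

# Receiver trusts a label that says "code page 437" and decodes accordingly.
mislabeled = payload.decode("cp437")
# Receiver is told the truth and decodes as UTF-8.
correct = payload.decode("utf-8")

print(repr(correct))
print(repr(mislabeled))             # the two UTF-8 bytes of 'ö' become two unrelated characters

# Pure ASCII survives the wrong label, which hides the problem in simple tests.
ascii_only = "FIXED LENGTH RECORD"
print(ascii_only.encode("utf-8").decode("cp437") == ascii_only)
```

The receiver has no way to detect the mismatch from the bytes alone in this case, since every byte value is valid in code page 437; only the sender can make the label and the data agree.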
|
 |
smdavies99 |
Posted: Thu Mar 08, 2012 10:53 pm Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
Post deleted. Too early in the morning for a rant.  _________________ WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions. |
|
 |
|