MQSeries.net :: View topic - UTF 8 v/s UTF16



chetu777

Posted: Thu Mar 08, 2012 3:29 am Post subject: UTF 8 v/s UTF16

Acolyte

Joined: 07 Sep 2009
Posts: 59

Hi All,

My flow contains MQInput node -> Compute node -> MQOutput node. The expected input message is a fixed-length message of 66 characters coming from a source app (a .NET app).

The problem I am facing is that the message coming from the source application has a length of 132 (instead of 66) and the input message is in UTF-16 format. The input message contains special characters in between each character.

This message was not even recognised by message broker.

After some testing, the source app team changed the input message format from UTF-16 to UTF-8; with this change the CCSID changed to 437 and the special characters disappeared from the input message.

This message was recognised by broker and the message got processed successfully with this configuration.

My question is: can anyone kindly let me know why the broker was not initially able to recognise the message when it was UTF-16, but recognised it when it was UTF-8?
Is there any change I can make on the broker side so that messages in the previous configuration are recognised and processed by the broker?
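
The doubling from 66 to 132 and the "special characters" are what one would expect when a 66-character ASCII-only record is encoded as UTF-16 and then viewed as single-byte text. A minimal Python sketch, assuming an invented sample record (the real payload is not shown in this thread) and little-endian UTF-16 without a byte order mark:

```python
# Hypothetical 66-character ASCII record; the real payload is not shown in the thread.
record = "A" * 60 + "123456"
assert len(record) == 66

utf16 = record.encode("utf-16-le")  # two bytes per character here, no byte order mark
utf8 = record.encode("utf-8")       # one byte per character for ASCII-only text

print(len(utf16))   # 132
print(len(utf8))    # 66
print(utf16[:8])    # b'A\x00A\x00A\x00A\x00'

# Every second byte is NUL; a single-byte viewer shows it as a "special character".
nul_count = utf16.count(b"\x00")
print(nul_count)    # 66
```

So the byte count doubles while the character count stays at 66, which is why the unit used for field lengths matters.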

mqjeff

Posted: Thu Mar 08, 2012 3:54 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

The only reason Broker would have had any issues processing the original message was because the original message was not actually constructed properly.

The problem had nothing to do with the actual byte content of the message, but rather with how the message content was identified.

Broker internally transforms all character data into CCSID 1200 anyway.

The other possibility is that you constructed your message model to assume that one character == one byte. You can control this, and should indeed look into altering your message definition to treat all character lengths as characters and not as bytes.

The other question is to ask what it means that the message model asserts that characters are one byte long. Does it mean that the logical message really actually *requires* that for business or technical reasons? If so, then it was an illegal message to create it in UTF. If not, then the message model is wrong.
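
To illustrate the difference between lengths in characters and lengths in bytes outside of the broker, here is a rough Python sketch. The field layout, field names and sample values are invented for illustration and are not from the OP's message set. The idea is simply to decode first and then slice by character position, so the same layout works for any encoding:

```python
# Hypothetical fixed-length layout, measured in characters (not from the thread).
FIELDS = [("name", 10), ("city", 8), ("amount", 6)]

def parse_record(raw: bytes, encoding: str) -> dict:
    """Decode the whole record first, then slice by character positions."""
    text = raw.decode(encoding)
    expected = sum(width for _, width in FIELDS)
    if len(text) != expected:
        raise ValueError(f"expected {expected} characters, got {len(text)}")
    out, pos = {}, 0
    for name, width in FIELDS:
        out[name] = text[pos:pos + width].rstrip()  # drop the padding spaces
        pos += width
    return out

record = "Jörg".ljust(10) + "Malmö".ljust(8) + "001250"
for enc in ("utf-8", "utf-16-le", "cp437"):
    raw = record.encode(enc)
    # The byte length differs per encoding; the parsed fields do not.
    print(enc, len(raw), parse_record(raw, enc))
```

Slicing the raw bytes at offsets 10 and 18 instead would only give the right fields for the single-byte encoding. Note that Python's `len()` counts code points, which is good enough for this sketch but is an assumption about what "character" means in the data format.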

mapa

Posted: Thu Mar 08, 2012 4:02 am Post subject:

Master

Joined: 09 Aug 2001
Posts: 257
Location: Malmö, Sweden

For fixed length fields in MRM you can choose between bytes and characters.

An old but still interesting article regarding modelling message formats
http://www.ibm.com/developerworks/websphere/library/techarticles/0810_hanson/0810_hanson.html

kimbert

Posted: Thu Mar 08, 2012 4:51 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Message broker is not the problem here. WMB is very, very good at handling character data in just about any encoding that you could imagine.
The possible reasons for the problem are:
a) your message set is designed to handle single-byte characters only
b) your message flow is designed to handle single-byte characters only
c) the sender is breaking the rules by sending multi-byte characters.

You *must* understand what the rules for this data format are. Are the characters single-byte, double-byte or multi-byte? Nobody on this forum can help with that. If you are free to agree some rules with the sender then do that.

Quote:

the source app team changed the input message format from UTF-16 to UTF-8

That sounds like a random change that just happened to work with a few test messages. Please read carefully what mqjeff said. UTF-8 characters are not single-byte characters.
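
A small Python sketch of why a few ASCII-only test messages can hide the problem. The record contents are made up; only the length of 66 comes from the original post:

```python
EXPECTED_CHARS = 66  # record length from the original post

def check(text: str) -> tuple:
    """Return (character count, UTF-8 byte count) for a record."""
    return len(text), len(text.encode("utf-8"))

ascii_record = "ORDER0001 MALMO".ljust(EXPECTED_CHARS)   # hypothetical test message
umlaut_record = "ORDER0001 Malmö".ljust(EXPECTED_CHARS)  # same layout, one non-ASCII letter

print(check(ascii_record))   # (66, 66)
print(check(umlaut_record))  # (66, 67)
```

A model that expects exactly 66 bytes accepts the first record and is one byte off on the second, even though both are 66 characters long.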

mqsiuser

Posted: Thu Mar 08, 2012 5:02 am Post subject: Re: UTF 8 v/s UTF16

Yatiri

Joined: 15 Apr 2008
Posts: 637
Location: Germany

mqjeff wrote:

Broker internally transforms all character data into CCSID 1200 anyway

That is UTF-16... anything else wouldn't make sense and: Thank you for clarifying that mqjeff!

chetu777 wrote:

My question is: can anyone kindly let me know why the broker was not initially able to recognise the message when it was UTF-16, but recognised it when it was UTF-8?

UTF is just "all the characters that are possible in the world". And there is (still(?)) empty space in the roughly 65 thousand possibilities.

This all refers to minimizing message size:

UTF-8 is kind of the typical thing you use in the US and Europe.

UTF-8 is "but I expect only the first 128 (characters) to occur often" (and yes, it is not single-byte!) and

UTF-16 is "I expect a lot of different characters", as would be the case in China (or if you are just thinking of your code or things that you do "globally").
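
Both encodings can represent every Unicode character; they differ in how many bytes each character takes. A quick Python sketch of the size trade-off, with sample strings picked purely for illustration:

```python
def sizes(text: str) -> tuple:
    """Return (characters, UTF-8 bytes, UTF-16 bytes without BOM)."""
    return len(text), len(text.encode("utf-8")), len(text.encode("utf-16-le"))

# Sample strings chosen only for illustration.
samples = {
    "english": "Hello, world",
    "german": "Grüße aus Köln",
    "chinese": "你好，世界",
}

for label, text in samples.items():
    chars, u8, u16 = sizes(text)
    print(f"{label}: {chars} chars, UTF-8 {u8} bytes, UTF-16 {u16} bytes")
```

For mostly ASCII text UTF-8 is the smaller of the two; for the Chinese sample UTF-16 is smaller.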

@OP: Broker is good at it, but you also need to understand it very well. Probably do not change the encoding on your input, but properly set the CCSID (and Encoding) ... but this may take some time. Also look here.

And very honestly... with .NET you might really consider using XML, and likely .NET will just put the encoding in the "xml-declaration"; then you get rid of all your problems!

I think that flat-file is for legacy systems that can't do better (there are a lot and they are older than 10 years typically).
_________________
Just use REFERENCEs

smdavies99

Posted: Thu Mar 08, 2012 8:27 am Post subject: Re: UTF 8 v/s UTF16

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

mqsiuser wrote:

And very honestly... with .NET you might really consider using XML, and likely .NET will just put the encoding in the "xml-declaration"; then you get rid of all your problems!

Really? Since when did Microsoft release .Net runtimes for Solaris, AIX, Linux, z/OS etc. I must have missed the fanfare at announcement time.

Please do not assume that everyone is running on Windows.
From my experience of 30+ years of Systems Integration it actually pays to develop your code to be as portable as possible and NOT assume anything, especially about the accuracy of a message and its CCSID. They are sometimes very different despite the developer telling you otherwise.
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995

Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.

zpat

Posted: Thu Mar 08, 2012 8:31 am Post subject:

Jedi Council

Joined: 19 May 2001
Posts: 5867
Location: UK

Remember the old IND$FILE command? PC to mainframe File transfer.

Could never work out if IBM put the currency symbol in the name of the command deliberately or through ignorance of the non-US world!

mqsiuser

Posted: Thu Mar 08, 2012 8:53 am Post subject: Re: UTF 8 v/s UTF16

Yatiri

Joined: 15 Apr 2008
Posts: 637
Location: Germany

The OP uses Windows. Also he should invest in basic IT education (or ask an experienced colleague). @OP: With .NET: Start with XML. It is easier than flat-file! I think that you think it is the other way round... Or at least use ASCII (something where a char has 8 bits).

And if a partnering application includes the "xml declaration" then I will count on that (or broker sends the responsible person an error e-mail).

It's a contract.

If you/they cannot guarantee (the encoding) they'd not send it (the xml declaration)

Well, there are exceptions. There is the real world, I agree with you

What do your partners do? Just indicate the encoding

We are heading for a long thread: Encodings


Last edited by mqsiuser on Thu Mar 08, 2012 8:58 am; edited 1 time in total

Vitor

Posted: Thu Mar 08, 2012 8:57 am Post subject: Re: UTF 8 v/s UTF16

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

mqsiuser wrote:

The OP uses Windows.

Where does he say that? Where does he mention .NET? I agree it's likely that the unexpected arrival of a double byte message indicates both, but I don't see it stated.
_________________
Honesty is the best policy.
Insanity is the best defence.

mqsiuser

Posted: Thu Mar 08, 2012 9:01 am Post subject: Re: UTF 8 v/s UTF16

Yatiri

Joined: 15 Apr 2008
Posts: 637
Location: Germany

chetu777 wrote:

.net app

Well .NET I don't know what to say.

Vitor

Posted: Thu Mar 08, 2012 9:33 am Post subject: Re: UTF 8 v/s UTF16

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

mqsiuser wrote:

chetu777 wrote:

.net app

Well .NET I don't know what to say.

I'd have gone with:

Quote:

I think you missed the part in the original post where the OP talked about a .Net app

as that's clearly the case here.

mqsiuser

Posted: Thu Mar 08, 2012 9:50 am Post subject: Re: UTF 8 v/s UTF16

Yatiri

Joined: 15 Apr 2008
Posts: 637
Location: Germany

@OP: Learn at least enough about IT that you don't use .NET anymore. Also: Either quit your job or change the department. Install Ubuntu on your home computers and also take a look at a "Terminal" there

mqsiuser

Posted: Thu Mar 08, 2012 3:45 pm Post subject:

Yatiri

Joined: 15 Apr 2008
Posts: 637
Location: Germany

zpat wrote:

Remember the old IND$FILE command? PC to mainframe File transfer.

I know of at least "$", "!", "-", "." and "/" being regular characters on some operating systems.

For me, regular characters (for code/coding/commands) are normally only digits (0...9), the alphabet (a...z, A...Z) and the underscore "_". I guess if that is not enough (for you) then they start adding these (strange ones).

On the code level there is no i18n: Europe uses what America defined (once)

... You can be proud, also be responsible.

It is also IND$FILE in the Euro (€) zone: it was deliberate and it is fine with me. I am ok with it...

... really

... For me it's the "dollar sign", not just the currency

rekarm01

Posted: Thu Mar 08, 2012 7:13 pm Post subject: Re: UTF-8 v/s UTF-16

Grand Master

Joined: 25 Jun 2008
Posts: 1415

chetu777 wrote:

The problem I am facing is that the message coming from the source application has a length of 132 ... (instead of 66)

That's 132 characters (instead of 66 characters). Right?

chetu777 wrote:

and the input message is in UTF-16 format. The input message contains special characters in between each character.

The most likely cause is that the source app provided the wrong input ccsid. What was the input ccsid?

chetu777 wrote:

the source app team changed the input message format from UTF-16 to UTF-8; with this change the CCSID changed to 437

Unfortunately, this ccsid is still wrong. It won't convert non-ASCII characters correctly.
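
A quick way to see this outside the broker, sketched in Python. The sample text is invented, and Python's `cp437` codec is used here as a stand-in for CCSID 437:

```python
text = "Malmö"                 # hypothetical field value with one non-ASCII letter
raw = text.encode("utf-8")     # what a sender writing UTF-8 puts on the wire

as_utf8 = raw.decode("utf-8")    # the data read with the matching code page
as_cp437 = raw.decode("cp437")   # the same bytes read as code page 437

print(as_utf8 == text)    # True
print(as_cp437 == text)   # False
print(len(as_cp437))      # 6: the two bytes of "ö" become two unrelated characters

# ASCII-only data looks fine either way, which is why test messages may pass.
ascii_raw = "ORDER0001".encode("utf-8")
print(ascii_raw.decode("cp437") == "ORDER0001")   # True
```

The mismatch only shows up once a non-ASCII character arrives.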

chetu777 wrote:

My question is: can anyone kindly let me know why the broker was not initially able to recognise the message when it was UTF-16, but recognised it when it was UTF-8? Is there any change I can make on the broker side so that messages in the previous configuration are recognised and processed by the broker?

The source app provides both the input ccsid and the input data; it needs to make sure they match. The broker can't fix bad input.

mqsiuser wrote:

And if a partnering application includes the "xml declaration" then I will count on that ... It's a contract.

Don't count on that. It's not a contract. The broker looks for the input ccsid to determine the correct character encoding. It ignores any encoding information in the xml declaration.
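
To show how the declaration and the actual bytes can disagree, here is a hedged Python sketch. The document, the helper `declared_encoding` and the `header_encoding` variable (standing in for the CCSID carried with the message) are all made up for illustration; this is not how the broker is implemented:

```python
import re

def declared_encoding(raw: bytes):
    """Return the encoding named in the XML declaration, or None if absent."""
    m = re.match(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', raw)
    return m.group(1).decode("ascii") if m else None

doc = '<?xml version="1.0" encoding="UTF-8"?><city>Malmö</city>'
raw = doc.encode("cp437")    # sender declares UTF-8 but actually writes code page 437
header_encoding = "cp437"    # stand-in for the CCSID supplied with the message

print(declared_encoding(raw))   # UTF-8

# Decoding with the encoding that matches the bytes recovers the text.
text = raw.decode(header_encoding)
print(text == doc)              # True

# Trusting the declaration fails, because these bytes are not valid UTF-8.
try:
    raw.decode(declared_encoding(raw))
    trusted_ok = True
except UnicodeDecodeError:
    trusted_ok = False
print(trusted_ok)               # False
```

The declaration is only text inside the payload; nothing forces it to match the bytes that were actually written.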

mqjeff wrote:

Broker internally transforms all character data into CCSID 1200 anyway.

More precisely, the broker converts ccsid-encoded bytes to UCS-2 characters.

smdavies99

Posted: Thu Mar 08, 2012 10:53 pm Post subject:

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

Post deleted. Too early in the morning for a rant.

