Mis-encoded case - Why no error is thrown?
ghoshly
Posted: Thu Mar 06, 2014 5:01 am    Post subject: Mis-encoded case - Why no error is thrown?

Partisan

Joined: 10 Jan 2008
Posts: 333

If a source system specifies a character set / code page but sends a character in the message data that is not in that code page, the broker still does not generate an exception.

Message Broker represents the character as '?'. We can see that in the Trace node output or the MQ output.

Is there any specific reason why Message Broker does not generate an exception in such cases?

Example: the source system specifies ISO-8859-1 and sends the character ฿ (Thai currency symbol Baht, U+0E3F), which is represented as '?'.
ghoshly
Posted: Thu Mar 06, 2014 5:03 am    Post subject: WMB 8.0.0.2 WMQ 7.5.1 AIX 7.1

Partisan

Joined: 10 Jan 2008
Posts: 333

Environment details - WMB 8.0.0.2 WMQ 7.5.1 AIX 7.1
Tibor
Posted: Thu Mar 06, 2014 5:06 am    Post subject: Re: Mis-encoded case - Why no error is thrown?

Grand Master

Joined: 20 May 2001
Posts: 1033
Location: Hungary

ghoshly wrote:
The source system specifies ISO-8859-1 and sends the character ฿ (Thai currency symbol Baht, U+0E3F), which is represented as '?'.

It is a little bit strange, because ISO-8859-1 / Latin-1 has no representation for this character.
fjb_saper
Posted: Thu Mar 06, 2014 6:20 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

You may see the representation as a "?". I would make sure and check the hex value, which could well be something else. Code page translation sometimes changes a character it has no representation for into a different character... (It's in the rules). The sad thing is that this may lead to an invalid XML document, as the replacement character is not always valid XML...

Are you sure the broker's representation is wrong? Remember the broker uses UTF internally. It might be that the output is wrong because that character is not supported in the output CCSID... or it just might be that your display program cannot display the output correctly...
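
One way to check the hex value without trusting a viewer is to dump the raw bytes. A minimal Python sketch, run outside the broker; the helper name `hex_bytes` and both sample payloads are made up for illustration:

```python
def hex_bytes(data: bytes) -> str:
    """Return the bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


# Hypothetical payloads: the same text with a real Baht sign encoded
# as UTF-8, and with a substituted question mark.
utf8_payload = "100 \u0e3f".encode("utf-8")
substituted_payload = b"100 ?"

print(hex_bytes(utf8_payload))         # 31 30 30 20 E0 B8 BF
print(hex_bytes(substituted_payload))  # 31 30 30 20 3F
```

If the dump shows 3F, the substitution is really in the data; if it shows something else, the viewer is the one drawing the "?".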
_________________
MQ & Broker admin
smdavies99
Posted: Thu Mar 06, 2014 6:39 am

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

fjb_saper wrote:


Are you sure the broker's representation is wrong? Remember the broker uses UTF internally.


Shouldn't that be UNICODE?


_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995

Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.
zpat
Posted: Thu Mar 06, 2014 6:44 am

Jedi Council

Joined: 19 May 2001
Posts: 5866
Location: UK

UTF-16 is Unicode, and so is UTF-8 for that matter.
_________________
Well, I don't think there is any question about it. It can only be attributable to human error. This sort of thing has cropped up before, and it has always been due to human error.
ghoshly
Posted: Thu Mar 06, 2014 6:45 am    Post subject: Actual value / Representation

Partisan

Joined: 10 Jan 2008
Posts: 333

My thinking is along the same lines, which is why I mentioned representation.

The question is: in a mis-encoded scenario, i.e. the source system sends characters from one character set but declares a different one (other than Unicode UTF-8 or UTF-16), should the broker throw an exception when it receives the message through an input node or writes it to an output node? We do copy the Properties folder from input to output.

We do get an XML writing exception when we do not set the character set values in the output properties.

In a previous thread we heard from kimbert that the broker internally uses UTF-16.
ghoshly
Posted: Thu Mar 06, 2014 8:07 am    Post subject: It's not just representation :-(

Partisan

Joined: 10 Jan 2008
Posts: 333

Hello... I checked the output with a File Output node as well.

If UTF-8 is used by the source system, we get the correct hex value in the file, i.e. E0B8BF.

However, if an improper character set is specified in the input, we get only 3F, which is the hex value of the '?' character.
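
For reference, the same two byte values can be reproduced outside the broker. A small Python sketch, assuming the converter substitutes rather than rejects: Python's `errors="replace"` mode stands in for that behaviour here and is not the broker's own converter, and nothing in it shows where in the flow the substitution actually happens.

```python
baht = "\u0e3f"  # Thai currency symbol Baht

# UTF-8 can encode any Unicode character.
utf8_hex = baht.encode("utf-8").hex().upper()

# ISO-8859-1 has no Baht sign. A strict encoder raises an error...
try:
    baht.encode("iso-8859-1")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# ...whereas a substituting encoder writes '?' (0x3F) instead.
latin1_hex = baht.encode("iso-8859-1", errors="replace").hex().upper()

print(utf8_hex, strict_failed, latin1_hex)  # E0B8BF True 3F
```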
kimbert
Posted: Thu Mar 06, 2014 12:12 pm

Jedi Council

Joined: 29 Jul 2003
Posts: 5542
Location: Southampton

Help! You guys really need to learn the basics about character encodings - and it is not hard! zpat is the only person to make a 100% correct statement so far on this thread.
Quote:
If a source system specifies a character set / code page but sends a character in the message data that is not in that code page, the broker still does not generate an exception.
Please explain exactly why you expected an exception. Please use Google to research ISO8859-1 and UTF-8 before you reply.
_________________
Before you criticize someone, walk a mile in their shoes. That way you're a mile away, and you have their shoes too.
fatherjack
Posted: Fri Mar 07, 2014 3:22 am

Knight

Joined: 14 Apr 2010
Posts: 522
Location: Craggy Island

kimbert wrote:
You guys really need to learn the basics about character encodings - and it is not hard!


Maybe not for someone with your experience and expertise in the subject, but given the number of threads that have continually appeared on this forum over the years, maybe it's a bit harder to grasp than you think. Is there a "Character Encoding for Dummies" anywhere?
_________________
Never let the facts get in the way of a good theory.
ghoshly
Posted: Fri Mar 07, 2014 4:20 am    Post subject: Conversion

Partisan

Joined: 10 Jan 2008
Posts: 333

I can understand how we get the hex values in UTF-8. What I do not understand is how a character that is not present in the incoming character set gets transformed / converted to '?'.

For example, I have tried sending the ฿ character with Shift-JIS and ISO-8859-1.

Is '?' the default character in such cases?

I apologize for my limited knowledge.
Gralgrathor
Posted: Fri Mar 07, 2014 7:18 am    Post subject: Re: Conversion

Master

Joined: 23 Jul 2009
Posts: 297

ghoshly wrote:
I can understand how we get the hex values in UTF-8. What I do not understand is how a character that is not present in the incoming character set gets transformed / converted to '?'.

For example, I have tried sending the ฿ character with Shift-JIS and ISO-8859-1.

Is '?' the default character in such cases?

I apologize for my limited knowledge.


Question: *IS* it converted to '?', or is that just the way your viewer displays the actual character? Does a hexdump of the bitstream show you the Unicode hex for '?'?
_________________
A measure of wheat for a penny, and three measures of barley for a penny; and see thou hurt not the oil and the wine.
Vitor
Posted: Fri Mar 07, 2014 7:37 am    Post subject: Re: Conversion

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

ghoshly wrote:
Is '?' the default character in such cases?


Either that or '.', depending on the software being used to view the data.

I echo the comments of others: how the data is displayed does not affect the underlying hex stream.
_________________
Honesty is the best policy.
Insanity is the best defence.
kimbert
Posted: Fri Mar 07, 2014 9:14 am

Jedi Council

Joined: 29 Jul 2003
Posts: 5542
Location: Southampton

Quote:
Is there a "Character Encoding for Dummies" anywhere?
At risk of being accused of being a fanboy for this fella: http://www.joelonsoftware.com/articles/Unicode.html

Plus of course, Wikipedia, which has very good pages on Unicode, encodings and character sets.

The facts are:
- ISO-8859-1 is a single-byte encoding, and every byte value is a valid character. It is therefore impossible to get an 'invalid character' when *reading* ISO-8859-1. You might get unexpected characters, though. Especially if the bytes are actually representing UTF-8 and not ISO-8859-1!
- ISO-8859-1 can represent exactly 256 characters. Unicode has room for over a million (1,114,112 code points). So it's very easy to get 'Unconvertable character' errors when *writing* ISO-8859-1.

- UTF-8 can represent any character in the Unicode character set using between one and four bytes. It is therefore impossible to get an 'Unconvertable character' error when *writing* UTF-8.
- UTF-8 is exactly the same as ASCII ( and ISO-8859-* ) for the first 128 values (0 to 127). After that, characters are encoded as sequences of two to four bytes, and those sequences *must* conform to the UTF-8 specification. So it's very easy to get 'Unconvertable character' errors when *reading* UTF-8 - especially if the input bytes are actually representing ISO-8859-1 characters in the 128-255 range. You still might get lucky, though, if the sequence of bytes happens to match a valid UTF-8 byte sequence. In that case you will just get the wrong characters.

It should be clear from the above that knowing the correct encoding ( same as CCSID ) is absolutely essential. It's quite possible to get incorrect results without realizing it.
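
The four facts above can be tried out in a few lines. This is only a sketch using Python's built-in codecs, not the broker's converters; the helper names `try_decode` and `try_encode` and the sample strings are made up for illustration:

```python
def try_decode(data: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


def try_encode(text: str, encoding: str):
    """Return the encoded bytes, or None if a character cannot be represented."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# Reading ISO-8859-1 never fails: all 256 byte values are characters.
print(try_decode(bytes(range(256)), "iso-8859-1") is not None)  # True

# Writing ISO-8859-1 fails for anything outside its 256 characters.
print(try_encode("\u0e3f", "iso-8859-1"))  # None

# Writing UTF-8 works for any Unicode character.
print(try_encode("\u0e3f", "utf-8").hex())  # e0b8bf

# Reading UTF-8 fails on ISO-8859-1 bytes in the 128-255 range...
print(try_decode("caf\u00e9".encode("iso-8859-1"), "utf-8"))  # None

# ...unless they happen to form a valid UTF-8 sequence ("lucky" case):
# the two ISO-8859-1 characters C3 A9 come back as one wrong character.
print(try_decode(b"\xc3\xa9", "utf-8"))  # é

# UTF-8 bytes read as ISO-8859-1: no error, just unexpected characters.
print(try_decode(b"\xe0\xb8\xbf", "iso-8859-1"))  # à¸¿
```

Only two of the six cases report a failure. The others succeed silently, and two of those give the wrong text, so a wrong CCSID can go unnoticed.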
_________________
Before you criticize someone, walk a mile in their shoes. That way you're a mile away, and you have their shoes too.
ghoshly
Posted: Mon Mar 10, 2014 1:15 am    Post subject: Thanks..

Partisan

Joined: 10 Jan 2008
Posts: 333

Thanks a lot, kimbert and all, for your responses.

I do see the hex code '3F' wherever I see '?', and that is why I mentioned conversion.

I use Notepad++ or EditPlus for viewing these messages. From your experience, can you suggest any tool for this purpose that handles different encodings and character sets better?