Author |
Message
|
saurabh867 |
Posted: Fri Aug 06, 2010 1:07 am Post subject: Handling UTF- characters in message flow |
|
|
Voyager
Joined: 13 Jun 2010 Posts: 78
|
Hi,
I have a message flow which reads a file and then parses it against a certain message set. The file contains some UTF-8 characters like the ® symbol.
After all my processing, the output message contains a special character before the ® character I put in the input. I am also storing the entire message in the database as part of the requirement, and there too I can see a junk value (Â) preceding the ® character.
I have used CCSID 1208 and encoding 437. Is there any way to handle this scenario?
Regards,
Saurabh |
|
fjb_saper |
Posted: Fri Aug 06, 2010 1:15 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
It depends. Is UTF-8 available as a character set on your platform (see DB requirements for sender)? Is the file data defined to the broker in the correct CCSID before parsing? And please do not use read with Convert, as this can downgrade the CCSID to the CCSID of the qmgr. Use the CCSID on InputRoot.Properties.CodedCharSet...
And remember that in MQ the encoding has to do with the endianness of binary numbers and nothing to do with the character set.  _________________ MQ & Broker admin |
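To illustrate the difference between encoding and character set, here is a minimal Python sketch. It is not broker code, and the variable names are made up for illustration. The only assumption is that CCSID 1208 denotes UTF-8. Byte order changes how a binary integer is laid out; the character set changes how text is laid out.

```python
import struct

# Byte order ("encoding" in MQ terms) only changes the layout of binary numbers.
number = 1208
big_endian = struct.pack(">i", number)     # most significant byte first
little_endian = struct.pack("<i", number)  # least significant byte first

# The character set (CCSID 1208 is UTF-8) only changes the layout of text.
registered_sign = "\u00ae"  # the registered-trademark symbol
utf8_bytes = registered_sign.encode("utf-8")

print(big_endian.hex(), little_endian.hex(), utf8_bytes.hex())
```

The same integer yields two different byte sequences depending on byte order, while the bytes of the text are unaffected by it.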
|
saurabh867 |
Posted: Fri Aug 06, 2010 1:33 am Post subject: |
|
|
Voyager
Joined: 13 Jun 2010 Posts: 78
|
Yes,
I have used the CCSID only for setting Properties. I agree that the data in the DB may depend on the platform, but the data in the output queue should show the correct character. Why is every such character preceded by the same character, Â? |
|
fjb_saper |
Posted: Fri Aug 06, 2010 1:47 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
saurabh867 wrote: |
Yes,
I have used the CCSID only for setting Properties. I agree that the data in the DB may depend on the platform, but the data in the output queue should show the correct character. Why is every such character preceded by the same character, Â? |
Did you look at the message (hex data + ccsid) on the queue before it gets consumed by the broker? Does the data match the ccsid?  _________________ MQ & Broker admin |
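A rough way to do that check offline, sketched in Python: dump the payload as hex and test whether it decodes under the character set the message claims. The `CCSID_TO_CODEC` table and both helper functions are hypothetical and exist only for this illustration. They are not part of MQ or the broker.

```python
# Hypothetical lookup from a few CCSIDs to Python codec names (illustration only).
CCSID_TO_CODEC = {1208: "utf-8", 819: "latin-1", 437: "cp437"}


def hex_dump(data: bytes) -> str:
    """Render each byte as two uppercase hex digits, separated by spaces."""
    return " ".join(f"{byte:02X}" for byte in data)


def decodes_as(data: bytes, ccsid: int) -> bool:
    """Return True if the bytes form a valid sequence in the claimed CCSID."""
    try:
        data.decode(CCSID_TO_CODEC[ccsid], errors="strict")
        return True
    except UnicodeDecodeError:
        return False


utf8_payload = "Acme\u00ae".encode("utf-8")       # what a 1208 message should hold
latin1_payload = "Acme\u00ae".encode("latin-1")   # single-byte data mislabelled as 1208

print(hex_dump(utf8_payload), decodes_as(utf8_payload, 1208))
print(hex_dump(latin1_payload), decodes_as(latin1_payload, 1208))
```

Note that this test is only meaningful for UTF-8. Single-byte code pages such as Latin-1 accept almost any byte sequence, so a successful decode there proves little.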
|
saurabh867 |
Posted: Fri Aug 06, 2010 4:04 am Post subject: |
|
|
Voyager
Joined: 13 Jun 2010 Posts: 78
|
Actually, the issue occurs when I send my data from the broker to the WTX node: the output of the WTX node then contains an extra junk character before every special character.
But the same input works fine when I run the file independently with the TX map.
Any idea how this could happen? |
|
mqjeff |
Posted: Fri Aug 06, 2010 4:39 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
It's almost certainly not a junk character.
It's almost certainly one half of a double-byte UTF-8 character. You see that word "double"? It means "two". So certain UTF-8 characters take *two* bytes, not *one*, to define. So if you are looking at the data using something that presents every byte as a single character, you will see this result.
If you are only having issues because you are *seeing* this character, but you are able to *process* the file correctly, then there is no bug at all.
If you are having issues PROCESSING this file, then it is because something in the message flow is not properly indicating the correct CCSID for the data, and so the data is being serialized incorrectly.
Take a user trace. Pay very close attention to everything you do with CCSID, including what the CCSID on the message is when it goes into WTX and when it comes out of WTX. |
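The effect described above can be reproduced in a few lines of Python. This is a sketch that assumes the viewing tool interprets each byte as one Latin-1 character, which is an assumption about the viewer and not something stated in the thread.

```python
registered_sign = "\u00ae"                 # the registered-trademark symbol
raw = registered_sign.encode("utf-8")      # UTF-8 needs two bytes for it

# A viewer that shows one character per byte (assumed here: Latin-1)
# turns those two bytes into two visible characters.
as_seen_in_viewer = raw.decode("latin-1")

# Decoding the very same bytes as UTF-8 gives back the single original character.
as_intended = raw.decode("utf-8")

print(len(raw), as_seen_in_viewer, as_intended)
```

The bytes are identical in both cases. Only the interpretation differs, which is why the extra character always looks the same.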
|
saurabh867 |
Posted: Fri Aug 06, 2010 5:18 am Post subject: |
|
|
Voyager
Joined: 13 Jun 2010 Posts: 78
|
You have a point, because I tried comparing the value of the element containing the extra character (one half) against the original value without the extra character, and it passed the condition.
So does that mean the data is correct and it is just that the representation of the data is not displayed properly? In this case, is there any way to see the correct data, and do I need to contact my DBA? I am putting my message in the database and it has that extra character, which does not look good from the end user's perspective.
Regards,
Saurabh |
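One way to tell a display problem from damaged data is to look at the stored bytes themselves. The Python sketch below uses a hypothetical `classify` helper and checks only for the ® symbol; it assumes you can obtain the raw bytes from the queue or the database column.

```python
def classify(stored: bytes) -> str:
    """Classify how the registered-trademark symbol is stored in raw bytes."""
    if b"\xc3\x82\xc2\xae" in stored:
        # UTF-8 bytes were read as single-byte text and encoded to UTF-8 again.
        return "double-encoded"
    if b"\xc2\xae" in stored:
        # Correct UTF-8; any extra visible character comes from the viewer.
        return "valid UTF-8"
    if b"\xae" in stored:
        # A lone 0xAE byte is the symbol in a single-byte code page such as Latin-1.
        return "single-byte"
    return "symbol not found"


correct = "Acme\u00ae".encode("utf-8")
damaged = correct.decode("latin-1").encode("utf-8")  # simulate a wrong conversion

print(classify(correct), classify(damaged))
```

If the stored bytes fall in the first category, the data really was converted twice somewhere in the flow. If they fall in the second, the database client or viewer is using the wrong character set to display them.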
|
kimbert |
Posted: Fri Aug 06, 2010 7:24 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
So does that mean the data is correct and it is just that the representation of the data is not displayed properly? |
You need to answer that question for yourself. If you cannot, then you need to learn how. Please see this article for reasons why I believe this:
http://www.joelonsoftware.com/articles/Unicode.html |
|
saurabh867 |
Posted: Sun Aug 08, 2010 8:53 pm Post subject: |
|
|
Voyager
Joined: 13 Jun 2010 Posts: 78
|
Thanks Kimbert,
I did answer the question, and yes, this article is a must-read for understanding encoding.
Regards,
Saurabh |
|
|