Author |
Message
|
busy_chap |
Posted: Thu Oct 14, 2010 11:57 am Post subject: Code page problem (again) |
|
|
Acolyte
Joined: 18 May 2006 Posts: 69
|
Hello,
I have been struggling with this for days and cannot get the correct result so I am posting my issue here.
Source sends us Polish data żźńćś and I am supposed to send this to the output queue using correct code page.
I tried 1208 (UTF-8), the characters on the output q are Å¼ÅºÅ„Ä‡Å›Ä…Ä™
I tried 1250 (Windows-1250), the chars on the output q are ¿Ÿñæœ¹ê³ó
I tried 912 (ISO-8859-2), the chars on the output q are ¿¼ñæ¶±ê
Can anybody please help me with my issue? I am viewing the data through RFHUTIL. Initially I thought that maybe RFHUTIL was just unable to display the characters properly, since XML Spy and Internet Explorer were able to show the data fine. But that is not true, because the data, when loaded to the Polish database, still shows up as the corrupt characters above. |
 |
Vitor |
Posted: Thu Oct 14, 2010 12:06 pm Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
So this is a follow up to this?
busy_chap wrote: |
Source sends us Polish data żźńćś and I am supposed to send this to the output queue using correct code page.
I tried 1208 (UTF-8), the characters on the output q are Å¼ÅºÅ„Ä‡Å›Ä…Ä™
I tried 1250 (Windows-1250), the chars on the output q are ¿Ÿñæœ¹ê³ó
I tried 912 (ISO-8859-2), the chars on the output q are ¿¼ñæ¶±ê |
With this kind of code page problem, I can do no better than repeat the advice of Yoda:
Quote: |
Do. Or do not. There is no try. |
Don't experiment with code pages & hope you hit the right one. It's a long and rocky road (with no ice cream).
Obvious question - what's the code page this Polish database is using? I'd have expected this to be the one your output needs to be in.
Then it's a simple question of moving from whatever the source message is coded in to that code page. _________________ Honesty is the best policy.
Insanity is the best defence. |
 |
busy_chap |
Posted: Thu Oct 14, 2010 12:37 pm Post subject: |
|
|
Acolyte
Joined: 18 May 2006 Posts: 69
|
Thanks vitor for your quick response.
The country does not know how to look up the code page on their end. But they did give me the database collation Polish_100_CI_AS, and I didn't know how to relate it to a code page, so I kept trying all of the ones listed on the web, but nothing helps. |
 |
Vitor |
Posted: Thu Oct 14, 2010 1:16 pm Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
busy_chap wrote: |
The country do not know how to look for the codepage on their end. |
How does the source data arrive? If it's by file it's the code page of the OS; if it's by WMQ it's the CCSID in the MQMD. If either of these doesn't match the actual code page of the data you're toast.
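The point about the declared CCSID having to match the real encoding of the data can be sketched in Python. This is only an illustration: `CCSID_TO_CODEC` and `decode_with_ccsid` are hypothetical names, not part of WMQ or the broker, and the codec names are Python's equivalents of the code pages mentioned in this thread.

```python
# Hypothetical mapping from the CCSIDs mentioned in this thread
# to the equivalent Python codec names.
CCSID_TO_CODEC = {
    1208: "utf-8",
    1250: "cp1250",
    1252: "cp1252",
    912: "iso8859_2",
    819: "latin-1",
}


def decode_with_ccsid(payload: bytes, ccsid: int) -> str:
    """Decode a message payload strictly according to its declared CCSID."""
    return payload.decode(CCSID_TO_CODEC[ccsid], errors="strict")


text = "żźńćśąęłó"

# Label matches the data: the characters come back intact.
good = decode_with_ccsid(text.encode("utf-8"), 1208)

# Label says 1208 but the bytes are really Windows-1250: decoding fails.
try:
    decode_with_ccsid(text.encode("cp1250"), 1208)
    mismatch_detected = False
except UnicodeDecodeError:
    mismatch_detected = True

# Label says 1250 but the bytes are really UTF-8: no error is raised,
# the result is simply the wrong characters.
silent_garbage = decode_with_ccsid(text.encode("utf-8"), 1250)
```

The last case is the dangerous one: a wrong single-byte label usually produces no error at all, only wrong characters, which is why guessing code pages rarely converges.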
busy_chap wrote: |
But they did give me the database collation Polish_100_CI_AS and I didn't know how to relate it to a codepage |
I don't think it does; collation to me means sort order.
busy_chap wrote: |
.. so i kept on trying all of them available on the web but nothing helps |
No, it never does. As I said above. Code pages are a pain. I suffer with you (spiritually). |
 |
busy_chap |
Posted: Thu Oct 14, 2010 1:29 pm Post subject: |
|
|
Acolyte
Joined: 18 May 2006 Posts: 69
|
You have asked a question about one of my suspicions...
The source application puts the message directly through an API call, and one other interesting (or frustrating) point is this:
The source app logs show that the data they are sending to my queue has żźńćśąęłó in it, but when it arrives on the queue on AIX at my end and I browse the input queue through RFHUTIL, I see Å¼ÅºÅ„Ä‡Å›Ä…Ä™Å‚Ã³, with code page 1208.
I looked at one of the posts here and found that RFHUTIL does not display UTF data, so I opened the data with XML Spy. The data looked good there, with Polish chars, so I assumed the data was good. We processed the data with Å¼ÅºÅ„Ä‡ and the output q also has Å¼ÅºÅ„Ä‡, but this is not what the destination wants. They want żźńćśąęłó.
Since RFHUTIL is unable to show żźńćś on the input queue, does that mean the data published by the app is bad? Or is it still good data that I am unable to convert to regular Polish chars because I am not using the right code page?
This question has been bothering me for a long time, but I couldn't find any evidence to take back to the source app team, as they claim they are sending Polish data. |
 |
Vitor |
Posted: Thu Oct 14, 2010 1:33 pm Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
busy_chap wrote: |
The source app logs show that the data they are sending to my queue has żźńćśąęłó in the data |
So they're sending to a queue? What do they set in the CCSID of the MQMD and/or what is the CCSID of their queue manager?
busy_chap wrote: |
but when it comes to queue on AIX on my end and when I browse the data on the input queue through RFHUTIL , I see this Å¼ÅºÅ„Ä‡Å›Ä…Ä™Å‚Ã³ but with codepage 1208. |
What is the CCSID of this queue manager? Does the channel do conversion? Is RFHUtil set to do conversion?
What (if anything) converts the data before it's put to the database?
Methodical checking of every step is the only way through this sort of thing. |
 |
busy_chap |
Posted: Thu Oct 14, 2010 1:43 pm Post subject: |
|
|
Acolyte
Joined: 18 May 2006 Posts: 69
|
1. The source app sets the CCSID in the MQMD to 1208. They don't have a queue manager. They put the message directly to my queue through a Java call.
2. The CCSID of the queue manager on my end, which is on AIX, is 819, and it is the broker server. The channel does not do any conversion.
3. I am not sure what you mean by "is RFHUtil set to do conversion"?
There is no conversion other than what the broker does to handle Polish chars when sending them to the output queue.
I am stuck on why I see Å¼ÅºÅ„Ä‡Å›Ä…Ä™Å‚Ã³ on the input q, whereas the source app logs show that they are sending out Polish chars. Whatever is on the input q is processed to the output queue, which also has Å¼ÅºÅ„Ä‡Å›Ä…Ä™Å‚Ã³. |
 |
fjb_saper |
Posted: Thu Oct 14, 2010 2:04 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
busy_chap wrote: |
1.The source app sets CCSID of MQMD as 1208. They don't have a queue manager. They directly put a message to my queue through a Java call.
2.CCSID of the queue manager on my end which is AIX is 819 and it is the broker server. Channel does not do any conversion.
3.I am not sure what you mean by is RFHUTIL Set to do any conversion?
There is no other conversion other than what broker does to handle polish chars to send them to the output queue.
I am stuck on why do I see the Å¼ÅºÅ„Ä‡Å›Ä…Ä™Å‚Ã³ on the input q where as the source app logs show that they are sending out Polish chars and whatever is available on the input q is processed to the output queue which also has Å¼ÅºÅ„Ä‡Å›Ä…Ä™Å‚Ã³ |
What are you using to read the input queue? Have you verified at hex level?
Can your reader handle multiple charsets?  _________________ MQ & Broker admin |
 |
kimbert |
Posted: Thu Oct 14, 2010 2:06 pm Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
This is not hard. It is boring, but not hard. The boring part is that you have to carefully work through each stage, checking that two things are consistent with each other. The two things are:
1) the encoding or CCSID which that stage is claiming to use
2) the actual bytes of the data ( not the characters displayed by a text viewer like RFH2, which can get the decoding wrong ). Doing this will force you to understand how characters are encoded as sequences of bytes.
So, to reinforce Vitor's point, the first thing to check is that the sending application really is sending data in CCSID 1208 ( UTF-8 ). Check the bytes of the message, and convince yourself that it is a well-formed UTF-8 byte stream.
One way to check the bytes of the message is to temporarily change the domain to BLOB on the input node, and then inspect the message contents using a Trace node. There may be easier ways.
Secondly, you can check that your input node has decoded the characters correctly. A Trace node will do that for you.
Thirdly, your message flow can output those logical characters in the same code page, or some other code page, and you can then check that the output byte stream is correct and consistent. Or you can leave that as an exercise for the maintainers of the receiving application.  |
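The three checks above can also be sketched outside the broker in Python, assuming the raw message bytes have already been captured (for example from a BLOB-domain trace). The helper names and the choice of Windows-1250 as the target code page are assumptions made purely for illustration.

```python
def is_well_formed_utf8(payload: bytes) -> bool:
    """Step 1: confirm the bytes really are valid UTF-8 (CCSID 1208)."""
    try:
        payload.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


def transcode(payload: bytes, source: str, target: str) -> bytes:
    """Steps 2 and 3: decode to logical characters, then encode for the target."""
    characters = payload.decode(source, errors="strict")
    return characters.encode(target, errors="strict")


# Bytes as they would appear in a hex dump of the input message.
input_bytes = bytes.fromhex("C5BC C5BA C584 C487 C59B C485 C499 C582 C3B3")

assert is_well_formed_utf8(input_bytes)

# Assumed target: Windows-1250. Substitute whatever the receiver really expects.
output_bytes = transcode(input_bytes, "utf-8", "cp1250")
print(output_bytes.hex(" ").upper())  # BF 9F F1 E6 9C B9 EA B3 F3
```

Comparing the printed hex against the expected bytes for the target code page avoids relying on any viewer's interpretation of the characters.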
 |
rekarm01 |
Posted: Sat Oct 16, 2010 6:12 pm Post subject: Re: Code page problem (again) |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
A message on a queue consists of an MQMD header, followed by a stream of bytes. The target application depends on the given MQMD header to convert an otherwise meaningless stream of bytes into more meaningful information, (such as a stream of characters).
busy_chap wrote: |
Source sends us Polish data żźńćś; ... I am viewing the data through RFHUTIL and initially i thought, may be rfhutil was unable to display them fine |
RfhUtil might not display character data correctly, as it uses a default code page, (such as Windows-1252), instead of the MQMD header, to display message data. But RfhUtil also provides the option to display message data in other formats, such as characters, hex (bytes), or both.
To confirm that the message data is correct, check the bytes, not the characters:
Code: |
X'C5 BC C5 BA C5 84 C4 87 C5 9B C4 85 C4 99 C5 82 C3 B3' -> ccsid(1208) -> 'żźńćśąęłó'
X'C5 BC C5 BA C5 84 C4 87 C5 9B C4 85 C4 99 C5 82 C3 B3' -> ccsid(1252) -> 'Å¼ÅºÅ„Ä‡Å›Ä…Ä™Å‚Ã³'
X'BF 9F F1 E6 9C B9 EA B3 F3' -> ccsid(1250) -> 'żźńćśąęłó'
X'BF 9F F1 E6 9C B9 EA B3 F3' -> ccsid(1252) -> '¿Ÿñæœ¹ê³ó'
X'BF BC F1 E6 B6 B1 EA B3 F3' -> ccsid(912) -> 'żźńćśąęłó'
X'BF BC F1 E6 B6 B1 EA B3 F3' -> ccsid(1252) -> '¿¼ñæ¶±ê³ó' |
As long as the given ccsid maps the given bytes to the expected characters, then the message data on the queue is correct, whether RfhUtil displays it correctly or not.
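As a rough sketch, the byte table above can be reproduced with a few lines of Python. The codec names are Python's equivalents of the CCSIDs (a naming assumption, nothing MQ-specific), and the last step shows that a Windows-1252 rendering of UTF-8 bytes can be reversed as long as the underlying bytes are intact.

```python
# Byte sequences from the table, keyed by the Python codec that
# corresponds to the declared CCSID (1208, 1250, 912).
samples = {
    "utf-8": bytes.fromhex("C5BC C5BA C584 C487 C59B C485 C499 C582 C3B3"),
    "cp1250": bytes.fromhex("BF 9F F1 E6 9C B9 EA B3 F3"),
    "iso8859_2": bytes.fromhex("BF BC F1 E6 B6 B1 EA B3 F3"),
}

for codec, payload in samples.items():
    as_declared = payload.decode(codec)   # what the declared CCSID yields
    as_cp1252 = payload.decode("cp1252")  # what a viewer defaulting to 1252 shows
    print(f"{codec:10} declared={as_declared!r} viewer={as_cp1252!r}")

# The viewer's rendering of the UTF-8 bytes is reversible if no bytes were lost.
mojibake = samples["utf-8"].decode("cp1252")
recovered = mojibake.encode("cp1252").decode("utf-8")
```

If `recovered` comes back as the Polish text, the bytes on the queue were fine and only the display was wrong.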
busy_chap wrote: |
... it worked fine. But that is not true because the data when loaded to polish database still shows up as the corrupt characters as above. |
Something, somehow, moved the data from a queue into some brand of database, possibly corrupting the data in the process? Please provide more details.
Assuming that it's a broker message flow using an ODBC driver to connect to a Microsoft SQL Server database, confirm that the ODBC driver configuration and database configuration are consistent with the broker documentation. The broker and database driver can then convert message data as needed, automatically.
busy_chap wrote: |
... they did give me the database collation Polish_100_CI_AS and I didn't know how to relate it to a codepage |
This is a Microsoft SQL Server Windows Collation Name. Refer to the MSDN documentation, to identify the associated non-Unicode code page. |