Raya76
Posted: Fri Jan 17, 2014 8:45 pm Post subject: Extended ASCII char - data conversion issue
Newbie
Joined: 17 Jan 2014 Posts: 3

Hello.
I need a little help with an extended ASCII character (for example, an R with two dots on top of it).
At the MQInput node I receive a BLOB message and parse the data using the MRM parser. The data is fixed-length, 450 bytes, and one of the text fields receives that kind of character.
When I parse the message with the MRM message set there is no issue at all, but when I put that message to the output queue I see only 449 bytes.
When I debug the message, that character shows as "?".
When I converted the input data to a bitstream, I used CCSID 1208 and encoding MQENC_NATIVE.
Is there a way to handle that extended ASCII character (hex 9F) in the input message?
Thanks

mqsiuser
Posted: Sat Jan 18, 2014 12:07 am
Yatiri
Joined: 15 Apr 2008 Posts: 637 Location: Germany

You really need to find out the proper code page that the message uses. Open the message in Notepad++: in its menu, Notepad++ will show the code page it *thinks* the file uses (and it applies that code page to display the characters). Better, though: ask the source system and find out there.
A "?" in the message means you are using the wrong code page: there is no character for those bits/bytes in that code page.
And there is much more than 7-bit ASCII and extended (8-bit) ASCII.
I didn't find an R with two dots in the extended part of ASCII.
_________________
Just use REFERENCEs
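To illustrate where a "?" comes from, here is a minimal Python sketch. It is not broker code; Python's codecs stand in for the broker's code page conversion, which is an assumption about how the substitution happens. A character with no mapping in the target code page becomes a substitution character, and a byte that is invalid in the declared code page cannot be decoded cleanly.

```python
# Sketch only: Python codecs stand in for the broker's code page conversion.
text = "\u0178"  # Ÿ, LATIN CAPITAL LETTER Y WITH DIAERESIS

# ISO-8859-1 (CCSID 819) has no Ÿ, so a lenient conversion substitutes "?".
latin1_bytes = text.encode("latin-1", errors="replace")
print(latin1_bytes)  # b'?'

# A lone 0x9F byte is not valid UTF-8 (it can only be a continuation byte),
# so a lenient UTF-8 decode yields the U+FFFD replacement character instead.
raw = b"\x9f"
decoded = raw.decode("utf-8", errors="replace")
print(hex(ord(decoded)))  # 0xfffd
```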

fjb_saper
Posted: Sat Jan 18, 2014 3:00 am
Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY

Set your display tool to display the bitstream in UTF-8 instead of whatever code page it uses by default...
Wait, you did not specify what your display tool is, or even whether it is capable of displaying UTF-8! Or, for that matter, display the stream in hex and verify the values...
_________________
MQ & Broker admin
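If no suitable hex viewer is at hand, a small Python sketch like the one below can dump raw bytes (for example, a message body saved to a file and read with open(path, "rb")). The hex_dump function is a hypothetical helper written for illustration; it shows 7-bit printable ASCII in the right-hand column and everything else as ".", so a byte such as 0x9F stands out.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Return one line per row: offset, hex bytes, printable-ASCII column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Only 7-bit printable ASCII is shown as-is; other bytes become '.'
        text_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3}}  {text_part}")
    return "\n".join(lines)

# Hypothetical sample: "AB", then the byte 0x9F, then "C".
print(hex_dump(b"AB\x9fC"))
```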

Raya76
Posted: Sat Jan 18, 2014 10:39 pm
Newbie
Joined: 17 Jan 2014 Posts: 3

Thank you.
It's UTF-8, and here is the actual value and hex:
Character: Ÿ
In hex: 9F, value 159.
Any idea which CCSID and encoding values I need to set?
Thanks

smdavies99
Posted: Sat Jan 18, 2014 11:23 pm
Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.

The internet is a wonderful source of code pages and their character mappings. Start with the most common ones, ISO-8859-x (where x goes from 1 to 16; there is no part 12). Wikipedia is a good place to start.
You may need to match a whole series of characters to get the right one.
For example:
Is your data coming from Western Europe or Eastern Europe?
Do you expect to have to handle a Euro character? It does matter.
etc.
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.
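Matching "a whole series of characters" can be tried quickly in Python: decode the same suspect bytes with several candidate ISO-8859 parts and compare the results with what the data should contain. The sample bytes below are hypothetical. They show why the Euro question matters: 0xA4 is the generic currency sign ¤ in ISO-8859-1 but € in ISO-8859-15, and 0xBE is ¾ in ISO-8859-1 but Ÿ in ISO-8859-15.

```python
# Sketch: decode the same hypothetical bytes with several ISO-8859 parts
# and compare the results against characters you expect in the data.
sample = b"\xa4\xa6\xbe"  # hypothetical bytes taken from a raw message dump

results = {codec: sample.decode(codec)
           for codec in ("iso8859-1", "iso8859-2", "iso8859-15")}
for codec, text in results.items():
    print(f"{codec:>10}: {text}")
```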

mqsiuser
Posted: Sun Jan 19, 2014 12:06 pm
Yatiri
Joined: 15 Apr 2008 Posts: 637 Location: Germany

Raya76 wrote:
Its UTF-8, and here is the actual value and HEX.
Any idea which CCSID, encoding value need to set?
I googled "CCSID UTF-8" and the result is:
Quote:
The CCSID value for data in UTF-8 format is 1208
_________________
Just use REFERENCEs
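Outside the broker, the same idea can be sketched in Python: look up a codec from the CCSID the sender declared and decode strictly, so that bad data fails loudly instead of turning into "?". The CCSID_TO_CODEC table and decode_with_ccsid function are hypothetical, hand-picked for illustration, and not a complete or official list. Note that the byte 0x9F alone decodes to Ÿ only under Windows-1252, not under UTF-8.

```python
# Hand-picked mapping from a few IBM CCSIDs to Python codec names.
# This table is an illustrative assumption, not a complete or official list.
CCSID_TO_CODEC = {
    1208: "utf-8",       # UTF-8
    819: "latin-1",      # ISO-8859-1
    923: "iso8859-15",   # ISO-8859-15 (Latin-9, includes the Euro sign)
    1252: "cp1252",      # Windows Latin-1
    850: "cp850",        # PC Latin-1 (DOS)
    37: "cp037",         # EBCDIC US/Canada
}

def decode_with_ccsid(payload: bytes, ccsid: int) -> str:
    """Decode using the sender's declared CCSID; raise on invalid bytes."""
    codec = CCSID_TO_CODEC[ccsid]
    return payload.decode(codec)  # strict: UnicodeDecodeError on bad data

print(decode_with_ccsid(b"\xc5\xb8", 1208))  # Ÿ as UTF-8 bytes
print(decode_with_ccsid(b"\x9f", 1252))      # Ÿ as a Windows-1252 byte
```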

smdavies99
Posted: Sun Jan 19, 2014 1:15 pm
Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.

If this is what you are looking for:
U+0210 Ȑ c8 90 LATIN CAPITAL LETTER R WITH DOUBLE GRAVE
U+0211 ȑ c8 91 LATIN SMALL LETTER R WITH DOUBLE GRAVE
then this page does indicate that you are looking at something encoded in UTF-8:
http://www.utf8-chartable.de/unicode-utf8-table.pl?start=512
If it isn't, then you will need to look at some other character set.
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.
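The code points and UTF-8 byte sequences in a chart row like those can be checked directly in Python with the standard unicodedata module. This is only a verification aid; it prints each character's code point, UTF-8 bytes in hex, and Unicode name.

```python
import unicodedata

# Check the UTF-8 byte sequences and names quoted from the chart.
for ch in ("\u0210", "\u0211"):
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} {ch} {encoded.hex(' ')} {unicodedata.name(ch)}")
```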

kimbert
Posted: Mon Jan 20, 2014 1:43 am
Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton

@Raya76: I think you might be missing an important fact.
Quote:
At mq input node receiving blob msg and parsing the data using MRM parser. Data is fixed length 450bytes.
UTF-8 is a multi-byte, variable-width encoding. Your input data format is a fixed-length format. So your data will contain somewhere between roughly 112 (450/4, if every character takes four bytes) and 450 (if every character is single-byte) characters. And there may be a truncated multi-byte UTF-8 character in the final one to three bytes.
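Here is a minimal Python sketch of that truncation effect. The field content is hypothetical: 449 ASCII characters followed by Ÿ, which needs two bytes in UTF-8. Cutting the encoded text at a fixed 450 bytes leaves a dangling lead byte that no longer forms a complete character.

```python
FIELD_BYTES = 450  # fixed field length from the copybook, in bytes

# Hypothetical field content: 449 ASCII characters followed by "Ÿ",
# which needs two bytes (c5 b8) in UTF-8, so the text is 451 bytes long.
text = "A" * 449 + "\u0178"
encoded = text.encode("utf-8")

# Cutting at a fixed byte length splits the last multi-byte character.
field = encoded[:FIELD_BYTES]
print(len(encoded), len(field))  # 451 450

# A strict decode fails on the dangling lead byte 0xc5...
try:
    field.decode("utf-8")
except UnicodeDecodeError as err:
    print("truncated character:", err.reason)

# ...while ignoring invalid bytes (here only the dangling 0xc5) leaves 449 characters.
print(len(field.decode("utf-8", errors="ignore")))  # 449
```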
Quote:
When I parse the msg using MRM msp no issue at all, when I put that msg to output Q I see just 449 bytes.
Do you really mean 449 bytes, or do you mean 449 characters? If you mean that the output field is short by one byte, then it could be because the final character was a multi-byte character, and it did not fit into the final byte of the output buffer. MRM has padding options for that kind of thing.
_________________
Before you criticize someone, walk a mile in their shoes. That way you're a mile away, and you have their shoes too.
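The padding idea can be sketched in Python as a hypothetical helper, fit_utf8_field. It is similar in spirit to, but not the same as, MRM's padding options. It writes whole UTF-8 characters into a fixed byte length, never splits a character, and fills the leftover room with a single-byte pad.

```python
def fit_utf8_field(text: str, field_bytes: int, pad: bytes = b" ") -> bytes:
    """Encode text as UTF-8 into exactly field_bytes bytes.

    Hypothetical helper: it never splits a multi-byte character, and any
    room left over is filled with the (single-byte) pad value.
    """
    if len(pad) != 1:
        raise ValueError("pad must be exactly one byte")
    out = bytearray()
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(out) + len(encoded) > field_bytes:
            break  # the whole character does not fit; stop here
        out += encoded
    out += pad * (field_bytes - len(out))
    return bytes(out)

field = fit_utf8_field("A" * 449 + "\u0178", 450)
print(len(field), field[-1:])  # 450 b' '
```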

Raya76
Posted: Mon Jan 20, 2014 10:20 am
Newbie
Joined: 17 Jan 2014 Posts: 3

Here is the exact flow:
The input is a BLOB, and I use an RCD node to parse the BLOB into MRM; there is no parsing error. When I debug, I see "?" for that Ÿ character. There is no issue at all until I attach that inbound message to an XML message which I create in the Compute node.
That XML message goes to a WTX node for further validation (each field of the 450-character inbound message, described by a copybook, gets validated). There I see that the message is shortened by one character, which means we see only 449 characters in total in the backup files.
I don't think it's a WMB issue, as I am now parsing using CCSID 1208 and MQENC_NATIVE encoding. If it were a WMB issue, the RCD node would have failed with a parsing error.
Ÿ: hex 9F, value 159.
I believe that when I attach the inbound message (after converting it to a bitstream) to the XML, it could be shortened by one character (maybe because of the Ÿ, which I see as "?" in the debugger).
I can't get access to create a user trace; the only place visible to me is the user logs.
Any other suggestions, please?

smdavies99
Posted: Mon Jan 20, 2014 11:27 am
Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.

Hex 9F (decimal 159) is not a character you can get from UTF-8 this way: as a code point, U+009F is a control character, and as a single byte, 9F can only be a continuation byte in UTF-8. Therefore it can't be the
LATIN CAPITAL LETTER Y WITH DIAERESIS
you claim it to be.
If you really are looking for the Y with diaeresis, then in UTF-8 you need to be looking at
U+0178 Ÿ c5 b8 LATIN CAPITAL LETTER Y WITH DIAERESIS
This is very different from what you are seeing.
I am also uncertain about the 7F character value. From memory, I don't know of any character set that actually defines a character for this value.
From the ISO-8859-1 spec (Wikipedia and other places) I see:
ÿ 00FF 255
Then in ISO-8859-15 I see:
Ÿ 0178 190
Again, a very different hex value for the character you have mentioned.
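A quick Python check of where U+0178 (Ÿ) sits in a few character sets confirms the values above. It also shows one more data point: Windows-1252 (cp1252) puts Ÿ at exactly byte 0x9F, the value reported in this thread. That makes it a candidate worth raising with the originators, but on its own it does not prove the source data is Windows-1252.

```python
# Sketch: where U+0178 (Ÿ) lives in a few character sets.
ch = "\u0178"
for codec in ("utf-8", "iso8859-15", "cp1252"):
    print(f"{codec:>10}: {ch.encode(codec).hex()}")

# ISO-8859-1 has no Ÿ at all; its byte 0xff is the lowercase ÿ (U+00FF).
print(b"\xff".decode("latin-1") == "\u00ff")  # True
try:
    ch.encode("latin-1")
except UnicodeEncodeError:
    print("Ÿ cannot be encoded in ISO-8859-1")
```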
You really need to examine the incoming data in its raw form. You can then match all of the payload to a character set, so that the translation into readable text makes sense.
There is always the possibility that the CCSID of the message does not match the actual CCSID of the data. This has come up here many times before. By checking the actual raw data you can determine this. If this is the case, your ONLY course of action is to go to the originators of the message and get them to fix their error.
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.
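Matching the raw payload against candidate character sets can be partly automated. The sketch below uses a hypothetical helper, check_candidates, to decode the payload strictly with each candidate codec. It reports either where decoding fails or whether the result contains C1 control characters (U+0080 to U+009F). In ordinary text, C1 controls often suggest a wrong code page, but this is a heuristic, not proof. The sample payload is made up.

```python
def check_candidates(payload: bytes, codecs) -> dict:
    """Report, per candidate codec, whether the payload decodes cleanly
    and whether the decoded text contains C1 control characters."""
    report = {}
    for codec in codecs:
        try:
            text = payload.decode(codec)
        except UnicodeDecodeError as err:
            report[codec] = f"invalid at byte {err.start}"
            continue
        c1 = [hex(ord(c)) for c in text if 0x80 <= ord(c) <= 0x9F]
        report[codec] = f"C1 controls {c1}" if c1 else "ok"
    return report

# Hypothetical raw payload as it might appear in a hex dump.
payload = b"ABC\x9fDEF"
for codec, verdict in check_candidates(payload, ["utf-8", "latin-1", "cp1252"]).items():
    print(f"{codec:>8}: {verdict}")
```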

kimbert
Posted: Tue Jan 21, 2014 4:56 am
Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton

Quote:
That XML msg goes to WTX node for further validation (Each field of that inbound 450 character msg - copy book) get validated. There i see that msg is shortened by 1 charcter. That means total we see only 449 characters in the back up files.
You did not answer my question. I asked:
Quote:
Do you really mean 449 bytes. Or do you mean 449 characters.
The field is 450 bytes (not characters). If one character takes up two bytes, then you will only get 449 characters.
_________________
Before you criticize someone, walk a mile in their shoes. That way you're a mile away, and you have their shoes too.
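The bytes-versus-characters arithmetic in two lines of Python, using a hypothetical field: 448 ASCII characters plus one Ÿ, which takes two bytes in UTF-8, exactly fill 450 bytes but are only 449 characters.

```python
# Hypothetical field: 448 ASCII characters plus one "Ÿ" (two bytes in UTF-8).
text = "A" * 448 + "\u0178"
print(len(text), len(text.encode("utf-8")))  # 449 450
```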

Tibor
Posted: Tue Jan 21, 2014 5:22 am
Grand Master
Joined: 20 May 2001 Posts: 1033 Location: Hungary

Raya76, it would be very important to define the character precisely, just like this: http://www.fileformat.info/info/unicode/char/0178/index.htm. I am not sure whether it is your character or not, because your code 159 is meaningful only in your current code page. Please send the result of the command chcp on Windows or locale on Linux/Unix.
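A precise definition means a Unicode code point and name, which Python's standard unicodedata module can give you. The second loop in this sketch shows why "159" alone is ambiguous: the same byte is Ÿ in Windows-1252, ƒ in the DOS code page 437, and an unnamed C1 control in ISO-8859-1. The '<control>' label is just a default chosen for this sketch, because control characters have no Unicode name.

```python
import unicodedata

# Identify a character unambiguously by code point and Unicode name.
ch = "\u0178"
print(f"U+{ord(ch):04X}", unicodedata.name(ch))  # U+0178 LATIN CAPITAL LETTER Y WITH DIAERESIS

# The byte value 159 (0x9f) alone is ambiguous: its meaning depends on the code page.
for codec in ("cp1252", "cp437", "latin-1"):
    decoded = bytes([159]).decode(codec)
    print(f"{codec:>8}: U+{ord(decoded):04X} {unicodedata.name(decoded, '<control>')}")
```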

smdavies99
Posted: Tue Jan 21, 2014 6:50 am
Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.

Tibor wrote:
Please send a result of the command chcp on Windows or locale on Linux / Unix.
This may or may not give the correct answer. It all depends on the CCSID that was used in the MQPUT operation.
For example, on my desktop the chcp output is 850 (Multilingual Latin 1). All the messages on my queue manager are in CCSID 1208, because that is how they are written and it is the CCSID of the queue manager.
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.
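A small Python sketch of that point, using é as a hypothetical sample character. The bytes in a message are fixed by the CCSID the sender used when writing them. Reading UTF-8 (CCSID 1208) bytes with the desktop's code page 850 produces different characters, whatever chcp reports.

```python
# The desktop code page (chcp) does not change the bytes already in a message:
# they were fixed by whatever CCSID the sender used when it wrote them.
text = "\u00e9"  # é, a hypothetical sample character
as_utf8 = text.encode("utf-8")   # what a CCSID 1208 sender writes: c3 a9
as_cp850 = text.encode("cp850")  # what a CCSID 850 sender would write: 82

# Reading the UTF-8 bytes with the desktop code page 850 garbles them.
print(as_utf8.decode("cp850") == text)  # False
print(as_utf8.decode("utf-8") == text)  # True
```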

Tibor
Posted: Tue Jan 21, 2014 7:04 am
Grand Master
Joined: 20 May 2001 Posts: 1033 Location: Hungary

smdavies99 wrote:
It all depends upon the CCSID that was used in the MQPUT operation.
Maybe yes. Or not.
It depends on whether the original post was about a character that was displayed, or one that was only generated by typing it with an Alt+code.