MQSeries.net :: View topic - Extended ASCII char


Extended ASCII char - data conversion issue


Raya76

Posted: Fri Jan 17, 2014 8:45 pm Post subject: Extended ASCII char - data conversion issue

Newbie

Joined: 17 Jan 2014
Posts: 3

Hello.
I need a little help with an extended ASCII character (for example, an R with two dots on top of it).
The MQInput node receives a BLOB message, and I parse the data using the MRM parser. The data is fixed length, 450 bytes. One of the text fields contains that kind of character.

When I parse the msg using MRM msp there is no issue at all, but when I put that msg to the output queue I see just 449 bytes.

When I debug that msg, I see that character as " ? ".

When I converted the input data to a bitstream I used CCSID 1208 and encoding MQENC_NATIVE.

Is there a way to handle that extended ASCII character (hex 9F) in the input msg?

Thanks

mqsiuser

Posted: Sat Jan 18, 2014 12:07 am Post subject:

Yatiri

Joined: 15 Apr 2008
Posts: 637
Location: Germany

You really need to find out the proper code page that the message uses. Open the message in Notepad++; its menu will show the code page it *thinks* the file uses (and it applies that code page to display the characters). Better, though: ask at the source.

A "?" in the msg means you are using the wrong code page: there is no character for those bits/bytes in that code page.

And there is much more than 7-bit ASCII and extended (8-bit) ASCII.

I didn't find an R with 2 dots in the extended part of ASCII.
_________________
Just use REFERENCEs
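
A quick way to see the effect described above is to decode the byte in question under a handful of candidate code pages. The following is only an illustrative Python sketch, not something used in the thread: the candidate list and the helper name `decode_byte` are assumptions made for this example, and the byte 0x9F is the value reported in the original question.

```python
# Decode one suspicious byte under several candidate code pages.
# The candidate list is an assumption for illustration; extend it with
# whatever code pages the sending system might plausibly use.
CANDIDATES = ["utf-8", "latin-1", "iso8859_15", "cp1252", "cp850"]


def decode_byte(raw: bytes, codec: str):
    """Return the decoded text, or None if the bytes are invalid in that codec."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    raw = bytes([0x9F])  # the byte value reported in the question
    for codec in CANDIDATES:
        text = decode_byte(raw, codec)
        if text is None:
            print(f"{codec:12} invalid byte sequence")
        elif text.isprintable():
            print(f"{codec:12} {text!r} (U+{ord(text):04X})")
        else:
            print(f"{codec:12} non-printable U+{ord(text):04X}")
```

Of these candidates, Windows-1252 is the one that maps 0x9F to Ÿ. In ISO-8859-1 and ISO-8859-15 the same byte is a C1 control character, and a lone 0x9F is not valid UTF-8 at all.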

fjb_saper

Posted: Sat Jan 18, 2014 3:00 am Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20771
Location: LI,NY

Set your display tool to display the bitstream in UTF-8 instead of whatever code page it uses by default...

Wait, you did not specify what your display tool was, or even whether it is capable of displaying UTF-8! Or, for that matter, display the stream in hex and verify the values...

_________________
MQ & Broker admin
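
If no hex-capable viewer is at hand, a few lines of Python can serve as one. This is a minimal sketch under stated assumptions: the function name `hex_dump` and the sample payload are hypothetical, and the payload would in practice be the raw message bytes saved from the queue.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Render bytes as offset, hex values, and a printable-ASCII column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Anything outside printable 7-bit ASCII is shown as a dot, so the
        # display does not depend on the terminal's own code page.
        ascii_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample: ASCII text with one 0x9F byte in the middle.
    sample = b"CUSTOMER NAME " + bytes([0x9F]) + b" END"
    print(hex_dump(sample))
```

Because the right-hand column only shows 7-bit ASCII, the hex column is the part to trust when checking byte values.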

Raya76

Posted: Sat Jan 18, 2014 10:39 pm Post subject:

Newbie

Joined: 17 Jan 2014
Posts: 3

Thank you.

It's UTF-8, and here is the actual value and hex.

Character: Ÿ

In hex: 9F, value 159.

Any idea which CCSID and encoding values I need to set?

Thanks

smdavies99

Posted: Sat Jan 18, 2014 11:23 pm Post subject:

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

The internet is a wonderful source of code pages and their character mappings. Start with the most common ones, ISO-8859-x; Wikipedia is a good place to start.
(where x goes from 1 to 16)

You may need to match a whole series of characters to get the right one.
For example:

Is your data coming from Western Europe or Eastern Europe?
Do you expect to have to handle a Euro character? It does matter.

etc
etc
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995

Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.

mqsiuser

Posted: Sun Jan 19, 2014 12:06 pm Post subject:

Yatiri

Joined: 15 Apr 2008
Posts: 637
Location: Germany

Raya76 wrote:

It's UTF-8, and here is the actual value and hex.

Any idea which CCSID and encoding values I need to set?

I googled "CCSID UTF-8" and the result is:

Quote:

The CCSID value for data in UTF-8 format is 1208


smdavies99

Posted: Sun Jan 19, 2014 1:15 pm Post subject:

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

If this is what you are looking for
U+0210 Ȑ c8 90 LATIN CAPITAL LETTER R WITH DOUBLE GRAVE
U+0211 ȑ c8 91 LATIN SMALL LETTER R WITH DOUBLE GRAVE

Then this page does indicate that you are looking at something that is encoded in UTF-8
http://www.utf8-chartable.de/unicode-utf8-table.pl?start=512

If it isn't then you will need to look at some other Character Set.
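
The byte pairs in that table follow directly from the UTF-8 encoding rules, so they can be checked by hand. The sketch below is illustrative only; the function name `utf8_encode_code_point` is made up for this example, and Python's built-in encoder is used as the reference to compare against.

```python
def utf8_encode_code_point(cp: int) -> bytes:
    """Encode a single Unicode code point to UTF-8 using the bit layout rules."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp < 0x80:        # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    for cp in (0x0210, 0x0211, 0x0178):
        encoded = utf8_encode_code_point(cp)
        # Cross-check the hand-rolled result against the built-in encoder.
        assert encoded == chr(cp).encode("utf-8")
        print(f"U+{cp:04X} {chr(cp)} {encoded.hex(' ')}")
```

Note that every byte after the first in a multi-byte sequence has the form 10xxxxxx (0x80 to 0xBF), which is why a byte such as 0x9F can only ever appear as a continuation byte in valid UTF-8.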

kimbert

Posted: Mon Jan 20, 2014 1:43 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

@Raya76: I think you might be missing an important fact.

Quote:

The MQInput node receives a BLOB message, and I parse the data using the MRM parser. The data is fixed length, 450 bytes.

UTF-8 is a multi-byte, variable-width encoding. Your input data format is a fixed-length format. So your data will contain somewhere between 112 (450/4, rounded down) and 450 (best case) characters. And there may be a truncated multi-byte UTF-8 character in the final one to three bytes.

Quote:

When I parse the msg using MRM msp there is no issue at all, but when I put that msg to the output queue I see just 449 bytes.

Do you really mean 449 bytes? Or do you mean 449 characters? If you mean that the output field is short by one byte, then it could be because the final character was a multi-byte character, and it did not fit into the final byte of the output buffer. MRM has padding options for that kind of thing.
_________________
Before you criticize someone, walk a mile in their shoes. That way you're a mile away, and you have their shoes too.
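
The truncation case described above can be detected outside the broker. The following Python sketch is only an illustration under stated assumptions: the helper name `split_truncated_tail` and the sample record are hypothetical, and it relies on the standard library's incremental UTF-8 decoder, which holds back an incomplete trailing sequence instead of failing on it.

```python
import codecs


def split_truncated_tail(buf: bytes):
    """Decode a fixed-length UTF-8 buffer.

    Returns (text, pending), where pending holds the bytes of an incomplete
    multi-byte character at the very end of the buffer (empty if none).
    Invalid bytes elsewhere in the buffer still raise UnicodeDecodeError.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    # final=False tells the decoder more input may follow, so an unfinished
    # trailing sequence is buffered rather than reported as an error.
    text = decoder.decode(buf, final=False)
    pending, _ = decoder.getstate()
    return text, pending


if __name__ == "__main__":
    # Hypothetical 450-byte record: 449 ASCII bytes plus the first byte of a
    # two-byte character that no longer fits in the field.
    record = b"A" * 449 + "\u0178".encode("utf-8")[:1]
    text, pending = split_truncated_tail(record)
    print(len(record), "bytes ->", len(text), "characters, pending:", pending.hex())
```

A non-empty `pending` value means the field boundary cut a character in half, which is the situation where padding or truncation rules decide what ends up in the output.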

Raya76

Posted: Mon Jan 20, 2014 10:20 am Post subject:

Newbie

Joined: 17 Jan 2014
Posts: 3

Here is exact flow:

The input is a BLOB, and an RCD node parses the BLOB to MRM; there is no parsing error. When I debug, I see '?' for that Ÿ character. There is no issue at that point, but the problem appears when I attach that inbound msg to the XML msg which I create in the Compute node.

That XML msg goes to a WTX node for further validation (each field of the inbound 450-character msg, defined by a copybook, gets validated). There I see that the msg is shortened by 1 character. That means we see only 449 characters in total in the backup files.

I don't think it is an issue in WMB, as I am now parsing using CCSID 1208 and MQENC_NATIVE encoding.
If it were a WMB issue, the RCD node should have failed with a parsing error.

Ÿ, hex 9F, value 159.

I believe the msg gets shortened by 1 character when I attach the inbound msg (after converting it to a bitstream) to the XML, maybe because of the Ÿ (which shows as '?' in the debugger).

I don't have access to create a user trace.
The only place visible to me is the user logs.

Any other suggestions, please?

smdavies99

Posted: Mon Jan 20, 2014 11:27 am Post subject:

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

Hex 9F (decimal 159) is a control character in ISO-8859-1 and in Unicode (U+009F), and a lone 9F byte is not a valid UTF-8 sequence. Therefore, if the data really is UTF-8, it can't be the

LATIN CAPITAL LETTER Y WITH DIAERESIS
you claim it to be.

If you really are looking for the Y with diaeresis, then in UTF-8 you need to be looking at

U+0178 Ÿ c5 b8 LATIN CAPITAL LETTER Y WITH DIAERESIS

This is very different to what you are seeing.
I am also uncertain about the 7F character value. From my memory I don't know of any Character set that actually defines a character to this value.

From the ISO-8859-1 spec (Wikipedia and other places) I see:

ÿ 00FF 255

Then in ISO-8859-15 I see:
Ÿ 0178 190

Again a very different hex value for the character you have mentioned.
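
The values quoted above can be reproduced by encoding the two characters under each character set. This is a small Python sketch for illustration only; the helper name `encode_or_none` is an assumption, and Windows-1252 is added to the list as an extra candidate that the post itself does not mention.

```python
def encode_or_none(char: str, codec: str):
    """Return the encoded bytes, or None if the codec has no such character."""
    try:
        return char.encode(codec)
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    # cp1252 is an extra candidate added for this example.
    codecs_to_try = ["utf-8", "latin-1", "iso8859_15", "cp1252"]
    for char in ("\u00ff", "\u0178"):  # y with diaeresis, small and capital
        for codec in codecs_to_try:
            encoded = encode_or_none(char, codec)
            if encoded is None:
                print(f"U+{ord(char):04X} {codec:12} not representable")
            else:
                decimals = " ".join(str(b) for b in encoded)
                print(f"U+{ord(char):04X} {codec:12} hex {encoded.hex(' ')} decimal {decimals}")
```

ISO-8859-1 has no capital Ÿ at all, ISO-8859-15 places it at 190, and Windows-1252 places it at 159, so the decimal value alone already narrows down which character set produced the byte.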

You really need to examine the incoming data in its raw form. You can then match the whole payload against a character set so that the translation into readable text makes sense.

There is always the possibility that the CCSID of the message does not match the actual CCSID of the data. This has appeared here many times before. By checking the actual raw data you can determine this. If this is the case, your ONLY course of action is to go to the originators of the message and get them to fix their error.
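
One way to check raw data against a declared CCSID is to decode it with the matching codec and flag anything that comes out undecodable or as a control character. The Python sketch below is an assumption-laden illustration: the function name `suspicious_characters` is hypothetical, the CCSID-to-codec table covers only a few common values, and the sample payload is invented.

```python
import unicodedata

# Partial mapping from CCSID to Python codec name; only a few common
# values are listed here, and the table is not exhaustive.
CCSID_TO_CODEC = {
    1208: "utf-8",
    819: "latin-1",
    923: "iso8859_15",
    1252: "cp1252",
    850: "cp850",
    437: "cp437",
    37: "cp037",
    500: "cp500",
}


def suspicious_characters(payload: bytes, declared_ccsid: int):
    """List (character index, reason) pairs that hint at a CCSID mismatch."""
    codec = CCSID_TO_CODEC[declared_ccsid]
    text = payload.decode(codec, errors="replace")
    findings = []
    for index, ch in enumerate(text):
        if ch == "\ufffd":
            findings.append((index, "undecodable byte(s)"))
        elif unicodedata.category(ch) == "Cc" and ch not in "\t\r\n":
            findings.append((index, f"control character U+{ord(ch):04X}"))
    return findings


if __name__ == "__main__":
    payload = b"ABC" + bytes([0x9F]) + b"DEF"  # invented sample data
    for ccsid in (1208, 819, 1252):
        print(ccsid, suspicious_characters(payload, ccsid))
```

An empty result does not prove the declared CCSID is right, since many single-byte code pages accept every byte value; it only means nothing obviously wrong was found.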

kimbert

Posted: Tue Jan 21, 2014 4:56 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Quote:

That XML msg goes to a WTX node for further validation (each field of the inbound 450-character msg, defined by a copybook, gets validated). There I see that the msg is shortened by 1 character. That means we see only 449 characters in total in the backup files.

You did not answer my question. I asked:

Quote:

Do you really mean 449 bytes? Or do you mean 449 characters?

The field is 450 bytes (not characters). If one character takes up two bytes then you will only get 449 characters.
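
The arithmetic is easy to confirm with a short Python sketch. The helper name `char_and_byte_length` and the sample field are assumptions made for this illustration; the sample uses 448 ASCII letters followed by one Ÿ.

```python
def char_and_byte_length(text: str, codec: str = "utf-8"):
    """Return (number of characters, number of bytes once encoded)."""
    return len(text), len(text.encode(codec))


if __name__ == "__main__":
    # Invented field content: 448 one-byte characters plus one Y with diaeresis.
    field = "A" * 448 + "\u0178"
    for codec in ("utf-8", "cp1252"):
        chars, size = char_and_byte_length(field, codec)
        print(f"{codec:8} {chars} characters, {size} bytes")
```

In UTF-8 the 449 characters fill exactly 450 bytes, whereas in a single-byte code page such as Windows-1252 the same text occupies only 449 bytes.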

Tibor

Posted: Tue Jan 21, 2014 5:22 am Post subject:

Grand Master

Joined: 20 May 2001
Posts: 1033
Location: Hungary

Raya76, it would be very helpful if you could define the character precisely, like this: http://www.fileformat.info/info/unicode/char/0178/index.htm. I am not sure whether it is your character or not, because your code 159 is meaningful only in your current code page. Please send the output of the command chcp on Windows or locale on Linux / Unix.

smdavies99

Posted: Tue Jan 21, 2014 6:50 am Post subject:

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

Tibor wrote:

Please send a result of the command chcp on Windows or locale on Linux / Unix.

This may or may not give the correct answer. It all depends upon the CCSID that was used in the MQPUT operation.

For example, on my desktop the CHCP output is 850 (Multilingual Latin 1). All the messages on my queue manager are in CCSID 1208, because that is how they are written and it is the CCSID of the queue manager.

Tibor

Posted: Tue Jan 21, 2014 7:04 am Post subject:

Grand Master

Joined: 20 May 2001
Posts: 1033
Location: Hungary

smdavies99 wrote:

It all depends upon the CCSID that was used in the MQPUT operation.

Maybe yes. Or not.

It depends on whether the original post was about a character which was displayed or it was only generated by typing with an Alt+code.
