  MQSeries.net

MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » Code Page issue with UTF-8

mjain
Posted: Tue May 01, 2012 10:17 pm    Post subject: Code Page issue with UTF-8

Novice

Joined: 01 May 2012
Posts: 15

Hi,
We have a message flow which receives data from the backend system 'Finacle' over TCP/IP. Earlier, Finacle sent the data in ASMO-708 encoding, and now they claim to have changed it to UTF-8. The data contains Arabic characters. Below is the code I used to cast the data earlier:

Code:
SET Environment.Variables.InputMsg = ASBITSTREAM(InputBody.BLOB ENCODING 546 CCSID 1089 OPTIONS FolderBitStream);
SET Environment.Variables.InputMsg = CAST(Environment.Variables.InputMsg AS CHARACTER CCSID 1089 ENCODING 546);
SET Environment.Variables.InputMsg = CAST(Environment.Variables.InputMsg AS BLOB CCSID 1208 ENCODING 546);


Now that the data is in UTF-8, I changed the code as below:
Code:
SET Environment.Variables.InputMsg = ASBITSTREAM(InputBody.BLOB ENCODING 546 CCSID 1208 OPTIONS FolderBitStream);
SET Environment.Variables.InputMsg = CAST(Environment.Variables.InputMsg AS CHARACTER CCSID 1208 ENCODING 546);
SET Environment.Variables.InputMsg = CAST(Environment.Variables.InputMsg AS BLOB CCSID 1208 ENCODING 546);


But I still get "Unconvertible Character". I am not sure if I am using the correct code page '546'. As far as I know, CCSID 1208 holds good for UTF-8. I would be really thankful for any help. I have already tried a lot of code page and CCSID combinations, in vain. Our MB is running on HP-AIX.
Esa
Posted: Tue May 01, 2012 10:42 pm

Grand Master

Joined: 22 May 2008
Posts: 1387
Location: Finland

Hi mjain,

it seems your aim is to convert the input into a UTF-8 message, without transformation or other processing?

If you get UTF-8 in and need to put UTF-8 out, why do you have to cast it at all? Unless you need to change the encoding of numeric fields to 546.

The first line in your code samples seems to have no effect.
mjain
Posted: Tue May 01, 2012 11:40 pm

Novice

Joined: 01 May 2012
Posts: 15

Hi Esa,
Thanks for your reply. You're right. Now that I am getting the data in UTF-8 itself, I do not need to convert the encoding. But I still need to cast it to a character string for further processing, and I still get the same error when casting InputBody.BLOB.

Code:
SET Environment.Variables.InputMsg = InputBody.BLOB;
SET Environment.Variables.TCPIPRs.Msg = CAST(InputBody.BLOB AS CHARACTER CCSID 1208);

Line 2 still gives me the error. I don't know why it is not being cast, even though the data I receive is in UTF-8. Also note that the data contains Arabic characters.
Esa
Posted: Tue May 01, 2012 11:57 pm

Grand Master

Joined: 22 May 2008
Posts: 1387
Location: Finland

Your current first line is creating an unnecessary copy of the input message.

If you get an unconvertible character error, the input message contains a character that cannot be converted from UTF-8 to UCS-2. I think one of the inserts of the exception should tell which character.
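The same check can be run outside the broker to locate the offending byte. The sketch below is Python, not ESQL; the function name `find_unconvertible` and the sample bytes are hypothetical. It decodes a buffer strictly as UTF-8 and reports the offset and value of the first ill-formed sequence, similar to what one of the exception inserts should identify.

```python
from typing import Optional, Tuple


def find_unconvertible(data: bytes) -> Optional[Tuple[int, str]]:
    """Return (offset, hex of the offending bytes) for the first ill-formed
    UTF-8 sequence in data, or None if the whole buffer decodes cleanly."""
    try:
        data.decode("utf-8")  # strict mode: stops at the first ill-formed sequence
    except UnicodeDecodeError as err:
        return err.start, data[err.start:err.end].hex()
    return None


# Hypothetical input: 'ظ' (U+0638) is D8 B8 in UTF-8, but here the B8 is missing.
print(find_unconvertible(b"ABC\xd8DEF"))            # (3, 'd8')
print(find_unconvertible("\u0638".encode("utf-8")))  # None
```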
mjain
Posted: Wed May 02, 2012 12:06 am

Novice

Joined: 01 May 2012
Posts: 15

Thanks again Esa.
The first line is required to create the environment variable used by the logging subflow. Anyway, looking at the error trace, I think it's 'd8' that is reported as unconvertible. I would check with the backend team to confirm whether they are sending valid UTF-8 data. Still, any suggestion from your end will be very helpful.
kimbert
Posted: Wed May 02, 2012 1:35 am

Jedi Council

Joined: 29 Jul 2003
Posts: 5542
Location: Southampton

Quote:
I would check with the backend team to confirm whether they are sending valid UTF-8 data.
No need - you can check that yourself. Just look at the bytes of the BLOB and check whether they represent a valid UTF-8 stream.
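Checking the bytes by hand comes down to the UTF-8 bit patterns: a byte below 0x80 stands alone, a lead byte announces how many continuation bytes (10xxxxxx) must follow, and anything else is an error. The Python sketch below walks a buffer that way; the function name `utf8_problems` is hypothetical, and it is deliberately simplified (it does not reject overlong 3- or 4-byte forms or encoded surrogates).

```python
from typing import List


def utf8_problems(data: bytes) -> List[str]:
    """Check the UTF-8 lead/continuation byte structure by hand (simplified)."""
    problems = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            need = 0  # 0xxxxxxx: single byte (ASCII)
        elif 0xC2 <= b <= 0xDF:
            need = 1  # 110xxxxx: 2-byte lead; the Arabic block U+0600..U+06FF uses D8..DB
        elif 0xE0 <= b <= 0xEF:
            need = 2  # 1110xxxx: lead byte of a 3-byte sequence
        elif 0xF0 <= b <= 0xF4:
            need = 3  # 11110xxx: lead byte of a 4-byte sequence
        else:
            problems.append(f"offset {i}: 0x{b:02x} cannot start a UTF-8 sequence")
            i += 1
            continue
        tail = data[i + 1:i + 1 + need]
        if len(tail) < need or any((t & 0xC0) != 0x80 for t in tail):
            problems.append(f"offset {i}: 0x{b:02x} lacks {need} continuation byte(s)")
            i += 1
            continue
        i += 1 + need
    return problems


print(utf8_problems(b"ABC\xd8DEF"))  # ['offset 3: 0xd8 lacks 1 continuation byte(s)']
```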
rekarm01
Posted: Wed May 02, 2012 1:36 am    Post subject: Re: Code Page issue with UTF-8

Grand Master

Joined: 25 Jun 2008
Posts: 1415

mjain wrote:
Code:
SET Environment.Variables.InputMsg = ASBITSTREAM(InputBody.BLOB ENCODING 546 CCSID 1208 OPTIONS FolderBitStream);
SET Environment.Variables.InputMsg = CAST(Environment.Variables.InputMsg AS CHARACTER CCSID 1208 ENCODING 546);
SET Environment.Variables.InputMsg = CAST(Environment.Variables.InputMsg AS BLOB CCSID 1208 ENCODING 546);

The first line is using ASBITSTREAM to convert a BLOB to a BLOB. That's not really necessary, since a BLOB is already a BLOB. The ENCODING, CCSID, and FolderBitStream OPTIONS are unused. The second line is interpreting the BLOB as a UTF-8 byte sequence, and converting it to CHARACTER; the ENCODING is unused. And the third line is converting CHARACTER back to a UTF-8 byte sequence; again, the ENCODING is unused. It could be simplified:
Code:
SET Environment.Variables.InputMsgBLOB = InputBody.BLOB;
SET Environment.Variables.InputMsgCHAR = CAST(InputBody.BLOB AS CHARACTER CCSID 1208);

By saving the original BLOB separately, it's not necessary to convert it from BLOB to CHARACTER and back to BLOB again.

mjain wrote:
But I still get "Unconvertible Character".

Please post the complete error message. (If it contains an excessively long byte sequence, please snip the irrelevant bytes or add line-breaks before posting, to make it readable.) The most likely cause is an ill-formed input message; it's not really UTF-8.

mjain wrote:
I am not sure if I am using the correct code page '546' ... Our MB is running on HP-AIX.

What's HP-AIX? '546' is the wrong Encoding value for UNIX systems, but as it's unused, it's not the cause of any "unconvertible character" error.
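Encoding and CCSID answer different questions. An MQ Encoding value packs the representations of integers, packed decimals and floats into separate bit fields; the low nibble (the MQENC_INTEGER_* constants) gives integer byte order, 1 for normal (big-endian) and 2 for reversed (little-endian). 546 is 0x222, so it selects little-endian integers, while a value such as 273 (0x111) selects big-endian ones. Character conversion never looks at it. The Python sketch below decodes only the integer field; the helper names are hypothetical.

```python
import struct

# Integer field of an MQ Encoding value (low nibble), per the MQENC_INTEGER_* constants.
MQENC_INTEGER_MASK = 0x0F
MQENC_INTEGER_NORMAL = 0x01    # big-endian integers
MQENC_INTEGER_REVERSED = 0x02  # little-endian integers


def integer_byte_order(encoding: int) -> str:
    """Map an MQ Encoding value to a struct byte-order prefix."""
    field = encoding & MQENC_INTEGER_MASK
    if field == MQENC_INTEGER_NORMAL:
        return ">"
    if field == MQENC_INTEGER_REVERSED:
        return "<"
    raise ValueError(f"no integer byte order defined in encoding {encoding}")


def pack_int32(value: int, encoding: int) -> bytes:
    """Serialize a signed 32-bit integer the way the given Encoding lays it out."""
    return struct.pack(integer_byte_order(encoding) + "i", value)


print(pack_int32(1, 546).hex())  # 01000000 (546 = 0x222, reversed)
print(pack_int32(1, 273).hex())  # 00000001 (273 = 0x111, normal)
```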

mjain wrote:
Anyway, looking at the error trace, I think it's 'd8' that is reported as unconvertible

Examining the surrounding bytes would help. 'd8' could represent the ASMO-708 'ظ' (U+0638, "ARABIC LETTER ZAH"), the lead byte of a UTF-8 multi-byte character, part of a UTF-16 lead surrogate, or it could be something else. Without more information, it's hard to tell.
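The three readings of 'd8' can be checked directly. In the Python sketch below, the byte is decoded as ISO 8859-6 (the thread's ASMO-708, CCSID 1089), shown as the lead byte of a 2-byte UTF-8 sequence (110 11000, covering U+0600..U+063F), and placed as the high byte of a UTF-16BE code unit; the following byte 3D in the last case is a made-up example.

```python
import unicodedata

byte = b"\xd8"

# 1. A complete ISO 8859-6 (ASMO-708, CCSID 1089) character.
ch = byte.decode("iso8859_6")
print(f"U+{ord(ch):04X}", unicodedata.name(ch))  # U+0638 ARABIC LETTER ZAH

# 2. The lead byte of a 2-byte UTF-8 sequence: only well formed when a
#    continuation byte (10xxxxxx) follows it.
print("\u0638".encode("utf-8").hex())  # d8b8
try:
    byte.decode("utf-8")
except UnicodeDecodeError as err:
    print("as UTF-8 on its own:", err.reason)

# 3. The high byte of a UTF-16BE code unit in the lead (high) surrogate range D800..DBFF.
unit = int.from_bytes(b"\xd8\x3d", "big")  # hypothetical following byte 3D
print(hex(unit), 0xD800 <= unit <= 0xDBFF)  # 0xd83d True
```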
kash3338
Posted: Wed May 02, 2012 7:49 am

Shaman

Joined: 08 Feb 2009
Posts: 709
Location: Chennai, India

Doesn't the input tree have the CCSID info? You can get it from the Properties folder. You can try this:

Code:

SET Environment.Variables.InputMsgBLOB = InputBody.BLOB;
SET Environment.Variables.InputMsgCHAR = CAST(InputBody.BLOB AS CHARACTER CCSID InputRoot.Properties.CCSID);
rekarm01
Posted: Wed May 02, 2012 8:58 am

Grand Master

Joined: 25 Jun 2008
Posts: 1415

kash3338 wrote:
Doesn't the Input tree have the CCSID info?

Yes, it should, and it's usually good practice to get the CCSID for the input message from the input properties.

For the TCP input nodes, it's the input node itself, not the incoming message, that provides this CCSID info; one way or another, the CCSID is still hardcoded in the message flow. If it doesn't match the input data, then it makes little difference where it's hardcoded.
mjain
Posted: Sun May 06, 2012 4:55 am

Novice

Joined: 01 May 2012
Posts: 15

Thanks for your comment Rekarm,
I thought that if I needed to encode integers with encoding 546, I would need to use ASBITSTREAM (the first line of my code).
You are right, it's indeed an Arabic character which is two bytes long, but the backend host, 'Finacle', was trying to fit it into one byte. I asked the backend team to change the encoding to UTF-16, which fixed the problem. I am using CCSID 1200 to cast the response to a character string.
Thanks for all the suggestions in this thread; they helped a lot.

Also, just a note: when I tried using the CCSID from InputRoot.Properties, which was 1051, it parsed the message successfully but turned all the Arabic characters into unreadable ones.
kash3338
Posted: Sun May 06, 2012 8:17 am

Shaman

Joined: 08 Feb 2009
Posts: 709
Location: Chennai, India

mjain wrote:
Also, just a note: when I tried using the CCSID from InputRoot.Properties, which was 1051, it parsed the message successfully but turned all the Arabic characters into unreadable ones.


What is your input node? If it's a TCP node then, as rekarm01 mentioned, the CCSID should be set by the node.
rekarm01
Posted: Sun May 06, 2012 3:27 pm

Grand Master

Joined: 25 Jun 2008
Posts: 1415

mjain wrote:
You are right, it's indeed an Arabic character which is two bytes long, but the backend host, 'Finacle', was trying to fit it into one byte.

Whether it's one byte or two bytes depends on the ccsid that the backend host uses to convert it from character to byte.
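For example, the same letter 'ظ' (U+0638) takes one byte in ISO 8859-6 and two in UTF-8 or UTF-16. The Python sketch below prints the encoded form under each CCSID; the CCSID-to-codec mapping is an assumption (1200 taken as big-endian UTF-16 without a byte-order mark). The last lines show one hypothetical way of "fitting a two-byte character into one byte": keeping only the first UTF-8 byte, which leaves exactly the lone D8 the broker rejected.

```python
zah = "\u0638"  # ARABIC LETTER ZAH, 'ظ'

# Hypothetical mapping from CCSIDs mentioned in the thread to Python codec names.
CODECS = {1089: "iso8859_6", 1208: "utf-8", 1200: "utf-16-be"}

for ccsid, codec in CODECS.items():
    encoded = zah.encode(codec)
    print(ccsid, len(encoded), encoded.hex())
# 1089 1 d8
# 1208 2 d8b8
# 1200 2 0638


def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical failure: a writer that budgets one byte per character keeps
# only the first byte of the UTF-8 sequence.
truncated = zah.encode("utf-8")[:1]
print(truncated.hex(), is_valid_utf8(truncated))  # d8 False
```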

mjain wrote:
Also, just a note: when I tried using the CCSID from InputRoot.Properties, which was 1051, it parsed the message successfully but turned all the Arabic characters into unreadable ones.

The message flow needs to use the same ccsid to read the message that the backend host used to write it.
mjain
Posted: Mon May 07, 2012 1:39 am

Novice

Joined: 01 May 2012
Posts: 15

Quote:
What is your input node? If it's a TCP node then, as rekarm01 mentioned, the CCSID should be set by the node.


Yes, it's a TCP node. I also tried using the CCSID set by the node, but that turns the Arabic characters into an unreadable format. The node sets 1051, but I am not sure why it does not work.

Quote:
Whether it's one byte or two bytes depends on the ccsid that the backend host uses to convert it from character to byte.


The backend is not aware of the CCSID; they just call it UTF-8. I believe it's 1200, as that seems to have solved the problem.
fjb_saper
Posted: Mon May 07, 2012 7:54 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

Seems to me as well that your back end is setting a CCSID on the content that is different from the CCSID information (or lack thereof) that they put on the message. THIS IS NOT "GOOD" PRACTICE.

The back end needs to set the CCSID on the message to the CCSID of the content they write into it. Tell them to fix that...
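A plain TCP/IP stream has no message descriptor, so the two sides have to agree on where that CCSID travels. The Python sketch below shows one possibility: a small header carrying a big-endian CCSID and payload length in front of the text. The header layout, the names `frame` and `unframe`, and the CCSID-to-codec table are all hypothetical, not a Finacle or broker format.

```python
import struct

# Hypothetical CCSID -> Python codec table covering only the CCSIDs in this thread.
CODECS = {1051: "hp_roman8", 1089: "iso8859_6", 1200: "utf-16-be", 1208: "utf-8"}

# Hypothetical header: CCSID and payload length as big-endian unsigned 32-bit integers.
HEADER = struct.Struct(">II")


def frame(text: str, ccsid: int) -> bytes:
    """Writer side: encode the text with the CCSID it declares in the header."""
    payload = text.encode(CODECS[ccsid])
    return HEADER.pack(ccsid, len(payload)) + payload


def unframe(data: bytes) -> str:
    """Reader side: decode with whatever CCSID the writer declared."""
    ccsid, length = HEADER.unpack_from(data)
    payload = data[HEADER.size:HEADER.size + length]
    return payload.decode(CODECS[ccsid])


message = frame("\u0645\u0631\u062d\u0628\u0627", 1200)
print(message[:HEADER.size].hex(), unframe(message))
```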
_________________
MQ & Broker admin
rekarm01
Posted: Tue May 08, 2012 1:45 am

Grand Master

Joined: 25 Jun 2008
Posts: 1415

mjain wrote:
The backend is not aware of the CCSID; they just call it UTF-8

Whatever they call it, the message flow must use the same ccsid/charset/character encoding to read the data that the backend used to write it:
  • ccsid=1051 (HP Roman-8)
  • ccsid=1089 (ASMO 708 / ISO 8859-6)
  • ccsid=1200 (UTF-16)
  • ccsid=1208 (UTF-8)
Pick one that both sides can agree on.
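To see why only the matching CCSID works, the Python sketch below writes Arabic text as UTF-8 (CCSID 1208) and reads it back under each CCSID in the list. The CCSID-to-codec mapping is an assumption (1200 taken as big-endian UTF-16 without a byte-order mark), and decoding with `errors="replace"` imitates a reader that does not fail but yields the wrong characters. HP Roman-8 (1051) is a single-byte Latin character set, so the Arabic bytes come out as Latin letters, symbols or replacement characters, which is consistent with the "parsed successfully but unreadable" result reported above.

```python
# Hypothetical CCSID -> Python codec mapping for the CCSIDs listed above.
CODECS = {1051: "hp_roman8", 1089: "iso8859_6", 1200: "utf-16-be", 1208: "utf-8"}


def read_as(data: bytes, ccsid: int) -> str:
    """Decode data as if it were written in the given CCSID, replacing invalid bytes."""
    return data.decode(CODECS[ccsid], errors="replace")


def is_arabic(text: str) -> bool:
    """True if any character falls in the Arabic block U+0600..U+06FF."""
    return any("\u0600" <= ch <= "\u06ff" for ch in text)


written = "\u0645\u0631\u062d\u0628\u0627".encode("utf-8")  # the writer used CCSID 1208
for ccsid in CODECS:
    text = read_as(written, ccsid)
    print(ccsid, ascii(text), is_arabic(text))
```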
Copyright © MQSeries.net. All rights reserved.