Author |
Message
|
mjain |
Posted: Tue May 01, 2012 10:17 pm Post subject: Code Page issue with UTF-8 |
|
|
 Novice
Joined: 01 May 2012 Posts: 15
|
Hi,
We have a message flow which receives data from the backend system 'Finacle' over TCP/IP. Earlier the data was sent in ASMO-708 encoding by Finacle, and now they claim to have changed it to UTF-8. This data contains Arabic characters. Below is the code which I used to cast the data earlier....
Code: |
SET Environment.Variables.InputMsg = ASBITSTREAM(InputBody.BLOB ENCODING 546 CCSID 1089 OPTIONS FolderBitStream);
SET Environment.Variables.InputMsg = CAST(Environment.Variables.InputMsg AS CHARACTER CCSID 1089 ENCODING 546);
SET Environment.Variables.InputMsg = CAST(Environment.Variables.InputMsg AS BLOB CCSID 1208 ENCODING 546); |
And now since the data is in UTF-8, I changed the code as below:
Code: |
SET Environment.Variables.InputMsg = ASBITSTREAM(InputBody.BLOB ENCODING 546 CCSID 1208 OPTIONS FolderBitStream);
SET Environment.Variables.InputMsg = CAST(Environment.Variables.InputMsg AS CHARACTER CCSID 1208 ENCODING 546);
SET Environment.Variables.InputMsg = CAST(Environment.Variables.InputMsg AS BLOB CCSID 1208 ENCODING 546); |
But I still get "Unconvertible Character". I am not sure if I am using the correct code page '546'. As far as I know, CCSID 1208 holds good for UTF-8. I will be really thankful for any help. I have already tried a lot of code page & CCSID combinations in vain. Our MB is running on HP-AIX. |
|
|
 |
Esa |
Posted: Tue May 01, 2012 10:42 pm Post subject: |
|
|
 Grand Master
Joined: 22 May 2008 Posts: 1387 Location: Finland
|
Hi mjain,
it seems your aim is to convert the input into UTF-8 message, not transformation or other processing?
If you get UTF-8 in and need to put UTF-8 out, why do you have to cast it at all? Unless you need to change the encoding of numeric fields to 546.
The first line in your code samples seems to have no effect. |
|
|
 |
mjain |
Posted: Tue May 01, 2012 11:40 pm Post subject: |
|
|
 Novice
Joined: 01 May 2012 Posts: 15
|
Hi Esa,
Thanks for your reply. You are right. Now that I am getting the data in UTF-8 itself, I do not need to convert the encoding. But I still need to cast it to a character stream for further processing, and I still get the same error while trying to cast InputBody.BLOB.
Code: |
SET Environment.Variables.InputMsg = InputBody.BLOB;
SET Environment.Variables.TCPIPRs.Msg = CAST(InputBody.BLOB AS CHARACTER CCSID 1208);
|
Line 2 still gives me the error. I don't know why it's not being cast even when the data I receive is in UTF-8. Also note that the data contains Arabic characters. |
|
|
 |
Esa |
Posted: Tue May 01, 2012 11:57 pm Post subject: |
|
|
 Grand Master
Joined: 22 May 2008 Posts: 1387 Location: Finland
|
Your current first line is creating an unnecessary copy of the input message.
If you get an unconvertible character error, the input message contains a character that cannot be converted from UTF-8 to UCS-2. I think one of the inserts of the exception should tell which character. |
|
|
 |
mjain |
Posted: Wed May 02, 2012 12:06 am Post subject: |
|
|
 Novice
Joined: 01 May 2012 Posts: 15
|
Thanks again Esa.
The first line is required to create the environment variable used by the logging subflow. Anyway, looking at the error trace, I think it's 'd8' which is reported as unconvertible. I would check with the backend team to confirm if they are sending valid UTF-8 data. Still, any suggestion from your end will be very helpful. |
|
|
 |
kimbert |
Posted: Wed May 02, 2012 1:35 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
I would check with the backend team to confirm if they are sending valid UTF-8 data. |
No need - you can check that yourself. Just look at the bytes of the BLOB and check whether they represent a valid UTF-8 stream. |
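One way to do that check offline is sketched below in Python, assuming the BLOB bytes have been copied out of a trace into a script. The helper name `first_invalid_utf8` and the sample payloads are hypothetical, not taken from the actual message.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, byte value) of the first byte that breaks strict
    UTF-8 decoding, or None if the whole stream is valid UTF-8."""
    try:
        data.decode("utf-8")  # strict decoding raises on ill-formed input
    except UnicodeDecodeError as err:
        return err.start, data[err.start]
    return None


# Hypothetical samples: the same Arabic letter (U+0638) written two ways.
good = "\u0638".encode("utf-8")      # two bytes: d8 b8
bad = "\u0638".encode("iso8859_6")   # one byte: d8 (ISO 8859-6 / ASMO-708)

print(first_invalid_utf8(b"AB" + good + b"CD"))
print(first_invalid_utf8(b"AB" + bad + b"CD"))
```

If the second call reports a lone d8 followed by an ASCII byte, the stream is probably still single-byte Arabic rather than UTF-8.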
|
|
 |
rekarm01 |
Posted: Wed May 02, 2012 1:36 am Post subject: Re: Code Page issue with UTF-8 |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
mjain wrote: |
Code: |
SET Environment.Variables.InputMsg = ASBITSTREAM(InputBody.BLOB ENCODING 546 CCSID 1208 OPTIONS FolderBitStream);
SET Environment.Variables.InputMsg = CAST(Environment.Variables.InputMsg AS CHARACTER CCSID 1208 ENCODING 546);
SET Environment.Variables.InputMsg = CAST(Environment.Variables.InputMsg AS BLOB CCSID 1208 ENCODING 546); |
|
The first line is using ASBITSTREAM to convert a BLOB to a BLOB. That's not really necessary, since a BLOB is already a BLOB. The ENCODING, CCSID, and FolderBitStream OPTIONS are unused. The second line is interpreting the BLOB as a UTF-8 byte sequence, and converting it to CHARACTER; the ENCODING is unused. And the third line is converting CHARACTER back to a UTF-8 byte sequence; again, the ENCODING is unused. It could be simplified:
Code: |
SET Environment.Variables.InputMsgBLOB = InputBody.BLOB;
SET Environment.Variables.InputMsgCHAR = CAST(InputBody.BLOB AS CHARACTER CCSID 1208); |
By saving the original BLOB separately, it's not necessary to convert it from BLOB to CHARACTER and back to BLOB again.
mjain wrote: |
But I still get "Unconvertible Character". |
Please post the complete error message. (If it contains an excessively long byte sequence, please snip the irrelevant bytes or add line-breaks before posting, to make it readable.) The most likely cause is an ill-formed input message; it's not really UTF-8.
mjain wrote: |
I am not sure if I am using the correct code page '546' ... Our MB is running on HP-AIX. |
What's HP-AIX? '546' is the wrong Encoding value for UNIX systems, but as it's unused, it's not the cause of any "unconvertible character" error.
mjain wrote: |
Anyway, looking at the error trace, I think it's 'd8' which is reported as unconvertible |
Examining the surrounding bytes would help. 'd8' could represent the ASMO-708 'ظ' (U+0638, "ARABIC LETTER ZAH"), the lead byte of a UTF-8 multi-byte character, part of a UTF-16 lead surrogate, or it could be something else. Without more information, it's hard to tell. |
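The three readings of 'd8' can be reproduced outside the broker. This is only a sketch in Python; it assumes Python's `iso8859_6` codec is a close enough stand-in for CCSID 1089, and the variable names are illustrative.

```python
byte = b"\xd8"

# ISO 8859-6 / ASMO-708: 0xD8 is a complete single-byte character.
as_asmo = byte.decode("iso8859_6")

# UTF-8: 0xD8 matches the bit pattern 110xxxxx, the lead byte of a two-byte
# sequence, so it is only valid when a continuation byte (0x80-0xBF) follows.
is_utf8_lead = (byte[0] & 0xE0) == 0xC0
zah_utf8 = "\u0638".encode("utf-8")

# UTF-16: 0xD8 as the high byte of a big-endian code unit falls in the
# lead-surrogate range 0xD800-0xDBFF.
unit = int.from_bytes(b"\xd8\x00", "big")
is_lead_surrogate = 0xD800 <= unit <= 0xDBFF

print(f"U+{ord(as_asmo):04X}", is_utf8_lead, zah_utf8.hex(), is_lead_surrogate)
```

Note that in UTF-8 the letter U+0638 itself starts with d8, so the byte after d8 is what distinguishes the cases.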
|
|
 |
kash3338 |
Posted: Wed May 02, 2012 7:49 am Post subject: |
|
|
Shaman
Joined: 08 Feb 2009 Posts: 709 Location: Chennai, India
|
Doesn't the Input tree have the CCSID info? You can get it from the Properties folder. You can try this:
Code: |
SET Environment.Variables.InputMsgBLOB = InputBody.BLOB;
SET Environment.Variables.InputMsgCHAR = CAST(InputBody.BLOB AS CHARACTER CCSID InputRoot.Properties.CCSID);
|
|
|
|
 |
rekarm01 |
Posted: Wed May 02, 2012 8:58 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
kash3338 wrote: |
Doesn't the Input tree have the CCSID info? |
Yes, it should, and it's usually a good practice to get the input ccsid for the input message from the input properties.
For the TCP input nodes, it's the input node itself, not the incoming message, that provides this CCSID info; one way or another, the CCSID is still hardcoded in the message flow. If it doesn't match the input data, then it makes little difference where it's hardcoded. |
|
|
 |
mjain |
Posted: Sun May 06, 2012 4:55 am Post subject: |
|
|
 Novice
Joined: 01 May 2012 Posts: 15
|
Thanks for your comment Rekarm,
I thought that if I needed to encode integers as 546, I would need to use ASBITSTREAM (the first line of my code).
You are right, it's indeed an Arabic character which is two bytes long, but the backend host, 'Finacle', was trying to fit it in one byte. I asked the backend to change the encoding to UTF-16, which fixed the problem. I am using CCSID 1200 to cast the response to a character stream.
Thanks for all the suggestions in the post....they helped a lot.
Also...just a note....when I tried using the CCSID from InputRoot.Properties, which was 1051, it parsed the message successfully but rendered all the Arabic characters unreadable. |
|
|
 |
kash3338 |
Posted: Sun May 06, 2012 8:17 am Post subject: |
|
|
Shaman
Joined: 08 Feb 2009 Posts: 709 Location: Chennai, India
|
mjain wrote: |
Also...just a note....when I tried using the CCSID from InputRoot.Properties, which was 1051, it parsed the message successfully but rendered all the Arabic characters unreadable. |
What is your Input node? If it's a TCP node, as mentioned by rekarm01, it should be set by the node. |
|
|
 |
rekarm01 |
Posted: Sun May 06, 2012 3:27 pm Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
mjain wrote: |
You are right, it's indeed an Arabic character which is two bytes long, but the backend host, 'Finacle', was trying to fit it in one byte. |
Whether it's one byte or two bytes depends on the ccsid that the backend host uses to convert it from character to byte.
mjain wrote: |
Also...just a note....when I tried using the CCSID from InputRoot.Properties, which was 1051, it parsed the message successfully but rendered all the Arabic characters unreadable. |
The message flow needs to use the same ccsid to read the message that the backend host used to write it. |
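Both points can be illustrated with a short Python sketch. The sample letter is arbitrary, and Python's `hp_roman8` and `iso8859_6` codecs are assumed to approximate CCSID 1051 and 1089.

```python
text = "\u0638"  # ARABIC LETTER ZAH, used here only as a sample

# The same character occupies a different number of bytes depending on the
# encoding the writer uses.
sizes = {name: len(text.encode(name)) for name in ("iso8859_6", "utf_8", "utf_16_be")}
print(sizes)

# Writer uses UTF-8, reader uses HP Roman-8 (CCSID 1051): the decode may
# raise no error at all, yet the resulting text is wrong.
written = text.encode("utf_8")
misread = written.decode("hp_roman8", errors="replace")
print(misread == text)

# Reader uses the same encoding as the writer: the round trip succeeds.
print(written.decode("utf_8") == text)
```

This matches the behaviour reported with 1051: a single-byte code page accepts almost any byte, so the cast "works" but the characters come out unreadable.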
|
|
 |
mjain |
Posted: Mon May 07, 2012 1:39 am Post subject: |
|
|
 Novice
Joined: 01 May 2012 Posts: 15
|
Quote: |
What is your Input node? If it's a TCP node, as mentioned by rekarm01, it should be set by the node |
Yes, it's a TCP node. I also tried using the CCSID set by the node, but that changes the Arabic characters to an unreadable format. The node sets it to 1051, but I am not sure why it does not work.
Quote: |
Whether it's one byte or two bytes depends on the ccsid that the backend host uses to convert it from character to byte. |
The backend is not aware of the CCSID; they just call it UTF-8. I believe it's 1200, as that seems to have solved the problem. |
|
|
 |
fjb_saper |
Posted: Mon May 07, 2012 7:54 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Seems to me as well that your back end is setting a CCSID on the content that is different from the CCSID information (or lack thereof) that they put on the message. THIS IS NOT "GOOD" PRACTICE.
The back end needs to set the CCSID on the message to the CCSID of the content they write into the message. Tell them to fix that...  _________________ MQ & Broker admin |
|
|
 |
rekarm01 |
Posted: Tue May 08, 2012 1:45 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
mjain wrote: |
Backend is not aware of the CCSID they just call it UTF-8 |
Whatever they call it, the message flow must use the same ccsid/charset/character encoding to read the data that the backend used to write it:
- ccsid=1051 (HP Roman-8)
- ccsid=1089 (ASMO 708 / ISO 8859-6)
- ccsid=1200 (UTF-16)
- ccsid=1208 (UTF-8)
Pick one that both sides can agree on. |
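When neither side is sure which of these is actually on the wire, a captured payload can be tried against each candidate offline. This is a rough Python sketch: the CCSID-to-codec mapping is an assumption (in particular, 1200 is treated as big-endian UTF-16 without a BOM), and `try_ccsids` and the sample bytes are hypothetical.

```python
# Assumed CCSID -> Python codec mapping, for offline checks only.
CODECS = {
    1051: "hp_roman8",
    1089: "iso8859_6",
    1200: "utf_16_be",  # assumes big-endian data without a byte order mark
    1208: "utf_8",
}


def try_ccsids(data: bytes) -> dict:
    """Map each candidate CCSID to the decoded text, or to None when strict
    decoding with the assumed codec fails."""
    results = {}
    for ccsid, codec in CODECS.items():
        try:
            results[ccsid] = data.decode(codec)
        except UnicodeDecodeError:
            results[ccsid] = None
    return results


sample = bytes.fromhex("d8b8")  # hypothetical payload: U+0638 in UTF-8
for ccsid, decoded in try_ccsids(sample).items():
    print(ccsid, repr(decoded))
```

A clean decode is necessary but not sufficient: single-byte code pages accept most byte values, so the decoded text still has to be checked by eye against what the backend meant to send.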
|
|
 |
|