
MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » MultiByte message parsing error

MultiByte message parsing error
satya2481
Posted: Tue Jan 20, 2009 9:38 am    Post subject: MultiByte message parsing error

Disciple

Joined: 26 Apr 2007
Posts: 170
Location: Bengaluru

Hi All,

I am working with WMB Toolkit V6.1. The source system sends a message as a flat file, and it needs to be converted to an XML message.
I have created the message definition from the COBOL copybook, and this is the incoming message format.

When the source system sends a message containing only plain English characters, the broker is able to parse it against the message definition. But when the message contains special characters such as ä or Ö, the broker fails to parse it.

The source system sometimes sends multibyte messages.

I have tried setting CCSID 1208 and 819 in the MQInput node, but neither works. The message length is treated as greater than the one specified in the COBOL copybook.

An example value for such characters is c384, which is equal to Ä...

For more characters of this type check the link

http://www.fileformat.info/info/charset/UTF-8/encode.htm

I would like to know how to handle such characters...

Thank You
Satya
fjb_saper
Posted: Tue Jan 20, 2009 3:17 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

There is only one way to handle such a problem:
First translate your message to CCSID 500 or 37 or whatever EBCDIC CCSID you normally use with the COBOL copy books.

A COBOL copybook drives a positional parser. So if a character takes 2 bytes instead of 1 (different CCSID), your parsing is off by 1 byte, and so on.
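
To picture the off-by-one effect outside the broker, here is a minimal Python sketch. The two-field layout (NAME PIC X(5), CITY PIC X(6)) and the sample data are hypothetical, and Python's cp500 codec is assumed to stand in for CCSID 500. It only mimics byte-positional slicing; it is not how the broker itself parses.

```python
# Hypothetical copybook layout: NAME PIC X(5), CITY PIC X(6)
FIELD_WIDTHS = [5, 6]

def split_fixed(record: bytes, widths):
    """Cut a record into fields purely by byte position."""
    fields, pos = [], 0
    for width in widths:
        fields.append(record[pos:pos + width])
        pos += width
    return fields

text = "Jäger" + "Berlin"
utf8 = text.encode("utf-8")     # 'ä' takes 2 bytes here
ebcdic = text.encode("cp500")   # 'ä' takes 1 byte here

utf8_fields = split_fixed(utf8, FIELD_WIDTHS)
ebcdic_fields = split_fixed(ebcdic, FIELD_WIDTHS)

print(len(utf8), len(ebcdic))                      # 12 11
print([f.decode("cp500") for f in ebcdic_fields])  # ['Jäger', 'Berlin']
print([f.decode("utf-8") for f in utf8_fields])    # ['Jäge', 'rBerli']
```

With the single-byte encoding every field lands where the layout expects it; with UTF-8 each field after the 'ä' is shifted by one byte.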

We have exactly the same problem where we retrieve a message from a DB.
So before parsing we translate it to CCSID 500. Works like a charm.

Enjoy
_________________
MQ & Broker admin
rekarm01
Posted: Wed Jan 21, 2009 1:26 am    Post subject: Re: MultiByte message parsing error

Grand Master

Joined: 25 Jun 2008
Posts: 1415

satya2481 wrote:
I am working with WMB Toolkit V6.1. The source system sends a message as a flat file, and it needs to be converted to an XML message. ...

When the source system sends a message containing only plain English characters, the broker is able to parse it against the message definition. But when the message contains special characters such as ä or Ö, the broker fails to parse it.

The source system sometimes sends multibyte messages. ...

An example value for such characters is c384, which is equal to Ä ...

It looks like the source system is creating the message using UTF-8 (CCSID=1208). The first thing you should check is that the input header's CCSID matches the input message, before it gets to the message flow.

satya2481 wrote:
I have created the message definition from the Cobol copy text and this is the incoming message format. ...

I have tried setting CCSID 1208 and 819 in the MQInput node, but neither works. The message length is treated as greater than the one specified in the COBOL copybook.

As fjb_saper suggested, you can convert the source message to a single-byte character set (such as "ASCII" 819 or EBCDIC 37 or 500), as long as every potential character in the source message also exists in the target character set.

Don't forget to check the "Convert" box in the MQInput node.

But, if the source message contains characters that don't exist in the target character set, you can't convert those characters correctly.
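
As a rough illustration of that limitation, the Python sketch below re-encodes UTF-8 data into a single-byte code page and checks whether a string fits the target. The helper names `transcode` and `fits` are made up for this example, and the codecs latin-1 and cp500 are assumed to correspond to CCSID 819 and 500.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes from the source code page and re-encode them in the target."""
    return data.decode(source).encode(target)

def fits(text: str, target: str) -> bool:
    """Return True if every character of text exists in the target code page."""
    try:
        text.encode(target)
        return True
    except UnicodeEncodeError:
        return False

a_umlaut_utf8 = b"\xc3\xa4"  # 'ä' in UTF-8

print(transcode(a_umlaut_utf8, "utf-8", "latin-1").hex())  # e4
print(transcode(a_umlaut_utf8, "utf-8", "cp500").hex())    # 43
print(fits("Jäger", "latin-1"))  # True
print(fits("€100", "latin-1"))   # False: the euro sign is not in ISO 8859-1
```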

Another approach is to modify the CWF properties for each string element in the message set, changing "Length Units" from "Bytes" to "Characters". Then the parser will count characters properly, regardless of how many bytes they take up.
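
The difference between the two length units can be sketched in Python by decoding first and then counting characters. The field widths and sample data are hypothetical, and the sketch also shows that character counting only helps when the declared code page is the right one.

```python
def split_by_characters(record: bytes, codec: str, widths):
    """Decode the whole record, then cut fields by character count."""
    text = record.decode(codec)
    fields, pos = [], 0
    for width in widths:
        fields.append(text[pos:pos + width])
        pos += width
    return fields

# Hypothetical layout: a 5-character field followed by a 6-character field
record = "JägerBerlin".encode("utf-8")

right = split_by_characters(record, "utf-8", [5, 6])
wrong = split_by_characters(record, "latin-1", [5, 6])

print(right)  # ['Jäger', 'Berlin']
print(wrong)  # ['JÃ¤ge', 'rBerli'] because the bytes were read as CCSID 819
```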
satya2481
Posted: Thu Jan 22, 2009 9:13 pm

Disciple

Joined: 26 Apr 2007
Posts: 170
Location: Bengaluru

Hi,
Thank you for the reply.

Quote:
First translate your message to CCSID 500 or 37 or whatever EBCDIC CCSID you normally use with the COBOL copy books.


I tried translating the incoming message to CCSID 500, but it didn't work; I am still getting some junk characters.

rekarm01 wrote:
The first thing you should check is that the input header's CCSID matches the input message, before it gets to the message flow.


I have checked the CCSID in the MQMD header; it arrives as 819 before it gets into the message flow. I have tried setting it to 1208 manually using RFHUtil and then sending the message to the flow, but even then the parser is unable to understand the characters and fails with "Not enough data in bitstream".

Quote:
Don't forget to check the "Convert" box in the MQInput node.


Yes, I had checked this option while trying to convert the incoming message to a different character set.

Quote:
Another approach is to modify the CWF properties for each string element in the message set, changing "Length Units" from "Bytes" to "Characters". Then the parser will count characters properly, regardless of how many bytes they take up.


I have checked this, and in my message set the length units are "Characters", not "Bytes".

I would like to know whether the broker parser is able to understand double-byte characters. Some of the data sent from the source application contains double-byte characters. For example, the hexadecimal value "C3A4" is coming for "ä". But the hexadecimal equivalent for this character in single byte is "e4".
In the message flow I am performing the steps below.
>Read the message from the queue - BLOB parser
>Parse the message against the message set (TDS) created from the COBOL copybook, using an RCD node
>Mapping -- from TDS to XMLNSC
>Send the message to MQOutput

When the message is read from the queue and parsed against the message set, the parser does not recognize the double-byte characters, so the lengths vary and the parse fails.

I thought of converting these double-byte characters to single-byte inside the broker code. But the source application can send multibyte characters (up to 4 bytes, as I was informed), so it is very difficult to map all these characters.

Has anyone faced such issues, and how can this be solved?

Thank You
Satya
kimbert
Posted: Fri Jan 23, 2009 1:54 am

Jedi Council

Joined: 29 Jul 2003
Posts: 5542
Location: Southampton

Quote:
I would like to know whether Broker parser is able to understand double byte characters
Yes it can. And multibyte characters too. But the broker needs to know the correct code page to use when parsing. In almost all cases, the final header tells the broker the CCSID of the message body.

Quote:
But source application side they can send multi byte characters (up to 4 bytes as informed to me)
You should not need to set the code page on the MQInput node. Go to the source application team and tell them to set MQMD.CodedCharSetId correctly (assuming that MQMD is the only header). From their description, it sounds like they are sending UTF-8 (code page 1208).
Once the code page is correct, everything should 'just work'.
fjb_saper
Posted: Fri Jan 23, 2009 4:09 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

Quote:
Another approach is to modify the CWF properties for each string element in the message set, changing "Length Units" from "Bytes" to "Characters". Then the parser will count characters properly, regardless of how many bytes they take up.

Can we please get this as a check box on the import wizards (and in particular the Cobol import wizard) so as not to have to do it for each string element?

Thanks
_________________
MQ & Broker admin
kimbert
Posted: Fri Jan 23, 2009 5:41 am

Jedi Council

Joined: 29 Jul 2003
Posts: 5542
Location: Southampton

Quote:
Can we please get this as a check box on the import wizards
Sounds useful. You should raise a requirement with IBM through the normal channels. I know, it's a long, arduous process...

satya2481: This will not necessarily help you. If the input message is telling lies about its code page, nothing will help until that is fixed.
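
One rough way to spot a header that is "telling lies" is to look at the raw body bytes. The Python sketch below is only a heuristic under the assumption that the body is plain text; the function name `looks_like_utf8` is invented for this example, and a passing check is evidence, not proof.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: non-ASCII bytes are present and the whole body is valid UTF-8."""
    if all(byte < 0x80 for byte in data):
        # Pure ASCII is identical in 819 and 1208, so it proves nothing.
        return False
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(looks_like_utf8(b"J\xc3\xa4ger"))  # True: header should probably say 1208
print(looks_like_utf8(b"J\xe4ger"))      # False: consistent with 819
print(looks_like_utf8(b"Jager"))         # False: ASCII only, no evidence either way
```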
rekarm01
Posted: Fri Jan 23, 2009 2:54 pm

Grand Master

Joined: 25 Jun 2008
Posts: 1415

satya2481 wrote:
For example the hexadecimal value "c3a4" is coming for "ä". But the hexadecimal equivalent for this character in single byte is "e4".

It's the message header's CCSID that defines the "hexadecimal equivalent" for any character. For example,
  • when CCSID=500, the hexadecimal equivalent of 'ä' is X'43'
  • when CCSID=819, the hexadecimal equivalent of 'ä' is X'e4'
  • when CCSID=1208, the hexadecimal equivalent of 'ä' is X'c3a4'
  • when CCSID=819, X'c3a4' is not 'ä' at all; it is the two characters 'Ã¤'

As kimbert pointed out, if the source application is lying about the source CCSID, there's very little the broker can do about it.
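
The values in the list above can be reproduced with a few lines of Python. The mapping from CCSID to Python codec name is an assumption made for this sketch (cp500 for 500, latin-1 for 819, utf-8 for 1208).

```python
# Assumed CCSID -> Python codec mapping, for illustration only
CODECS = {500: "cp500", 819: "latin-1", 1208: "utf-8"}

def hex_for(char: str, ccsid: int) -> str:
    """Return the hex bytes that represent char under the given CCSID."""
    return char.encode(CODECS[ccsid]).hex()

for ccsid in CODECS:
    print(ccsid, hex_for("ä", ccsid))

# The same UTF-8 bytes read under a header that claims CCSID 819:
misread = bytes.fromhex("c3a4").decode(CODECS[819])
print(misread, len(misread))  # Ã¤ 2
```

One character becomes two when the header claims 819, which is consistent with the message appearing longer than the copybook allows.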
satya2481
Posted: Mon Jan 26, 2009 9:32 am

Disciple

Joined: 26 Apr 2007
Posts: 170
Location: Bengaluru

Hi All,
Thank you very much for all of your replies...

Well, the problem is resolved.

The message from the source application was not sent to the broker directly; it was sent via a customer-specific application. Initially, when the message was sent from the source system, it was a multibyte message with CCSID 819.

We checked on this and set up a new customer application which sets 1208 in MQMD.CCSID, and this resolved my issue...

Now the WMB parser is able to parse the multibyte characters. When I save this message to a file from RFHUtil, it is saved as UTF-8. I can see this code page when I open the saved file in EditPlus.

But I don't know why the file sometimes opens as ANSI (not the same message, but another message with the same CCSID, i.e. 1208).

Thank You
Satya
_________________
IBM Certified Solution Developer WebSphere Message Broker V6.0
IBM Certified System Administrator WebSphere MQ V6.0