MQSeries.net :: View topic - Japanese characters problem with fixed length message set

Japanese characters problem with fixed length message set

er_pankajgupta84

Posted: Tue Feb 09, 2010 7:26 am Post subject: Japanese characters problem with fixed length message set

Master

Joined: 14 Nov 2008
Posts: 203
Location: charlotte,NC, USA

I have a fixed-length message set that does its job perfectly when it receives ASCII data.

The problem comes when we send Japanese data to this message set. Many Japanese characters take 2-4 bytes, and I think that is causing the issue.

My QM CCSID is 1208, and the source (mainframe) specifies the CCSID on the message as well. When the mainframe converts the data to 819 and sends it to the Broker, the message set is able to parse the message but produces junk characters for some Japanese characters. That is quite possible, as 819 cannot fully represent the Japanese character set.

If the mainframe sends the data after converting it to 1208, the message fails at parsing due to the length constraint. The CCSID on the mainframe compiler is 37.

Can someone throw some light on it?

Vitor

Posted: Tue Feb 09, 2010 7:35 am Post subject: Re: Japanese characters problem with fixed length message se

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

er_pankajgupta84 wrote:

Can someone throw some light on it?

You said it yourself - Japanese characters can't be represented in code page 819, or any single-byte code page, because they're 2-4 bytes in length.

You need Unicode.
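To make the point concrete, here is a minimal Python sketch. It assumes Python's `latin-1`, `cp037` and `utf-8` codecs are reasonable stand-ins for CCSIDs 819, 37 and 1208; the sample value is made up and nothing here is Broker code.

```python
# Assumption: Python codec names used as stand-ins for the CCSIDs in this thread:
# "latin-1" ~ CCSID 819, "cp037" ~ CCSID 37, "utf-8" ~ CCSID 1208.
sample = "東京"  # hypothetical sample value: two Japanese characters


def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


utf8_bytes = sample.encode("utf-8")
print(len(sample), "characters,", len(utf8_bytes), "bytes in UTF-8")
print("fits in latin-1:", can_encode(sample, "latin-1"))
print("fits in cp037:", can_encode(sample, "cp037"))
```

Both single-byte code pages reject the text outright, while UTF-8 carries it at three bytes per character for this sample.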
_________________
Honesty is the best policy.
Insanity is the best defence.

er_pankajgupta84

Posted: Tue Feb 09, 2010 8:01 am Post subject:

Master

Joined: 14 Nov 2008
Posts: 203
Location: charlotte,NC, USA

That's what the problem is. If I use any other code set, the number of bytes increases and the message fails because of the length constraint.

It's a fixed-length message set. Is there any setting in the message set, or any other way, that lets me accommodate these variable-length fields in a fixed-length format?

I am sure there has to be a way, as otherwise we could not use a fixed-length message set for Japanese and other Asian languages.

Vitor

Posted: Tue Feb 09, 2010 8:18 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

er_pankajgupta84 wrote:

It's a fixed-length message set. Is there any setting in the message set, or any other way, that lets me accommodate these variable-length fields in a fixed-length format?

Use Unicode.

mqjeff

Posted: Tue Feb 09, 2010 8:21 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

Are you counting lengths in bytes, or characters? What does the message definition and message set think you are counting them in?

er_pankajgupta84

Posted: Tue Feb 09, 2010 9:13 am Post subject:

Master

Joined: 14 Nov 2008
Posts: 203
Location: charlotte,NC, USA

I don't see any difference when I specify the length in characters or in bytes.
My feeling is that if you specify characters, it should not matter how many bytes you send for one character; the message set should be able to parse it.

But that's not happening.

mqjeff

Posted: Tue Feb 09, 2010 9:26 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

If you specify that your lengths are in characters, then sending multi-byte Unicode characters should not cause a fixed-length model to fail.
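The difference between the two ways of counting can be sketched in a few lines of Python. This is only an illustration with a made-up three-field record (widths 5, 10 and 2) and a hypothetical helper `split_fixed`; it is not how the Broker parser is implemented.

```python
def split_fixed(record, widths):
    """Cut a str (or bytes) record into consecutive fields of the given widths."""
    fields, pos = [], 0
    for width in widths:
        fields.append(record[pos:pos + width])
        pos += width
    return fields


WIDTHS = [5, 10, 2]  # hypothetical layout: id, name, status

# The sender pads the name to 10 *characters*, then encodes as UTF-8.
wire = ("00001" + "東京".ljust(10) + "OK").encode("utf-8")

by_chars = split_fixed(wire.decode("utf-8"), WIDTHS)  # widths counted in characters
by_bytes = split_fixed(wire, WIDTHS)                  # same widths counted in bytes

print(by_chars)  # the status field comes out as "OK"
print(by_bytes)  # the status field is cut from the padding instead
```

Counting in characters after decoding keeps the fields aligned; counting the same widths in bytes drifts as soon as a multi-byte character appears.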

There are some bugs in this area that I've seen. Your best bet is to ensure that you are at the *most* recent FP of your version of Broker (6.1.0.5 or 7.0.0.0 or 6.0.0.10), and then take a user trace to confirm that the issue is not with your model or your code somehow.

Then you could open a PMR.

Vitor

Posted: Tue Feb 09, 2010 9:27 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

er_pankajgupta84 wrote:

I don't see any difference when I specify the length in characters or in bytes.

You should. And this might explain why you keep ignoring my advice.

er_pankajgupta84 wrote:

My feeling is that if you specify characters, it should not matter how many bytes you send for one character; the message set should be able to parse it.

No, it shouldn't. No matter how you feel on the subject.

er_pankajgupta84 wrote:

But that's not happening.

No, it won't.

I repeat the advice I've given you twice before.

mqjeff

Posted: Tue Feb 09, 2010 9:30 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

Right, sorry, that's the other thing.

Send this data as Unicode. Don't Don't DON'T try to mix US-ASCII and Japanese characters. Don't try to convert Japanese characters into EBCDIC. Don't try to convert them to code page 819.

Make it all Unicode, and THEN confirm your FP levels and take a user trace.

er_pankajgupta84

Posted: Tue Feb 09, 2010 11:44 am Post subject:

Master

Joined: 14 Nov 2008
Posts: 203
Location: charlotte,NC, USA

The data is coming from the mainframe. The CCSID on the mainframe is EBCDIC - 37, so whatever comes from the mainframe would be in EBCDIC. There is a mainframe program that can convert this data from any code page to any other.
That program will send us the Unicode data. But the problem is that if we use Unicode, the length increases and the message fails length validation in the parser.

Vitor

Posted: Tue Feb 09, 2010 11:53 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

er_pankajgupta84 wrote:

That program will send us the Unicode data. But the problem is that if we use Unicode, the length increases and the message fails length validation in the parser.

Then we are faced with these possibilities:

- There's a problem with the message set. Unlikely as it works with single byte character sets.
- The conversion the mainframe is doing into Unicode is incorrect, or it's failing to correctly identify the message as being in Unicode (by correctly setting the code page)
- There's a bug in the message set parser and you should raise a PMR.

You said initially that the mainframe was converting into 819 rather than Unicode. Have you since tried it with Unicode and got the same result?

kimbert

Posted: Tue Feb 09, 2010 12:56 pm Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

I think this is what is happening ( or something like it ).
- Your message set declares a string field which is (say) 20 bytes long
- Your message set also has a length constraint ( actually an xs:length facet) which checks that the field is 20 characters long.
- If you use a single-byte encoding like 819, then one byte=one character. So you always get exactly 20 characters. So the length constraint is satisfied.
- If you use code page 1208 (UTF-8) then any non-ASCII characters will take up 2 or 3 bytes. So the number of characters in the parsed string will be <20. So the length constraint will be broken and you will get a validation error.
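The arithmetic in the list above can be checked with a small Python sketch. The 20-byte width comes from the example; the helper `pad_to_bytes`, the space padding and the sample values are assumptions made for illustration.

```python
FIELD_BYTES = 20  # field width in bytes, as in the example above


def pad_to_bytes(text: str, width: int, codec: str = "utf-8") -> bytes:
    """Encode text and right-pad it with spaces to exactly `width` bytes."""
    raw = text.encode(codec)
    if len(raw) > width:
        raise ValueError("encoded value does not fit in the field")
    return raw + b" " * (width - len(raw))


ascii_field = pad_to_bytes("TOKYO", FIELD_BYTES)
kanji_field = pad_to_bytes("東京", FIELD_BYTES)

for raw in (ascii_field, kanji_field):
    value = raw.decode("utf-8")
    print(len(raw), "bytes ->", len(value), "characters")
```

Both fields are exactly 20 bytes on the wire, but only the ASCII one decodes to 20 characters, so a check for exactly 20 characters would reject the Japanese value.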

Quote:

EBCDIC ( code page 37 ) cannot represent any Japanese characters. Did you forget to tell us something? Or is the code page conversion utility getting its shoelaces tied together?

Regardless of the answer to that question, you cannot have a variable-width code page AND a fixed-width field AND a length constraint on the field.

er_pankajgupta84

Posted: Tue Feb 09, 2010 7:21 pm Post subject:

Master

Joined: 14 Nov 2008
Posts: 203
Location: charlotte,NC, USA

Quote:

If you use a single-byte encoding like 819, then one byte=one character. So you always get exactly 20 characters. So the length constraint is satisfied.
- If you use code page 1208 (UTF-8) then any non-ASCII characters will take up 2 or 3 bytes. So the number of characters in the parsed string will be <20.

I completely agree with this statement. But even when I use code page 1208, the number of bytes is still 20, even though there are fewer than 20 characters. And since I specified the "Length Unit" of the field as "Bytes", the Broker should count bytes when validating the length.

Quote:

Your message set also has a length constraint ( actually an xs:length facet) which checks that the field is 20 characters long.

What does this mean? I only specified the length on the field itself. It's a CSV message set. Along with that, I do have a MAX length constraint which defines it to be 20, but there is no MIN length constraint.

Is there anything else I need to check in the message set? It is possible that even though the Length Unit of the field is set to "Bytes", some other setting is forcing it to validate 20 characters instead of 20 bytes.

er_pankajgupta84

Posted: Tue Feb 09, 2010 7:34 pm Post subject:

Master

Joined: 14 Nov 2008
Posts: 203
Location: charlotte,NC, USA

I think I missed this detail.

My message set definition consists of RECORDS, which in turn contain fields. Each RECORD is fixed length, and I have specified the length of each field on the field itself.

I have used "Use Pattern" as the Data Element Separator for each RECORD. As each RECORD is fixed length, the pattern gives the number of characters to be read.

For example, the first record would be [0-9]{5}00010.{220}

I think this is forcing the Broker to fail length validation even if I specify "Bytes" as the Length Unit on the fields.
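The effect of a character-counting pattern can be reproduced with Python's `re` module. This is only an analogy: the Broker's pattern matching is a different engine, and the record contents below are made up. The pattern itself is the one quoted above.

```python
import re

PATTERN = r"[0-9]{5}00010.{220}"  # pattern quoted in the post above

# Hypothetical record: 5 digits, the literal 00010, then a payload padded
# to exactly 220 *bytes* of UTF-8 that contains two Japanese characters.
payload = "東京".encode("utf-8").ljust(220, b" ")
record = b"12345" + b"00010" + payload

# Matching on decoded text: "." consumes one character at a time.
char_match = re.fullmatch(PATTERN, record.decode("utf-8"), re.DOTALL)

# Matching on raw bytes: "." consumes one byte at a time.
byte_match = re.fullmatch(PATTERN.encode("ascii"), record, re.DOTALL)

print("characters available:", len(payload.decode("utf-8")))
print("character-based match:", char_match is not None)
print("byte-based match:", byte_match is not None)
```

The payload is 220 bytes but only 216 characters, so `.{220}` counted in characters comes up short on a record that is the right size in bytes.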

Now, the question is:

I cannot change the Data Element Separator of my records, as there is no other option available. How can I make it work? How can I specify bytes instead of characters in patterns?

I will continue my research and keep you posted with any updates. Any pointers would be helpful.

By the way, thanks Kimbert. Your pointers gave me direction, and we finally know why the Broker is behaving this way.

fjb_saper

Posted: Tue Feb 09, 2010 9:04 pm Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20767
Location: LI,NY

er_pankajgupta84 wrote:

Your length constraint is defined wrong. You defined the length constraint in bytes but are dealing with a multibyte CCSID. You need to define the constraint in characters and not in bytes. It will then not matter how many bytes the character needs to be represented...
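One consequence worth seeing in numbers: if the field is fixed at 20 characters, its size in bytes is no longer fixed. A minimal Python sketch, assuming space padding and a hypothetical helper `pad_to_chars` (neither comes from the thread):

```python
FIELD_CHARS = 20  # field width counted in characters


def pad_to_chars(text: str, width: int, fill: str = " ") -> str:
    """Right-pad text to exactly `width` characters."""
    if len(text) > width:
        raise ValueError("value has more characters than the field allows")
    return text.ljust(width, fill)


for value in ("TOKYO", "東京"):
    field = pad_to_chars(value, FIELD_CHARS)
    wire = field.encode("utf-8")
    print(repr(value), "->", len(field), "characters,", len(wire), "bytes")
```

Both values fill the field to 20 characters, but the Japanese one occupies 24 bytes on the wire, so anything downstream that assumes fixed byte offsets would need to count characters as well.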

Have fun

_________________
MQ & Broker admin
