er_pankajgupta84
Posted: Tue Feb 09, 2010 7:26 am Post subject: Japanese characters problem with fixed length message set
 Master
Joined: 14 Nov 2008 Posts: 203 Location: charlotte,NC, USA

I have a fixed-length message set that does its job perfectly when it receives ASCII data.
The problem comes when we send Japanese data to this message set. Many Japanese characters take 2-4 bytes, and I think that is causing the issue.
My queue manager (QM) CCSID is 1208, and the source (the mainframe) also specifies the CCSID on the message. When the mainframe converts the data to 819 and sends it to Broker, the message set is able to parse the message but produces junk characters for some Japanese characters. That is to be expected, because code page 819 cannot represent Japanese characters.
If the mainframe sends the data after converting it to 1208, the message fails during parsing because of a length constraint. The CCSID of the mainframe compiler is 37.
Can someone throw some light on it?
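To make the byte-count problem concrete, here is a small Python sketch (not Broker code) that measures the same text in the three code pages mentioned above. It assumes Python's `latin-1`, `utf-8` and `cp037` codecs as stand-ins for CCSIDs 819, 1208 and 37; the `encoded_length` helper and the `CODECS` mapping are hypothetical names used only for illustration.

```python
# Assumed stand-in Python codecs for the CCSIDs in this thread (not an MQ or Broker API):
# 819 = ISO-8859-1, 1208 = UTF-8, 37 = EBCDIC (US).
CODECS = {819: "latin-1", 1208: "utf-8", 37: "cp037"}


def encoded_length(text: str, ccsid: int):
    """Return how many bytes text needs in the given CCSID, or None if it cannot be encoded."""
    try:
        return len(text.encode(CODECS[ccsid]))
    except UnicodeEncodeError:
        # Strict encoding fails here; a converter that substitutes characters instead
        # of failing would produce the kind of "junk" described in the post.
        return None


for label, sample in (("ASCII", "ABCDE"), ("Japanese", "日本語")):
    for ccsid in (819, 1208, 37):
        print(label, ccsid, encoded_length(sample, ccsid))
```

ASCII text needs the same 5 bytes in all three code pages, while each of these three Japanese characters needs 3 bytes in UTF-8 (9 in total) and cannot be encoded at all in 819 or 37, so the helper returns `None` for those.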
Vitor
Posted: Tue Feb 09, 2010 7:35 am Post subject: Re: Japanese characters problem with fixed length message se
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA

er_pankajgupta84 wrote:
"Can someone throw some light on it?"
You said it yourself: Japanese characters can't be represented in code page 819, or in any single-byte code page, because they need more than one byte each (2-4 bytes).
You need Unicode.
_________________
Honesty is the best policy.
Insanity is the best defence.
er_pankajgupta84
Posted: Tue Feb 09, 2010 8:01 am
 Master
Joined: 14 Nov 2008 Posts: 203 Location: charlotte,NC, USA

That's exactly the problem: if I use any other code set, the number of bytes increases and the message fails because of the length constraint.
It's a fixed-length message set. Is there any setting in the message set, or any other way, that lets me accommodate these variable-length fields in a fixed-length format?
I am sure there has to be a way; otherwise we could not use fixed-length message sets for Japanese and other Asian languages.
Vitor
Posted: Tue Feb 09, 2010 8:18 am
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA

er_pankajgupta84 wrote:
"It's a fixed-length message set. Is there any setting in the message set, or any other way, that lets me accommodate these variable-length fields in a fixed-length format?"
Use Unicode.
mqjeff
Posted: Tue Feb 09, 2010 8:21 am
Grand Master
Joined: 25 Jun 2008 Posts: 17447

Are you counting lengths in bytes or in characters? And which unit do the message definition and the message set think you are counting in?
er_pankajgupta84
Posted: Tue Feb 09, 2010 9:13 am
 Master
Joined: 14 Nov 2008 Posts: 203 Location: charlotte,NC, USA

I don't see any difference whether I specify the length in characters or in bytes.
I feel that if you specify characters, it should not matter how many bytes are sent for one character; the message set should be able to parse it.
But that's not happening.
mqjeff
Posted: Tue Feb 09, 2010 9:26 am
Grand Master
Joined: 25 Jun 2008 Posts: 17447

If you specify that your lengths are in characters, then sending multi-byte Unicode characters should not cause a fixed-length model to fail.
I've seen some bugs in this area. Your best bet is to make sure you are at the *most* recent fix pack (FP) of your version of Broker (6.1.0.5, 7.0.0.0 or 6.0.0.10), and then take a user trace to confirm that the issue is not somehow in your model or your code.
Then you could open a PMR.
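One way to see why the unit matters is to cut the same fixed-length record into fields twice: once by character widths (after decoding) and once by byte widths (before decoding). This Python sketch uses a hypothetical two-field record of 5 + 3 characters and hypothetical helper names; it illustrates the idea and is not the Broker TDS parser.

```python
def split_by_characters(record: bytes, widths, encoding="utf-8"):
    """Decode the whole record first, then cut it into fields of the given character widths."""
    text = record.decode(encoding)
    fields, pos = [], 0
    for width in widths:
        fields.append(text[pos:pos + width])
        pos += width
    return fields


def split_by_bytes(record: bytes, widths, encoding="utf-8"):
    """Cut the raw record into fields of the given byte widths, then decode each field.

    errors="replace" turns any multi-byte character that was cut in half into U+FFFD.
    """
    fields, pos = [], 0
    for width in widths:
        fields.append(record[pos:pos + width].decode(encoding, errors="replace"))
        pos += width
    return fields


# Hypothetical record: a 5-character city field ("東京" plus 3 spaces) and a 3-character code.
record = "東京   ABC".encode("utf-8")  # 8 characters, 12 bytes

by_characters = split_by_characters(record, [5, 3])  # ['東京   ', 'ABC']
by_bytes = split_by_bytes(record, [5, 3])            # both fields contain U+FFFD
```

Counting in characters recovers both fields intact; counting the same widths in bytes splits `京` across the field boundary, so both fields end up with replacement characters.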
Vitor
Posted: Tue Feb 09, 2010 9:27 am
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA

er_pankajgupta84 wrote:
"I don't see any difference whether I specify the length in characters or in bytes."
You should. And this might explain why you keep ignoring my advice.
er_pankajgupta84 wrote:
"I feel that if you specify characters, it should not matter how many bytes are sent for one character; the message set should be able to parse it."
No, it shouldn't, no matter how you feel on the subject.
er_pankajgupta84 wrote:
"But that's not happening."
No, it won't.
I repeat the advice I've given you twice before.
mqjeff
Posted: Tue Feb 09, 2010 9:30 am
Grand Master
Joined: 25 Jun 2008 Posts: 17447

Right, sorry, that's the other thing.
Send this data as Unicode. Don't, don't, DON'T try to mix US-ASCII and Japanese characters. Don't try to convert Japanese characters into EBCDIC. Don't try to convert them to code page 819.
Make it all Unicode, and THEN confirm your FP levels and take a user trace.
er_pankajgupta84
Posted: Tue Feb 09, 2010 11:44 am
 Master
Joined: 14 Nov 2008 Posts: 203 Location: charlotte,NC, USA

The data is coming from the mainframe. The CCSID on the mainframe is EBCDIC 37, so whatever comes from the mainframe is in EBCDIC. There is a mainframe program that can convert this data from any code page to any other.
That program will send us Unicode data. But the problem is that when we use Unicode, the length increases and the message fails the parser's length validation.
Vitor
Posted: Tue Feb 09, 2010 11:53 am
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA

er_pankajgupta84 wrote:
"That program will send us Unicode data. But the problem is that when we use Unicode, the length increases and the message fails the parser's length validation."
Then we are faced with these possibilities:
- There's a problem with the message set. Unlikely, as it works with single-byte character sets.
- The mainframe's conversion into Unicode is incorrect, or it fails to identify the message correctly as Unicode (by setting the code page correctly).
- There's a bug in the message set parser and you should raise a PMR.
You said initially that the mainframe was converting into 819 rather than Unicode. Have you since tried it with Unicode and got the same result?
kimbert
Posted: Tue Feb 09, 2010 12:56 pm
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton

I think this is what is happening (or something like it):
- Your message set declares a string field which is (say) 20 bytes long.
- Your message set also has a length constraint (actually an xs:length facet) which checks that the field is 20 characters long.
- If you use a single-byte encoding like 819, then one byte = one character, so you always get exactly 20 characters and the length constraint is satisfied.
- If you use code page 1208 (UTF-8), then any non-ASCII characters will take up 2 or 3 bytes. So the number of characters in the parsed string will be less than 20, the length constraint will be broken, and you will get a validation error.
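kimbert's scenario can be reproduced in a short Python sketch. The 20-byte field width and the 20-character facet are the example values from the post; the `satisfies_length_facet` helper is hypothetical and only mimics an xs:length check counted in characters, not the Broker validator itself.

```python
FIELD_BYTES = 20   # physical width of the field in the record
FACET_LENGTH = 20  # xs:length facet on the same field, counted in characters


def satisfies_length_facet(raw_field: bytes, encoding: str) -> bool:
    """Decode a fixed-width field and check it against an xs:length facet in characters."""
    if len(raw_field) != FIELD_BYTES:
        raise ValueError(f"expected {FIELD_BYTES} bytes, got {len(raw_field)}")
    return len(raw_field.decode(encoding)) == FACET_LENGTH


# 20 bytes in code page 819: one byte per character, so 20 characters.
ascii_field = "ABCDEFGHIJKLMNOPQRST".encode("latin-1")

# 20 bytes in code page 1208: 3 Japanese characters (9 bytes) + 11 spaces = 14 characters.
japanese_field = ("日本語" + " " * 11).encode("utf-8")

print(satisfies_length_facet(ascii_field, "latin-1"))   # True
print(satisfies_length_facet(japanese_field, "utf-8"))  # False
```

Both fields occupy exactly 20 bytes, so a byte-based field width is satisfied in both cases; only the character-based facet rejects the UTF-8 field.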
Quote:
"The data is coming from the mainframe. The CCSID on the mainframe is EBCDIC 37, so whatever comes from the mainframe is in EBCDIC. There is a mainframe program that can convert this data from any code page to any other.
That program will send us Unicode data."
EBCDIC (code page 37) cannot represent any Japanese characters. Did you forget to tell us something? Or is the code page conversion utility getting its shoelaces tied together?
Regardless of the answer to that question, you cannot have a variable-width code page AND a fixed-width field AND a length constraint on the field.
er_pankajgupta84
Posted: Tue Feb 09, 2010 7:21 pm
 Master
Joined: 14 Nov 2008 Posts: 203 Location: charlotte,NC, USA

Quote:
"If you use a single-byte encoding like 819, then one byte = one character, so you always get exactly 20 characters and the length constraint is satisfied.
- If you use code page 1208 (UTF-8), then any non-ASCII characters will take up 2 or 3 bytes. So the number of characters in the parsed string will be less than 20."
I completely agree with this statement. But even when I use code page 1208, the number of bytes is still 20, even though there are fewer than 20 characters. And since I specified the "Length Unit" of the field as "Bytes", Broker should count bytes when it validates the length.
Quote:
"Your message set also has a length constraint (actually an xs:length facet) which checks that the field is 20 characters long."
What does this mean? I only specified the length on the field itself. It's a CSV message set. Along with that, I do have a MAX length constraint set to 20, but there is no MIN length constraint.
Is there anything else I need to check in the message set? It might be that even though the Length Unit of the field is set to "Bytes", some other setting is still forcing it to validate 20 characters instead of 20 bytes.
er_pankajgupta84
Posted: Tue Feb 09, 2010 7:34 pm
 Master
Joined: 14 Nov 2008 Posts: 203 Location: charlotte,NC, USA

I think I missed this part.
My message set definition consists of RECORDs, which in turn contain fields. Each RECORD is fixed length, and I have specified the length of each field on the field itself.
I have used "Use Pattern" as the Data Element Separator for each RECORD. As each RECORD is a fixed-length record, the pattern gives the number of characters to be read.
For example, the first record would be [0-9]{5}00010.{220}
I think this is forcing Broker to fail the length validation even if I specify "Bytes" as the Length Unit on the fields.
Now, the question is:
I cannot change the Data Element Separator of my records, as no other option is available. How can I make it work? How can I specify bytes instead of characters in patterns?
I will continue my research and keep you posted with any update, but any pointers would be helpful.
By the way, thanks Kimbert: your pointers gave me direction, and finally we know why Broker behaves this way.
fjb_saper
Posted: Tue Feb 09, 2010 9:04 pm
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY

er_pankajgupta84 wrote:
"That's exactly the problem: if I use any other code set, the number of bytes increases and the message fails because of the length constraint.
It's a fixed-length message set. Is there any setting in the message set, or any other way, that lets me accommodate these variable-length fields in a fixed-length format?
I am sure there has to be a way; otherwise we could not use fixed-length message sets for Japanese and other Asian languages."
Your length constraint is defined wrong. You defined the length constraint in bytes, but you are dealing with a multi-byte CCSID. You need to define the constraint in characters, not in bytes. Then it will not matter how many bytes each character needs.
Have fun
_________________
MQ & Broker admin
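fjb_saper's suggestion, counting the constraint in characters rather than bytes, can be sketched as a length check with a selectable unit. The `within_max_length` function and its `unit` values are hypothetical names for illustration; they are not Broker message set properties.

```python
def within_max_length(value: str, max_length: int, unit: str, encoding: str = "utf-8") -> bool:
    """Check value against a maximum length counted in 'characters' or in encoded 'bytes'."""
    if unit == "characters":
        size = len(value)
    elif unit == "bytes":
        size = len(value.encode(encoding))
    else:
        raise ValueError(f"unknown length unit: {unit!r}")
    return size <= max_length


# Hypothetical value: 7 Japanese characters, each 3 bytes in UTF-8 (21 bytes in total).
value = "日本語テキスト"

print(within_max_length(value, 20, "characters"))  # prints True (7 characters <= 20)
print(within_max_length(value, 20, "bytes"))       # prints False (21 bytes > 20)
```

The same 20-unit limit accepts the value when counted in characters and rejects it when counted in UTF-8 bytes.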