MQSeries.net :: View topic - Problem with conversion EBCDIC character to UTF-8 in JCN

Problem with conversion EBCDIC character to UTF-8 in JCN


MQEnthu

Posted: Wed May 06, 2009 5:33 am Post subject: Problem with conversion EBCDIC character to UTF-8 in JCN

Partisan

Joined: 06 Oct 2008
Posts: 329
Location: India

We have a scenario where we receive a message in CCSID 278 (EBCDIC) and need to convert a particular field to UTF-8. We are using a JCN for the transformation in our flow. The conversion happens, and all English characters are converted normally. But when the field contains the character "%", the conversion goes wrong: it is converted to Hex15, which is not valid UTF-8. The Java code I am using to convert the string is here:

Code:

try {
    byte[] bytesCp278 = inField_UTF8.getBytes("Cp278");
    String fieldCp278_to_UTF8 = new String(bytesCp278, "UTF-8");
    return trimR(fieldCp278_to_UTF8);
} catch (UnsupportedEncodingException e) {

I also tried this in a plain Java program and got the same result. But when we tried a different JVM (other than the one provided with WMB), the result was as expected, i.e. the character % was converted properly (Hex25 and NOT 15). The JVM we used was:

Vendor: Sun Microsystems Inc.
Version:1.5.0_17-b04

It appears that the JVM provided with the WMB (V 2.3) does not support the conversion I am doing.

Please let me know if you have come across anything like this, or if I am doing anything wrong.

Thank you.
_________________
-----------------------------------------------
It is good to remember the past,
but don't let past capture your future

fjb_saper

Posted: Wed May 06, 2009 7:32 pm Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20763
Location: LI,NY

Have you checked if there is a cast command on the MbElement?

_________________
MQ & Broker admin

MQEnthu

Posted: Wed May 06, 2009 8:22 pm Post subject:

Partisan

Joined: 06 Oct 2008
Posts: 329
Location: India

fjb_saper wrote:

Have you checked if there is a cast command on the MbElement?

I checked; there is no CAST method on MbElement.


fjb_saper

Posted: Wed May 06, 2009 8:42 pm Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20763
Location: LI,NY

Instead of using cp278, have you tried retrieving the CCSID from the message header (Properties) and applying that? Could it be possible that the message has a different CCSID than what you think? (500? EBCDIC international...)


mqjeff

Posted: Thu May 07, 2009 4:18 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

I'm kind of confused about where you're getting this data that you need to convert it from a non-broker codepage.

If it's being treated as character data by any of the broker parsers, it will have already been turned into broker native (UTF-16?) characters for you.

MQEnthu

Posted: Fri May 08, 2009 12:42 am Post subject:

Partisan

Joined: 06 Oct 2008
Posts: 329
Location: India

mqjeff wrote:

....you need to convert it from a non-broker codepage.

The message we are receiving is an EBCDIC message, CCSID 278.

Sorry, I did not get what you mean by non-broker code page. Did you mean the Cp278 I am using in my code? In that case:
As I mentioned in my previous post, I am using a JCN in my flow, and I could not find a WMB API method for getting the bitstream of a field. (AFAIK, MbElement.toBitStream() can be used only at the message body level and not at the field level.) Hence I used the normal Java method getBytes(), and that is why I gave the code page as Cp278, which corresponds to CCSID 278.

fjb_saper

Posted: Fri May 08, 2009 2:34 am Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20763
Location: LI,NY

The point Jeff was trying to make is that you can only get a byte[] from the element in a ccsid different from the broker's if the element is a BLOB.

If the element has been parsed your ccsid is 1200.

Have fun


rekarm01

Posted: Sun May 17, 2009 3:01 pm Post subject: Re: Problem with conversion EBCDIC character to UTF-8 in JCN

Grand Master

Joined: 25 Jun 2008
Posts: 1415

MQEnthu wrote:

We have scenario, where we receive the message in CCSID 278 (EBCDIC) and we need to convert a particular field to UTF-8. We are using the JCN for transformation in our flow.

Is there a particular reason for performing character data conversion inside a JCN? It's probably easier to use ESQL.

MQEnthu wrote:

... But when the field contains the character "%" the conversion is not happening properly - it is getting converted to Hex15 which is not valid UTF-8.

Actually, Hex15 is valid UTF-8, but it's not a valid XML 1.0 character.

Applications that read, write, and convert character data need to keep track of the data's character encoding. Applications must read character data using the same encoding that was used to write it; when converting character data, an application must know which encoding to convert from, as much as which encoding to convert to, to avoid misconverting characters.

Multiple misconversions can sometimes cancel each other out, making them more difficult to detect.

MQEnthu wrote:

The java code I am using to convert the string is here:

Code:

try {
byte[] bytesCp278 = inField_UTF8.getBytes("Cp278");
String fieldCp278_to_UTF8 = new String(bytesCp278,"UTF-8");
}

The broker uses UCS-2 internally, to represent characters. The standard parsers automatically convert character data as needed:

parse: converts from bitstream (encoding=InputRoot.Properties.CodedCharSetId) to field (encoding=UCS-2)
write: converts from field (encoding=UCS-2) to bitstream (encoding=OutputRoot.Properties.CodedCharSetId)

Java uses UTF-16 internally, to represent Strings:

String.getBytes(String charsetName) converts from UTF-16 to charsetName
String(byte[] bytes, String charsetName) converts from charsetName to UTF-16
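
Put together, a conversion has to decode the bytes with the charset they were written in, and only then encode to the target charset. The following is a minimal sketch of that idea, not the original poster's code: the class and method names (FieldTranscoder, transcode, hex) are made up for illustration, it assumes the raw EBCDIC bytes are available (for example from a BLOB element), it assumes a JRE that ships the extended charset IBM278 (the same code page as Cp278), and it uses java.nio.charset.StandardCharsets, which needs Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class FieldTranscoder {

    // Decode raw bytes using the charset they were WRITTEN in,
    // then encode the resulting String in the target charset.
    static byte[] transcode(byte[] raw, Charset from, Charset to) {
        String text = new String(raw, from); // bytes -> UTF-16 String
        return text.getBytes(to);            // UTF-16 String -> bytes
    }

    // Render a byte array as upper-case hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // IBM278 is an extended charset and may be missing from some JREs,
        // so check before using it (assumption: it is the right CCSID).
        if (!Charset.isSupported("IBM278")) {
            System.out.println("IBM278 is not available in this JRE");
            return;
        }
        // X'C1' X'6C' is "A%" in EBCDIC code page 278.
        byte[] ebcdic = { (byte) 0xC1, (byte) 0x6C };
        byte[] utf8 = transcode(ebcdic, Charset.forName("IBM278"),
                                StandardCharsets.UTF_8);
        System.out.println(hex(utf8));
    }
}
```

Note the direction of each call: getBytes() never "converts from" the charset it is given, it converts to it, which is why calling getBytes("Cp278") followed by new String(..., "UTF-8") reinterprets the bytes instead of converting them.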

MQEnthu wrote:

But when we tried a different JVM (other than the one provided with WMB), the result was as expected, i.e. the character % was converted properly (Hex25 and NOT 15).

The most likely sequence of misconversions (based on the code posted above, and here) is:

Code:

convert from '%' (UTF-16) to X'25' (UTF-8) -- getBytes("UTF-8")
misconvert from X'25' (Cp278) to '<LF>' (UTF-16) -- String(byte[], "Cp278")
convert from '<LF>' (UTF-16) to X'15' (Cp278) -- getBytes("Cp278")
misconvert from X'15' (UTF-8) to '<NAK>' (UTF-16) -- String(byte[], "UTF-8")

Because of the previous misconversions, it's actually the <LF> character, not the '%' character, that behaves differently. There are two standards for EBCDIC handling of the newline function, and implementers are free to choose which standard they implement:

Sun JRE: ASCII/Unicode <LF> <--> EBCDIC <LF> (X'25')
IBM JRE: ASCII/Unicode <LF> <--> EBCDIC <NL> (X'15')

If necessary, some IBM JREs allow an application to override its default behavior, by setting the java property "ibm.swapLF=true".
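
The four-step sequence above can be replayed in a small standalone program to see which newline mapping a given JRE uses. This is only a sketch under assumptions: the class and method names (MisconversionTrace, replay, hex) are hypothetical, the first two steps are a guess at what happened upstream of the posted code, and IBM278 must be available in the JRE. No output is shown because the middle value depends on the JRE, as described above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class MisconversionTrace {

    // Render a byte array as upper-case hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Replays the four steps for a non-empty input string.
    // 'assumed' is the charset the code wrongly believed the data was in.
    static String replay(String original, Charset assumed) {
        // 1. convert: String -> UTF-8 bytes
        byte[] step1 = original.getBytes(StandardCharsets.UTF_8);
        // 2. misconvert: UTF-8 bytes decoded as if they were 'assumed'
        String step2 = new String(step1, assumed);
        // 3. convert: String -> 'assumed' bytes (newline mapping is JRE-specific)
        byte[] step3 = step2.getBytes(assumed);
        // 4. misconvert: 'assumed' bytes decoded as if they were UTF-8
        String step4 = new String(step3, StandardCharsets.UTF_8);
        return hex(step1) + " -> " + hex(step3) + " -> "
                + String.format("U+%04X", step4.codePointAt(0));
    }

    public static void main(String[] args) {
        if (Charset.isSupported("IBM278")) {
            // A JRE that maps <LF> to X'25' should show 25 in the middle,
            // one that maps <LF> to X'15' should show 15.
            System.out.println(replay("%", Charset.forName("IBM278")));
        } else {
            System.out.println("IBM278 is not available in this JRE");
        }
    }
}
```

Running the same class on two JREs side by side is a quick way to confirm that the difference lies in the <LF> mapping and not in the handling of '%'.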

MQEnthu wrote:

I could not find a WMB API method for getting the bitstream of a field.

Fields don't have bitstreams, but ESQL can convert a field value to a BLOB:

Code:

CAST(charField AS BLOB CCSID ...)

MQEnthu

Posted: Sun May 17, 2009 10:20 pm Post subject:

Partisan

Joined: 06 Oct 2008
Posts: 329
Location: India

Thank you very much for the info rekarm01.

Yes, it would be easier to do it in ESQL, but we are bound to use JCN in this project.

Later I got to know about "ibm.swapLF=true".

However, we have now asked the sending system not to include multiple character sets, and the problem is solved.

zpat

Posted: Sun May 17, 2009 10:35 pm Post subject:

Jedi Council

Joined: 19 May 2001
Posts: 5867
Location: UK

I don't understand why people want to abandon the purpose built broker language of ESQL for Java.

ESQL is much easier to maintain unless you are a Java developer in which case I would be very concerned about the sort of message flows being developed and whether they use the standard WMB nodes properly.

Vitor

Posted: Sun May 17, 2009 11:51 pm Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

zpat wrote:

I don't understand why people want to abandon the purpose built broker language of ESQL for Java.

IMHO, it's because it's very easy for a manager to read "Java" in the product sales literature & either a) drop WMB on the existing pool of Java developers available to him or b) go to the market and buy some Java developers.

This avoids the need to interview people for WMB skills, ESQL skills, or any other difficult stuff. The problems you articulate will only be apparent 2-3 years down the line, by which time the manager in question will have met his budget and standards KPFs, been moved on to bigger & better things and his replacement will get to clean up the mess.

Me, cynical?

_________________
Honesty is the best policy.
Insanity is the best defence.
