WBIMB v6 java input node : setting xml charset encoding
powerlord
Posted: Fri Sep 02, 2005 12:56 am    Post subject: WBIMB v6 java input node : setting xml charset encoding

Novice

Joined: 02 Sep 2005
Posts: 19

I've had a look through the forum and tried a few things without success, so...

I've got a custom java input node which gets an XML message from somewhere (UTF-8).

Originally I was just taking the raw bytes of this message (UTF-8 encoded) and calling createMessage(msgBytes);

This does parse into XML in MB, but non-ASCII UTF-8 characters are not handled correctly.

For example if I have an XML message which has a pound sign (£), this gets mangled when I look at it in the debug flow. If I have an HTTPRequest in the flow I can see that the HTTP request sends out a mangled message too.

Looking at the properties of the message when it comes out of my custom input node, I see that the CodedCharSetId is 0.

So I set this in the java node to 1208 (UTF-8):

msg.getRootElement().getFirstChild().getFirstElementByPath("CodedCharSetId").setValue(new Integer(1208));

The property now has the correct value.

However, the message is still not parsed properly.

I even tried putting a compute node after the input node and changing the encoding there:

SET OutputRoot.XML.(XML.XMLDecl).(XML.Encoding)Encoding = 'UTF-8';

Still mangled.

I then added a ResetContentDescriptor node to force a reparse after the compute node, but the output is still mangled.

So the question is:

What can I set in my java node to force MB to parse the UTF-8 bytes I'm giving it into a UTF-8 XML message?

stu
powerlord
Posted: Mon Sep 05, 2005 7:42 am    Post subject: can no one help?

Novice

Joined: 02 Sep 2005
Posts: 19

Still got this problem.

MB is definitely not parsing the UTF-8 bytes correctly.

Simple input message:

<value>£100</value>

This is saved to a file as valid UTF-8 (via TextPad), and I checked in binary mode to confirm that "£" is saved as C2A3.
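The hex check described here can be reproduced in plain Java. This is only a minimal sketch and not part of the original post: the class name `PoundBytes` and the helper `toHex` are made up for illustration, and it assumes a JDK recent enough to have `StandardCharsets` (on older JVMs `Charset.forName("UTF-8")` would play the same role).

```java
import java.nio.charset.StandardCharsets;

public class PoundBytes {
    // Hypothetical helper: render a byte array as upper-case hex digits.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The pound sign, written as a Unicode escape so the encoding
        // of this source file does not matter.
        String pound = "\u00A3";

        // UTF-8 needs two bytes for U+00A3.
        System.out.println(toHex(pound.getBytes(StandardCharsets.UTF_8)));      // C2A3
        // A single-byte Latin-1 encoding needs only one.
        System.out.println(toHex(pound.getBytes(StandardCharsets.ISO_8859_1))); // A3
    }
}
```

Seeing C2A3 in the file therefore confirms that the file itself is UTF-8, so any garbling has to happen when the bytes are decoded.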

In the input node, I read the bytes out of the file and call createMessage(bytes).

The £ then appears garbled in the parsed message.

Setting CodedCharSetId after the createMessage has no effect. Setting it before is not possible (as I don't yet have a message whose properties I can get to).

arg.

help.

stu
fjb_saper
Posted: Mon Sep 05, 2005 11:12 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

Well, UTF-8 and a number of other CCSIDs will garble some characters in XML, like the currency sign. In fact it uses a control char and passes the char value...

You have to parse from XML to text using the correct CCSID to see those characters in clear text. Or use something like XML Spy...

See also the java.nio.* classes for reading in one CCSID and writing in another.
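A minimal sketch of what reading in one character set and writing in another could look like with `java.nio.charset`. Nothing here comes from the thread itself: the class name `Transcode` and the method `transcode` are invented for illustration, and `StandardCharsets` assumes a reasonably modern JDK.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Transcode {
    // Decode the bytes with one charset and re-encode them with another.
    // REPORT makes bad input throw instead of being silently replaced.
    static byte[] transcode(byte[] input, Charset from, Charset to)
            throws CharacterCodingException {
        CharBuffer chars = from.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(input));
        ByteBuffer out = to.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(chars);
        byte[] result = new byte[out.remaining()];
        out.get(result);
        return result;
    }

    public static void main(String[] args) throws CharacterCodingException {
        // "£100" as UTF-8: the pound sign is the two bytes C2 A3.
        byte[] utf8 = {(byte) 0xC2, (byte) 0xA3, '1', '0', '0'};
        byte[] latin1 = transcode(utf8, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        // In ISO-8859-1 the pound sign is the single byte A3.
        System.out.println(latin1.length); // 4
    }
}
```

If the bytes carry an XML declaration, transcoding them this way would also mean updating its encoding attribute, which this sketch does not do.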

Enjoy
powerlord
Posted: Tue Sep 06, 2005 12:27 am    Post subject: thanks

Novice

Joined: 02 Sep 2005
Posts: 19

thanks, but I don't think that is the problem.

Here are some code snippets.

I have a UTF-8 saved simple XML file which looks like:

<?xml version="1.0" encoding="UTF-8"?>
<Value>£1234</Value>

I've checked this in hex to confirm that '£' is saved as C2A3. This is definitely a valid UTF-8 bitstream.

Some code from my AFInput node:



Code:

....
try {
    // Read the whole file into msgBytes (the file is saved as UTF-8)
    File f = new File("c:\\in.txt");
    DataInputStream bis = new DataInputStream(new FileInputStream(f));
    msgBytes = new byte[(int) f.length()];
    bis.readFully(msgBytes);
    bis.close();
} catch (Exception e) {} // NB: any read error is silently ignored here

// Let the broker build the message from the raw bytes
MbMessage msg = null;
msg = createMessage(msgBytes);

// Properties is the first child of the root; CodedCharSetId is 0 at this point
MbElement props = msg.getRootElement().getFirstChild();
MbElement ccsid = props.getFirstElementByPath("CodedCharSetId");
Object o = ccsid.getValue();
ccsid.setValue(new Integer(1208)); // 1208 = UTF-8

MbMessageAssembly newAssembly = new MbMessageAssembly(ma, msg);

msg.finalizeMessage(MbMessage.FINALIZE_VALIDATE);

.....


A breakpoint after the node shows that a CodedCharSetId of 1208 HAS been set, but the XML has been parsed as:

XML
Value
-ú1234
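The garbled value above looks like what happens when the two UTF-8 bytes C2 A3 are decoded one byte at a time by a single-byte code page. The sketch below reproduces that effect in plain Java. It is an illustration only: the thread never says which code page the broker actually fell back to, and the class name `Mojibake` and method `decodeAs` are invented. In IBM437, for instance, C2 A3 comes out as "┬ú", which resembles the "-ú" shown, but that match is a guess.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Mojibake {
    // Hypothetical helper: interpret the same bytes under a given charset.
    static String decodeAs(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "£1234" encoded as UTF-8: C2 A3 31 32 33 34
        byte[] utf8 = "\u00A31234".getBytes(StandardCharsets.UTF_8);

        // Correct decode: five characters, starting with the pound sign.
        System.out.println(decodeAs(utf8, StandardCharsets.UTF_8));

        // Wrong decode: each of the two bytes becomes its own character,
        // so the string grows to six characters.
        System.out.println(decodeAs(utf8, StandardCharsets.ISO_8859_1));

        // IBM437 is not guaranteed to be present on every JVM, so check first.
        if (Charset.isSupported("IBM437")) {
            System.out.println(decodeAs(utf8, Charset.forName("IBM437")));
        }
    }
}
```

What the console prints depends on its own encoding, so comparing string lengths (5 for the correct decode, 6 for the single-byte one) is the more reliable check.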

So now I try the createElementAsLastChildFromBitstream method, which allows me to specify the encoding: I simply detach the root XML element I've just created and create a new one:

Code:

....
try {
    // Read the whole file into msgBytes (the file is saved as UTF-8)
    File f = new File("c:\\in.txt");
    DataInputStream bis = new DataInputStream(new FileInputStream(f));
    msgBytes = new byte[(int) f.length()];
    bis.readFully(msgBytes);
    bis.close();
} catch (Exception e) {} // NB: any read error is silently ignored here

// Let the broker build the message from the raw bytes
MbMessage msg = null;
msg = createMessage(msgBytes);

MbElement props = msg.getRootElement().getFirstChild();
MbElement ccsid = props.getFirstElementByPath("CodedCharSetId");
Object o = ccsid.getValue();
ccsid.setValue(new Integer(1208)); // 1208 = UTF-8

// OK... now scrub what I've just created, now that I've got a message object
MbElement newXmlElement = msg.getRootElement().getLastChild();
newXmlElement.detach();
// Re-parse the same bytes, this time passing the CCSID (1208) explicitly
msg.getRootElement().createElementAsLastChildFromBitstream(msgBytes, "xml", null, null, null, 0, 1208, 0);

MbMessageAssembly newAssembly = new MbMessageAssembly(ma, msg);

msg.finalizeMessage(MbMessage.FINALIZE_VALIDATE);



A breakpoint after the node shows that my bitstream has now been parsed properly as UTF-8/1208:

XML
Value
£1234

*******************

So it seems pretty clear to me that calling createMessage() and then setting CodedCharSetId has no effect, whereas createElementAsLastChildFromBitstream, with its explicit CCSID parameter, works.

However, this is clearly not a performant way to code it: creating a 'dummy' message just to delete its XML root and then create it again properly. Surely there is a better way?
jefflowrey
Posted: Tue Sep 06, 2005 7:04 am

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

Have you opened a PMR yet?

I am assuming, also, that you really aren't using WBIMB v6, but are really using v5.

If you are using v6, then you should instead report this to the beta program that you are participating in.
_________________
I am *not* the model of the modern major general.
powerlord
Posted: Tue Sep 06, 2005 11:21 pm

Novice

Joined: 02 Sep 2005
Posts: 19

Yeah, MB5. CSD4 and CSD6 display the same behaviour.

I wanted to check it was a bug before going further. If you reckon it is one I'll raise a PMR.

stu