CSV file input, JSON msg output, charset encoding problem
cevans
Posted: Thu Jul 18, 2013 7:13 am    Post subject: CSV file input, JSON msg output, charset encoding problem

Apprentice

Joined: 18 Jul 2013
Posts: 26

Apologies if I am asking a dumb question but I have found the IBM documentation or web search to have no clear explanation or example that provides clarity to the problem I am experiencing. Broker is quite new to me so I am only familiar with how it works for what I have experienced in training and the projects I have worked on. Anyway …

I take a csv file as input. The file is windows-1252 encoded and can contain extended characters such as £.

I successfully parse the input using an xsd and can view the message tree in debug with £ characters correctly displayed.

I copy the message to environment so that I can use it later.

I then read an MQ Queue to retrieve a JSON, utf-8, message which must be updated with data from the csv file.

When I update the file by replacing part of the input message tree with the saved csv tree (including £) the new output message looks fine in debug.

The updated JSON, utf-8, message is written to an MQ Queue.

If I call the service that reads the updated JSON message from the Queue I see the £ symbol represented as Â£. I have researched this and believe the underlying coding for the character is still windows-1252 as this is the correct representation of that code when seen as utf-8.

So it looks like I need to convert the csv message to utf-8 before I use the content to update a utf-8 file. Sounds reasonable.

The problem is: what is the best way to do this? Neither the Help nor searching has identified a clear approach, and certainly no suitable or similar example. Or maybe I am not looking in the right place?

I have tried the following with no success.
Change the CodedCharSetId Property
Create a bit stream from the parsed csv and then create an output tree from this bit stream using the CCSID for UTF (see below)

DECLARE inEncoding INT InputProperties.Encoding;
DECLARE inCCSID INT InputProperties.CodedCharSetId;
DECLARE inBitStream BLOB ASBITSTREAM(InputRoot.DFDL,inEncoding,inCCSID);

DECLARE outEncoding INT inEncoding;
DECLARE outCCSID INT 437;
CREATE LASTCHILD OF OutputRoot DOMAIN('DFDL') PARSE (inBitStream,outEncoding,outCCSID,'IncentiveCodes','IncentiveCodes');

So, having tried both a simple and a more complex solution without success, I wanted to ask the community for suggestions on an approach; if someone has achieved this, a working example would be fantastic. I have seen many responses here to similar code page issues, so if you want to tell me to RTFM then please don't bother posting, as that would not be helpful.

Many thanks
lancelotlinc
Posted: Thu Jul 18, 2013 7:54 am    Post subject: Re: CSV file input, JSON msg output, charset encoding proble

Jedi Knight

Joined: 22 Mar 2010
Posts: 4941
Location: Bloomington, IL USA

cevans wrote:
a working example would be fantastic.


What happened when you tried the working example sample that comes with toolkit?
fatherjack
Posted: Thu Jul 18, 2013 8:14 am

Knight

Joined: 14 Apr 2010
Posts: 522
Location: Craggy Island

Are you reading the csv file with a file input node? Is the CCSID on the node set correctly?
Vitor
Posted: Thu Jul 18, 2013 8:25 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

What CCSID are you setting in the outbound CCSID? Does that reflect the UTF-8 nature of the payload or the default of the broker's queue manager?
rekarm01
Posted: Thu Jul 18, 2013 9:07 am    Post subject: Re: CSV file input, JSON msg output, charset encoding proble

Grand Master

Joined: 25 Jun 2008
Posts: 1415

cevans wrote:
If I call the service that reads the updated JSON message from the Queue I see the £ symbol represented as Â£. I have researched this and believe the underlying coding for the character is still windows-1252 as this is the correct representation of that code when seen as utf-8.

No, that's backwards (or at least poorly worded). It's coded as utf-8 but displayed as windows-1252, so it's an issue for the display tool, not the message data.
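
The direction of the mix-up can be reproduced outside the broker. The following is a minimal Python sketch, not broker code: it shows that correctly encoded UTF-8 bytes for £, when displayed by a tool that assumes windows-1252, appear as two characters, while the opposite mistake produces an invalid sequence rather than a second visible character. The variable names are illustrative only.

```python
# Plain-Python illustration of the display problem; not broker code.
pound = "£"

# The UTF-8 serialization of U+00A3 is the two bytes C2 A3.
utf8_bytes = pound.encode("utf-8")

# A viewer that wrongly assumes windows-1252 renders each byte as its own character.
misread = utf8_bytes.decode("cp1252")

# The opposite mistake: the single windows-1252 byte A3 is not valid UTF-8 on its own,
# so a tolerant decoder substitutes the replacement character U+FFFD.
cp1252_bytes = pound.encode("cp1252")
opposite = cp1252_bytes.decode("utf-8", errors="replace")

print(utf8_bytes.hex(), repr(misread), repr(opposite))
```

Seeing an extra character in front of the £ therefore points to UTF-8 data being viewed through a single-byte code page, which matches the explanation above.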
goffinf
Posted: Thu Jul 18, 2013 10:41 am

Chevalier

Joined: 05 Nov 2005
Posts: 401

Use a hex editor to view your output and check the value for the £ character. This should conclusively tell you what character encoding has been used. If it really is UTF-8 then the £ character will show up as c2a3.

RFHUtil can show you an MQ message as both hex and character.

Fraser.
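
If the payload has been saved to a file or pulled off the queue as raw bytes, the hex check described above can also be scripted. This is a rough Python sketch under stated assumptions: `classify_pound` is a hypothetical helper, and it is only a heuristic, since the byte a3 can also occur as part of other multi-byte UTF-8 characters.

```python
def classify_pound(payload: bytes) -> str:
    """Heuristically report how a pound sign in `payload` appears to be encoded."""
    # "Â£" re-encoded as UTF-8: the data was misread once and then written out again.
    if b"\xc3\x82\xc2\xa3" in payload:
        return "double-encoded"
    # c2 a3 is the UTF-8 form of the pound sign.
    if b"\xc2\xa3" in payload:
        return "utf-8"
    # A bare a3 suggests a single-byte code page such as windows-1252.
    if b"\xa3" in payload:
        return "single-byte"
    return "absent"


sample = '{"price": "£5"}'
print(classify_pound(sample.encode("utf-8")))
print(classify_pound(sample.encode("cp1252")))
```

A result of "utf-8" would indicate the message data is fine and only the viewing tool is at fault.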
fjb_saper
Posted: Thu Jul 18, 2013 7:21 pm    Post subject: Re: CSV file input, JSON msg output, charset encoding proble

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

cevans wrote:
Code:

DECLARE inEncoding INT InputProperties.Encoding;
DECLARE inCCSID INT InputProperties.CodedCharSetId;
DECLARE inBitStream BLOB ASBITSTREAM(InputRoot.DFDL,inEncoding,inCCSID);
      
DECLARE outEncoding INT inEncoding;
DECLARE outCCSID INT 437;
CREATE LASTCHILD OF OutputRoot DOMAIN('DFDL') PARSE (inBitStream,outEncoding,outCCSID,'IncentiveCodes','IncentiveCodes');

Many thanks


So 437 does not represent the CCSID for UTF-8. That would be 1208!
You might have to check the correspondence between code page and CCSID number...
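
To see why the choice of 437 matters, here is a small Python sketch (not ESQL) comparing code page 437 with UTF-8 for the £ character. It assumes Python's `cp437`, `cp1252` and `utf-8` codecs correspond to CCSIDs 437, 1252 and 1208, and it assumes the input bitstream was windows-1252, which the thread suggests but does not confirm.

```python
pound = "£"

# The same character serialized under the two CCSIDs being discussed.
as_437 = pound.encode("cp437")    # code page 437 (CCSID 437)
as_1208 = pound.encode("utf-8")   # UTF-8 (CCSID 1208)

# Assumption: the bitstream was produced in windows-1252 and then re-parsed as 437,
# as in the quoted ESQL. The byte a3 means a different character in code page 437.
reparsed = pound.encode("cp1252").decode("cp437")

# A UTF-8 consumer handed 437-encoded bytes cannot decode them as a pound sign.
seen_by_utf8_reader = as_437.decode("utf-8", errors="replace")

print(as_437.hex(), as_1208.hex(), repr(reparsed), repr(seen_by_utf8_reader))
```

Either way, 437 never yields the c2a3 byte pair that a UTF-8 consumer expects.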

Have fun
cevans
Posted: Thu Jul 18, 2013 11:54 pm

Apprentice

Joined: 18 Jul 2013
Posts: 26

fatherjack wrote:
Are you reading the csv file with a file input node? Is the CCSID on the node set correctly?


Yes, I read the csv with a file input node. The xsd is set with the following CCSID ... well, I believe it is correct.

<?xml version="1.0" encoding="UTF-8"?><xsd:schema xmlns:csv="http://www.ibm.com/dfdl/CommaSeparatedFormat" xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/" xmlns:ibmDfdlExtn="http://www.ibm.com/dfdl/extensions" xmlns:ibmSchExtn="http://www.ibm.com/schema/extensions" xmlns:xsd="http://www.w3.org/2001/XMLSchema">


<xsd:import namespace="http://www.ibm.com/dfdl/CommaSeparatedFormat" schemaLocation="IBMdefined/CommaSeparatedFormat.xsd"/>
<xsd:annotation>
<xsd:appinfo source="http://www.ogf.org/dfdl/">
<dfdl:format documentFinalTerminatorCanBeMissing="yes" encoding="{$dfdl:encoding}" escapeSchemeRef="csv:CSVEscapeScheme" ref="csv:CommaSeparatedFormat"/>
</xsd:appinfo>
</xsd:annotation>

<xsd:element dfdl:byteOrder="bigEndian" dfdl:encoding="windows-1252" ibmSchExtn:docRoot="true" name="IncentiveCodes">
cevans
Posted: Thu Jul 18, 2013 11:57 pm    Post subject: Re: CSV file input, JSON msg output, charset encoding proble

Apprentice

Joined: 18 Jul 2013
Posts: 26

lancelotlinc wrote:
cevans wrote:
a working example would be fantastic.


What happened when you tried the working example sample that comes with toolkit?


Is there a working example that takes a file input, converts it to a different character set and then outputs it on a queue? I do not have an MQ Input; if I did, I believe I would be able to convert it there.

Could you point me at the example you are referring to, please?
mqjeff
Posted: Fri Jul 19, 2013 12:03 am

Grand Master

Joined: 25 Jun 2008
Posts: 17447

Once your data is turned into a logical message tree, it has been converted from the source character set into an internal character set.

When your logical message tree is serialized into an output bitstream, it is converted into the character set that is indicated in that logical message tree, in the appropriate header for the output node doing the serialization.
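
As a rough model of that serialization step, the Python sketch below (not broker code) treats a message as a body plus a header field carrying the CCSID, and lets the header alone decide the output bytes. The `message` dictionary, the `serialize` function and the CCSID-to-codec table are all hypothetical stand-ins for illustration.

```python
# Hypothetical mapping from a few CCSIDs to Python codec names.
CCSID_TO_CODEC = {1208: "utf-8", 1252: "cp1252"}


def serialize(message: dict) -> bytes:
    """Write the logical body using the character set named in the message header."""
    codec = CCSID_TO_CODEC[message["ccsid"]]
    return message["body"].encode(codec)


# The logical content is identical; only the header value differs.
as_utf8 = serialize({"ccsid": 1208, "body": "£"})
as_1252 = serialize({"ccsid": 1252, "body": "£"})

print(as_utf8.hex(), as_1252.hex())
```

In this model the logical tree never needs converting; getting the header value right for the output is what determines the bytes on the wire.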

Your problem is likely not what you think it is.

I suggest you instrument your flow with several trace nodes at various points, and ask them to show you the Root and the Environment tree.

Then, in addition, take a user trace of an execution of the flow.

This will allow you to determine the place in your flow where the data you think is converted wrong is actually being converted wrong - or more likely, where your code is not creating the right message tree.
kimbert
Posted: Fri Jul 19, 2013 12:57 am

Jedi Council

Joined: 29 Jul 2003
Posts: 5542
Location: Southampton

As this is a long thread...

mqjeff's advice is good advice. The key point is that all character data in a message flow is in UTF-16 (CCSID 1200). All incoming data gets converted to UTF-16 internally, and it gets converted to the output CCSID when written by an output node (or by ASBITSTREAM).

Hopefully, you now realise that this statement, although reasonable, is entirely wrong
Quote:
So it looks like I need to convert the csv message to utf-8 before I use the content to update a utf-8 file. Sounds reasonable.
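
The point about the internal representation can be illustrated with a short Python sketch, again not broker code. Python strings stand in for the broker's internal UTF-16 tree, and `parse` is a hypothetical helper. Once parsed, the csv value and the JSON value are indistinguishable, so there is nothing left to "convert" before merging them.

```python
def parse(bitstream: bytes, codec: str) -> str:
    """Turn wire bytes into the internal (Unicode) representation."""
    return bitstream.decode(codec)


# The same character arriving from the two sources in their own encodings.
from_csv = parse(b"\xa3", "cp1252")      # windows-1252 file
from_json = parse(b"\xc2\xa3", "utf-8")  # UTF-8 MQ message

# After parsing, the source encoding is gone: both values are the same string.
same_internally = from_csv == from_json

# Shown as UTF-16 big-endian bytes, standing in for the internal CCSID 1200 form.
internal_utf16 = from_csv.encode("utf-16-be")

# The output encoding is applied only when the merged tree is written.
written = from_csv.encode("utf-8")

print(same_internally, internal_utf16.hex(), written.hex())
```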