Author |
Message
|
cevans |
Posted: Thu Jul 18, 2013 7:13 am Post subject: CSV file input, JSON msg output, charset encoding problem |
|
|
Apprentice
Joined: 18 Jul 2013 Posts: 26
|
Apologies if this is a dumb question, but neither the IBM documentation nor a web search has turned up a clear explanation or example for the problem I am experiencing. Broker is quite new to me, so I am only familiar with what I have seen in training and on the projects I have worked on. Anyway …
I take a csv file as input. The file is windows-1252 encoded and can contain extended characters such as £.
I successfully parse the input using an xsd and can view the message tree in debug with £ characters correctly displayed.
I copy the message to environment so that I can use it later.
I then read an MQ Queue to retrieve a JSON, utf-8, message which must be updated with data from the csv file.
When I update the message by replacing part of the input message tree with the saved csv tree (including £), the new output message looks fine in debug.
The updated JSON, utf-8, message is written to an MQ Queue.
If I call the service that reads the updated JSON message from the Queue I see the £ symbol represented as Â£. I have researched this and believe the underlying coding for the character is still windows-1252 as this is the correct representation of that code when seen as utf-8.
So it looks like I need to convert the csv message to utf-8 before I use the content to update a utf-8 file. Sounds reasonable.
The problem is: what is the best way to do this? Neither the Help nor searching has identified a clear approach, and certainly no suitable or similar example. Or maybe I am not looking in the right place?
I have tried the following with no success.
Change the CodedCharSetId Property
Create a bit stream from the parsed csv and then create an output tree from this bit stream using the CCSID for UTF (see below)
DECLARE inEncoding INT InputProperties.Encoding;
DECLARE inCCSID INT InputProperties.CodedCharSetId;
DECLARE inBitStream BLOB ASBITSTREAM(InputRoot.DFDL,inEncoding,inCCSID);
DECLARE outEncoding INT inEncoding;
DECLARE outCCSID INT 437;
CREATE LASTCHILD OF OutputRoot DOMAIN('DFDL') PARSE (inBitStream,outEncoding,outCCSID,'IncentiveCodes','IncentiveCodes');
So, having tried a simple and a more complex solution without success, I wanted to ask the community for suggestions on an approach, and if someone has achieved this, a working example would be fantastic. I have seen many responses here in relation to similar code page issues, so if you want to tell me to RTFM then please don’t bother posting as that would not be helpful.
Many thanks |
lancelotlinc |
Posted: Thu Jul 18, 2013 7:54 am Post subject: Re: CSV file input, JSON msg output, charset encoding proble |
|
|
 Jedi Knight
Joined: 22 Mar 2010 Posts: 4941 Location: Bloomington, IL USA
|
cevans wrote: |
a working example would be fantastic. |
What happened when you tried the working example sample that comes with toolkit? |
fatherjack |
Posted: Thu Jul 18, 2013 8:14 am Post subject: |
|
|
 Knight
Joined: 14 Apr 2010 Posts: 522 Location: Craggy Island
|
Are you reading the csv file with a file input node? Is the CCSID on the node set correctly? |
Vitor |
Posted: Thu Jul 18, 2013 8:25 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
What CCSID are you setting on the outbound message? Does that reflect the UTF-8 nature of the payload or the default of the broker's queue manager? |
rekarm01 |
Posted: Thu Jul 18, 2013 9:07 am Post subject: Re: CSV file input, JSON msg output, charset encoding proble |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
cevans wrote: |
If I call the service that reads the updated JSON message from the Queue I see the £ symbol represented as Â£. I have researched this and believe the underlying coding for the character is still windows-1252 as this is the correct representation of that code when seen as utf-8. |
No, that's backwards (or at least poorly worded). It's coded as utf-8 but displayed as windows-1252, so it's an issue for the display tool, not the message data. |
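For anyone who wants to see this effect outside the broker, here is a minimal Python sketch (not broker code). It assumes Python's `cp1252` codec as a stand-in for windows-1252, and shows that correct UTF-8 bytes for £ look like two characters when a viewer decodes them as windows-1252.

```python
# UTF-8 bytes for the pound sign, as they would sit in a correctly encoded message.
utf8_bytes = "£".encode("utf-8")

# A display tool that wrongly assumes windows-1252 turns each byte into its own character.
misread = utf8_bytes.decode("cp1252")

print(utf8_bytes.hex())  # the raw bytes in hex
print(misread)           # what the mis-configured viewer would show
```

The bytes themselves are unchanged in both cases; only the interpretation differs.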
goffinf |
Posted: Thu Jul 18, 2013 10:41 am Post subject: |
|
|
Chevalier
Joined: 05 Nov 2005 Posts: 401
|
Use a hex editor to view your output and check the value for the £ character. This should conclusively tell you what character encoding has been used. If it really is UTF-8 then the £ character will show up as c2a3.
RFHUtil can show you an MQ message as both hex and character.
Fraser. |
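If a hex editor is not to hand, the same check can be sketched in a few lines of Python. This is only a heuristic, and the function name `classify_pound_bytes` and the sample payload are made up for illustration: it looks for the byte patterns a £ would produce under UTF-8, under a single-byte code page such as windows-1252, or after an accidental double conversion.

```python
def classify_pound_bytes(data: bytes) -> str:
    """Guess how a pound sign in `data` was encoded, from its byte pattern."""
    # "Â£" re-encoded as UTF-8: the text was converted to UTF-8 twice.
    if b"\xc3\x82\xc2\xa3" in data:
        return "double-encoded utf-8"
    # The normal two-byte UTF-8 form of the pound sign.
    if b"\xc2\xa3" in data:
        return "utf-8"
    # A lone 0xA3 byte: a single-byte code page such as windows-1252.
    if b"\xa3" in data:
        return "single-byte code page"
    return "no pound sign found"


# Hypothetical payload standing in for the message body taken off the queue.
sample = '{"price":"£10"}'.encode("utf-8")
print(sample.hex())
print(classify_pound_bytes(sample))
```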
fjb_saper |
Posted: Thu Jul 18, 2013 7:21 pm Post subject: Re: CSV file input, JSON msg output, charset encoding proble |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
cevans wrote: |
Code: |
DECLARE inEncoding INT InputProperties.Encoding;
DECLARE inCCSID INT InputProperties.CodedCharSetId;
DECLARE inBitStream BLOB ASBITSTREAM(InputRoot.DFDL,inEncoding,inCCSID);
DECLARE outEncoding INT inEncoding;
DECLARE outCCSID INT 437;
CREATE LASTCHILD OF OutputRoot DOMAIN('DFDL') PARSE (inBitStream,outEncoding,outCCSID,'IncentiveCodes','IncentiveCodes'); |
Many thanks |
So 437 does not represent the CCSID for UTF-8. That would be 1208!
You might have to check the correspondence between code page and CCSID number...
Have fun |
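As a rough illustration of why the CCSID number matters, the Python sketch below encodes £ under three CCSIDs mentioned in this thread. The mapping from CCSID to Python codec name is an assumption made for the example (Python codecs are only approximations of the IBM code pages), and `pound_hex` is a hypothetical helper.

```python
# Assumed mapping from IBM CCSID numbers to the closest Python codec names.
CCSID_TO_CODEC = {
    437: "cp437",    # original IBM PC code page, not Unicode
    1208: "utf-8",   # UTF-8
    1252: "cp1252",  # windows-1252
}


def pound_hex(ccsid: int) -> str:
    """Return the hex bytes that represent the pound sign under the given CCSID."""
    return "£".encode(CCSID_TO_CODEC[ccsid]).hex()


for ccsid in sorted(CCSID_TO_CODEC):
    print(ccsid, pound_hex(ccsid))
```

The same character yields three different byte sequences, so bytes written under one CCSID and read under another will not round-trip.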
cevans |
Posted: Thu Jul 18, 2013 11:54 pm Post subject: |
|
|
Apprentice
Joined: 18 Jul 2013 Posts: 26
|
fatherjack wrote: |
Are you reading the csv file with a file input node? Is the CCSID on the node set correctly? |
Yes, I read the csv with a file input node. The xsd is set with the following CCSID ... Well, I believe it is correct
<?xml version="1.0" encoding="UTF-8"?><xsd:schema xmlns:csv="http://www.ibm.com/dfdl/CommaSeparatedFormat" xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/" xmlns:ibmDfdlExtn="http://www.ibm.com/dfdl/extensions" xmlns:ibmSchExtn="http://www.ibm.com/schema/extensions" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:import namespace="http://www.ibm.com/dfdl/CommaSeparatedFormat" schemaLocation="IBMdefined/CommaSeparatedFormat.xsd"/>
<xsd:annotation>
<xsd:appinfo source="http://www.ogf.org/dfdl/">
<dfdl:format documentFinalTerminatorCanBeMissing="yes" encoding="{$dfdl:encoding}" escapeSchemeRef="csv:CSVEscapeScheme" ref="csv:CommaSeparatedFormat"/>
</xsd:appinfo>
</xsd:annotation>
<xsd:element dfdl:byteOrder="bigEndian" dfdl:encoding="windows-1252" ibmSchExtn:docRoot="true" name="IncentiveCodes"> |
cevans |
Posted: Thu Jul 18, 2013 11:57 pm Post subject: Re: CSV file input, JSON msg output, charset encoding proble |
|
|
Apprentice
Joined: 18 Jul 2013 Posts: 26
|
lancelotlinc wrote: |
cevans wrote: |
a working example would be fantastic. |
What happened when you tried the working example sample that comes with toolkit? |
Is there a working example that takes a file input, converts it to a different character set, and then outputs it on a queue? I do not have an MQ Input node; if I did, I believe I would be able to convert it there.
Could you point me at the example you are referring to please. |
mqjeff |
Posted: Fri Jul 19, 2013 12:03 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
Once your data is turned into a logical message tree, it has been converted from the source character set into an internal character set.
When your logical message tree is serialized into an output bitstream, it is converted into the character set that is indicated in that logical message tree, in the appropriate header for the output node doing the serialization.
Your problem is likely not what you think it is.
I suggest you instrument your flow with several trace nodes at various points, and ask them to show you the Root and the Environment tree.
Then, in addition, take a user trace of an execution of the flow.
This will allow you to determine the place in your flow where the data you think is converted wrong is actually being converted wrong - or more likely, where your code is not creating the right message tree. |
kimbert |
Posted: Fri Jul 19, 2013 12:57 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
As this is a long thread...
mqjeff's advice is good advice. The key point is that all character data in a message flow is in UTF-16 ( CCSID 1200 ). All incoming data gets converted to UTF-16 internally, and it gets converted to the output CCSID when written by an output node ( or by ASBITSTREAM ).
Hopefully, you now realise that this statement, although reasonable, is entirely wrong
Quote: |
So it looks like I need to convert the csv message to utf-8 before I use the content to update a utf-8 file. Sounds reasonable. |
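The parse-then-serialize model described above can be sketched by analogy in Python. This is not how the broker is implemented: a Python `str` merely plays the role of the internal Unicode message tree, and the function names, field names, and sample CSV bytes are invented for the example.

```python
import json


def parse_csv_record(raw: bytes, in_codec: str) -> list:
    """Input side: bytes are decoded once, using the character set of the source."""
    return raw.decode(in_codec).split(",")


def serialize_json(fields: dict, out_codec: str) -> bytes:
    """Output side: text is encoded once, using the character set of the target."""
    return json.dumps(fields, ensure_ascii=False).encode(out_codec)


# Hypothetical windows-1252 CSV record; 0xA3 is the pound sign in that code page.
csv_bytes = b"ABC123,\xa35.00"

code, amount = parse_csv_record(csv_bytes, "cp1252")

# In between, the data is plain Unicode text with no source encoding attached.
out_bytes = serialize_json({"code": code, "amount": amount}, "utf-8")

print(amount)
print(out_bytes.hex())
```

No separate "convert the csv to utf-8" step appears anywhere: the only two conversions happen at the input and output boundaries, each driven by the character set declared for that boundary.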