[Solved] Parsing Error for UTF-8 file containing Chinese chars
abhyyy
Posted: Sun Jan 15, 2012 2:01 am Post subject: [Solved] Parsing Error for UTF-8 file containing Chinese chars
Voyager
Joined: 29 Sep 2011 Posts: 83
Hi Friends,
I have an input file that another application drops into a folder, with records in the format below (pipe-delimited). The file contains some Chinese characters, which is why the other application writes it in UTF-8 rather than ANSI.
Problem: When I try to parse the file with the FileInput node, using the message set I created for the delimited format, I receive a parsing error.
Even if I remove the Chinese characters and replace them with English characters, it still does not work. But the same message set works with an ANSI file containing only English characters. I have already forced CCSID 1208 in both the message set and the FileInput node, but it didn't work.
Sample file record:
0|6594993543|XMAS2011|XMASOFFER|123456789|OFFER_FOR_MALAYSIANS|X|特殊字符测试|20111225121212|P|
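To see what the parser has to deal with, here is a minimal Python sketch that decodes the sample record and splits it on the pipe delimiter. It is only an illustration, assuming the whole record is UTF-8 and that the trailing `|` terminates the last field; it is not how the Message Broker TDS parser works, and `split_record` is a hypothetical helper. It shows ten fields, with the Chinese field being 6 characters but 18 bytes long.

```python
# Sketch only: illustrates decoding + delimiter splitting; it is not how the
# WebSphere Message Broker TDS parser is implemented.

# Sample record from the post, encoded as UTF-8 bytes (as the other application writes it).
RAW = (
    "0|6594993543|XMAS2011|XMASOFFER|123456789|"
    "OFFER_FOR_MALAYSIANS|X|特殊字符测试|20111225121212|P|"
).encode("utf-8")


def split_record(raw, encoding="utf-8", delimiter="|"):
    """Hypothetical helper: decode the whole record first, then split on the delimiter."""
    text = raw.decode(encoding)
    fields = text.split(delimiter)
    # The record ends with a trailing '|', which leaves one empty string at the end.
    if fields and fields[-1] == "":
        fields.pop()
    return fields


fields = split_record(RAW)
for index, value in enumerate(fields):
    # Character length and UTF-8 byte length differ only for non-ASCII fields.
    print(index, repr(value), len(value), "chars,", len(value.encode("utf-8")), "bytes")
```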
Please advise if I need to make any changes in the FileInput node and message set in order to correctly read and parse a UTF-8 file with Chinese characters.
_________________
----------------------
NeVeR StOp LeaRnInG.
Last edited by abhyyy on Sun Jan 15, 2012 9:42 am; edited 1 time in total
smdavies99
Posted: Sun Jan 15, 2012 3:48 am Post subject: Re: Parsing Error for UTF-8 file containing Chinese chars.
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
abhyyy wrote:
If I remove the Chinese characters and replace them with some English characters, even then it is not working.
Doesn't that tell you that it might not be the Chinese characters that are causing the failure?
What is the exact error you are seeing? Take a user trace and post the relevant output. You might be surprised by the information it gives you. _________________ WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.
abhyyy
Posted: Sun Jan 15, 2012 9:39 am Post subject:
Voyager
Joined: 29 Sep 2011 Posts: 83
Found the problem!!
The problem was in my first input field. If you check the sample message that I posted earlier, its first field is 0 (an integer).
This is because a general principle of UTF-8 is that the first byte either is a single-byte character or indicates the length of a multi-byte code by the number of 1s before the first 0, and is then filled up with data bits.
So I cannot define my first field in the TDS format as an integer (it has to be a character) when I am reading tag-delimited records from a UTF-8 encoded file using the FileInput node.
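The UTF-8 lead-byte rule described above can be sketched in Python as follows. `utf8_sequence_length` is a hypothetical helper written for illustration: it counts the leading 1 bits of a first byte and is deliberately simplified (it does not reject every invalid lead byte, such as 0xC0, 0xC1 or 0xF5 to 0xF7). Applied to the sample record, it shows that the leading `0` (byte 0x30) is a complete single-byte character, exactly as it would be in ASCII.

```python
def utf8_sequence_length(lead):
    """Hypothetical helper: how many bytes a UTF-8 sequence occupies, judged from its first byte."""
    if lead < 0x80:
        # 0xxxxxxx: a single-byte character, identical to ASCII.
        return 1
    # Count the 1 bits before the first 0 bit (110xxxxx -> 2, 1110xxxx -> 3, 11110xxx -> 4).
    ones = 0
    mask = 0x80
    while lead & mask:
        ones += 1
        mask >>= 1
    if ones == 1 or ones > 4:
        # 10xxxxxx is a continuation byte; five or more leading 1s never start a sequence.
        raise ValueError(f"0x{lead:02X} is not a valid UTF-8 lead byte")
    return ones


for ch in ("0", "特"):
    encoded = ch.encode("utf-8")
    print(repr(ch), encoded.hex(), "->", utf8_sequence_length(encoded[0]), "byte(s)")
```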
Thanks a lot, friends, for your precious time.
kimbert
Posted: Mon Jan 16, 2012 2:16 am Post subject:
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
Glad you got it working... but your explanation is not correct.
Quote:
Since a general principle of UTF-8 is that the first byte either is a single-byte character or indicates the length of a multi-byte code by the number of 1's before the first 0 and is then filled up with data bits.
That statement is correct.
Quote:
So, I cannot keep my first field in TDS format as integer (it has to be a character) when I am reading tag delimited records from UTF-8 encoded file using File input Node.
That statement is not correct. The '0' at the start of the line is a character. It does not matter whether that character is encoded as UTF-8 or ASCII. If you tell the TDS parser that the field is one character long, then it will consume as many bytes as it needs to (assuming that you have set the CCSID property correctly).
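kimbert's point can be illustrated with a small Python sketch. It assumes a character-oriented reader that feeds bytes one at a time to Python's standard incremental decoder until it has the requested number of characters; `read_chars` is a hypothetical helper, not part of Message Broker. A field that is one character long consumes one byte when it is '0' and three bytes when it is '特', and '0' encodes to the same byte in ASCII and UTF-8.

```python
import codecs


def read_chars(raw, count, encoding="utf-8"):
    """Hypothetical helper: read `count` characters from `raw`, returning them and the bytes consumed."""
    decoder = codecs.getincrementaldecoder(encoding)()
    chars = ""
    used = 0
    for byte in raw:
        # Feed one byte at a time; the decoder only emits a character once its sequence is complete.
        chars += decoder.decode(bytes([byte]))
        used += 1
        if len(chars) >= count:
            break
    else:
        # Ran out of input: flush, raising UnicodeDecodeError if a sequence was cut short.
        chars += decoder.decode(b"", final=True)
    return chars[:count], used


# '0' is the single byte 0x30 whether it is written as ASCII or as UTF-8.
print("0".encode("ascii") == "0".encode("utf-8"))

# A one-character field consumes as many bytes as that character needs.
print(read_chars("0|6594993543|".encode("utf-8"), 1))
print(read_chars("特殊字符测试|".encode("utf-8"), 1))
```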
Perhaps you had incorrectly set the 'length units' field to 'bytes' for that first field? But when you changed the type to 'xs:string', you changed the 'length units' to 'characters', which made it work again?
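The difference between the two 'length units' settings can be seen with another Python sketch. It is an illustration under stated assumptions, not Message Broker behaviour: `take_bytes` and `take_chars` are hypothetical helpers that cut a fixed length from a UTF-8 field either in bytes or in characters. Counting in characters always yields whole characters; counting in bytes can cut a multi-byte character in half, which a strict decoder rejects.

```python
FIELD = "特殊字符测试"            # 6 characters
RAW = FIELD.encode("utf-8")      # 18 bytes: every character here takes 3 bytes in UTF-8


def take_chars(data, length, encoding="utf-8"):
    """Hypothetical helper: length measured in characters (decode first, then cut)."""
    return data.decode(encoding)[:length]


def take_bytes(data, length, encoding="utf-8"):
    """Hypothetical helper: length measured in bytes (cut first, then decode)."""
    return data[:length].decode(encoding)  # raises UnicodeDecodeError if the cut splits a character


print(take_chars(RAW, 6))   # all six characters
print(take_bytes(RAW, 6))   # only the first two characters fit in 6 bytes
try:
    take_bytes(RAW, 4)      # 4 bytes = one character plus part of the next
except UnicodeDecodeError as err:
    print("byte-length cut failed:", err.reason)
```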