[Solved] Parsing Error for UTF-8 file containing Chinese chars
abhyyy
Posted: Sun Jan 15, 2012 2:01 am    Post subject: [Solved] Parsing Error for UTF-8 file containing Chinese chars

Voyager

Joined: 29 Sep 2011
Posts: 83

Hi Friends,

I have an input file that another application drops into a folder, with the pipe-delimited record format shown below. The file contains some Chinese characters, which is why the other application writes it in UTF-8 rather than ANSI.

Problem: When I try to parse the file using a message set, with a FileInput node configured to parse the delimited file, I receive a parsing error.

If I remove the Chinese characters and replace them with some English characters, even then it is not working. But the same message set works with an ANSI file containing only English characters. I have already tried forcing CCSID 1208 in the message set and the FileInput node, but it didn't work.


Sample file record:
0|6594993543|XMAS2011|XMASOFFER|123456789|OFFER_FOR_MALAYSIANS|X|特殊字符测试|20111225121212|P|

Please advise if I need to make any changes in the FileInput node and message set in order to correctly read and parse a UTF-8 file with Chinese characters.
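
For reference, here is a minimal Python sketch of what a correct parse of the sample record should yield. It is only an illustration outside the broker, not the message set or FileInput node configuration; the helper name `split_record` is hypothetical. It assumes the record is encoded as UTF-8 (CCSID 1208 is IBM's identifier for UTF-8) and that each record ends with a trailing pipe.

```python
# Sample record from the post, encoded the way the sending application writes it.
record = "0|6594993543|XMAS2011|XMASOFFER|123456789|OFFER_FOR_MALAYSIANS|X|特殊字符测试|20111225121212|P|"
raw = record.encode("utf-8")


def split_record(data, delimiter="|"):
    """Decode a UTF-8 record and split it into fields (hypothetical helper)."""
    text = data.decode("utf-8")
    # Drop the trailing delimiter so it does not produce an empty last field.
    if text.endswith(delimiter):
        text = text[:-len(delimiter)]
    return text.split(delimiter)


fields = split_record(raw)
print(len(fields))   # number of fields in the record
print(fields[0])     # first field, the character "0"
print(fields[7])     # the Chinese text field
```

If the bytes are decoded as UTF-8 before the delimiters are located, the Chinese field comes through as a single field like any other.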
_________________
----------------------
NeVeR StOp LeaRnInG.


Last edited by abhyyy on Sun Jan 15, 2012 9:42 am; edited 1 time in total
smdavies99
Posted: Sun Jan 15, 2012 3:48 am    Post subject: Re: Parsing Error for UTF-8 file containing chinese chars.

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

abhyyy wrote:

If I remove the Chinese characters and replace them with some English characters, even then it is not working.


Does that not tell you that it might not be the Chinese characters that are causing the failure?

What is the exact error you are seeing? Take a user trace and post the relevant output. You might be surprised by the information it gives you.
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995

Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.
abhyyy
Posted: Sun Jan 15, 2012 9:39 am

Voyager

Joined: 29 Sep 2011
Posts: 83

Found the problem!!

The problem was in my first input field. If you check the sample message that I posted earlier, it has 0 (an integer) as its first field.

Since a general principle of UTF-8 is that the first byte either is a single-byte character or indicates length of multi-byte code by the number of 1's before the first 0 and is then filled up with data bits.
So, I cannot keep my first field in TDS format as integer (it has to be a character) when I am reading tag delimited records from a UTF-8 encoded file using the FileInput node.

Thanks a lot, friends, for your precious time.
_________________
----------------------
NeVeR StOp LeaRnInG.
kimbert
Posted: Mon Jan 16, 2012 2:16 am

Jedi Council

Joined: 29 Jul 2003
Posts: 5542
Location: Southampton

Glad you got it working...but your explanation is not correct.
Quote:
Since a general principle of UTF-8 is that the first byte either is a single-byte character or indicates length of multi-byte code by the number of 1's before the first 0 and is then filled up with data bits.
That statement is correct
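
To make the quoted principle concrete, here is a small Python sketch (the function name `utf8_sequence_length` is made up for illustration). It reads the sequence length from the leading bits of the first byte. It is simplified and does not reject every invalid lead byte, such as those that would start an overlong encoding.

```python
def utf8_sequence_length(lead_byte):
    """Return how many bytes a UTF-8 sequence occupies, judged by its first byte."""
    if lead_byte < 0x80:            # 0xxxxxxx: single-byte (ASCII) character
        return 1
    if lead_byte >> 5 == 0b110:     # 110xxxxx: two-byte sequence
        return 2
    if lead_byte >> 4 == 0b1110:    # 1110xxxx: three-byte sequence
        return 3
    if lead_byte >> 3 == 0b11110:   # 11110xxx: four-byte sequence
        return 4
    # 10xxxxxx is a continuation byte and cannot start a character.
    raise ValueError("not a UTF-8 lead byte: 0x%02X" % lead_byte)


print(utf8_sequence_length("0".encode("utf-8")[0]))    # lead byte of the digit 0
print(utf8_sequence_length("特".encode("utf-8")[0]))   # lead byte of a Chinese character
```

The digit '0' is byte 0x30, so it falls in the single-byte case; the Chinese characters in the sample record each start with a three-byte lead byte.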
Quote:
So, I cannot keep my first field in TDS format as integer (it has to be a character) when I am reading tag delimited records from a UTF-8 encoded file using the FileInput node.
That statement is not correct. The '0' at the start of the line is a character. It does not matter whether that character is encoded as UTF-8 or ASCII. If you tell the TDS parser that the field is one character long, then it will consume as many bytes as it needs to (assuming that you have set the CCSID property correctly).
Perhaps you had incorrectly set the 'length units' field to 'bytes' for that first field? But when you changed the type to 'xs:string' you changed the 'length units' to 'characters' which made it work again?
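
A short Python sketch of the bytes-versus-characters distinction, using the values from the sample record. This only illustrates the encoding facts; it does not model the TDS parser itself, and the variable names are made up for the example.

```python
# The digit '0' is the same single byte in ASCII and in UTF-8.
ascii_zero = "0".encode("ascii")
utf8_zero = "0".encode("utf-8")
same_bytes = ascii_zero == utf8_zero

# The Chinese field from the sample record: length in characters vs. bytes.
chinese_field = "特殊字符测试"
encoded = chinese_field.encode("utf-8")
char_count = len(chinese_field)
byte_count = len(encoded)

# Taking "6 units" gives different results depending on the unit.
first_six_chars = chinese_field[:6]            # the whole field
first_six_bytes = encoded[:6].decode("utf-8")  # only the first two characters

print(same_bytes, char_count, byte_count)
print(first_six_chars, first_six_bytes)
```

So a length counted in characters and a length counted in bytes agree for the '0' field but diverge as soon as a field holds multi-byte characters.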