File Input node : how to read UTF-8 as well as ISO-8851
Laurens
Posted: Wed Apr 10, 2013 1:11 am

Apprentice

Joined: 01 Oct 2009
Posts: 35

Hi all,

I'm trying to let a message flow read files - through File Input Node - that may be either in UTF-8 encoding or ISO-8851 encoding.
The files are XML files and in the prolog one can find the encoding specified.

I thought the FileInput node was clever enough to use that prolog information to set the CodedCharSetId in the message properties. However, that does not seem to be the case.

Is there an easy way to handle this? I'm probably missing something.

Kind regards
Laurens
kimbert
Posted: Wed Apr 10, 2013 1:26 am

Jedi Council

Joined: 29 Jul 2003
Posts: 5542
Location: Southampton

The FileInput node does not read the XML prolog. The reasons are complex, and are explained on this thread: http://www.mqseries.net/phpBB/viewtopic.php?p=289723&sid=849b0c5db0fd7a2f34ea92901fff7bb0

There is a workaround detailed in the same thread.
Laurens
Posted: Wed Apr 10, 2013 4:07 am

Apprentice

Joined: 01 Oct 2009
Posts: 35

Thanks Kimbert !

Looks good.

In the meantime I had created something similar with BLOB -> FlowOrder -> (first) ResetContentDescriptor BLOB to XMLNSC -> take XMLEncoding
(second) -> set OutputRoot.Properties.CodedCharSetId -> ResetContentDescriptor BLOB to XMLNSC

Is my solution more expensive? Am I correct in assuming that in the (first) branch of the FlowOrder the broker does not parse the complete message?
smdavies99
Posted: Wed Apr 10, 2013 4:17 am

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

If both your message types have a correct prologue then you could always just extract that (or compare the first 'n' bytes of the BLOB with a known 'blobbified' string of a valid prologue).

If you have large messages this may well be more performant than converting the whole BLOB to XMLNSC and then deciding upon the CCSID.

Some experimentation will prove which way takes less time.
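
As a rough illustration of the "first 'n' bytes" idea, here is a minimal sketch in plain Python, run outside the broker. The function name, the 200-byte window and the UTF-8 default are assumptions made for the example, not anything WMB provides. It only works when the file is in an ASCII-compatible encoding (UTF-8, ISO-8859-x), because it matches the prologue directly on the raw bytes.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration on raw bytes.
# \x22 and \x27 are the double and single quote characters.
_ENCODING_RE = re.compile(
    rb"<\?xml\s[^>]*?encoding\s*=\s*[\x22\x27]([A-Za-z][A-Za-z0-9._-]*)[\x22\x27]"
)


def sniff_declared_encoding(blob: bytes, max_bytes: int = 200,
                            default: str = "UTF-8") -> str:
    """Return the encoding named in the XML prologue, or `default` if absent."""
    head = blob[:max_bytes]          # never look at more than the first n bytes
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]              # skip a UTF-8 byte order mark
    match = _ENCODING_RE.match(head)
    if match is None:
        return default               # no declaration: XML defaults to UTF-8
    return match.group(1).decode("ascii").upper()
```

Because only the head of the BLOB is inspected, the cost does not grow with the file size, which is the point being made about large messages.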
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995

Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.
rekarm01
Posted: Wed Apr 10, 2013 4:30 pm    Post subject: Re: File Input node : how to read UTF-8 as well as ISO-8851

Grand Master

Joined: 25 Jun 2008
Posts: 1415

Laurens wrote:
I'm trying to let a message flow read files - through File Input Node - that may be either in UTF-8 encoding or ISO-8851 encoding.

ISO 8851 specifies routine methods for determining the moisture content, non-fat solids content, and fat content of butter. WMB does not currently support that.

... or was that a typo?

Laurens wrote:
In the meantime I had created something similar with BLOB -> FlowOrder -> (first) ResetContentDescriptor BLOB to XMLNSC -> take XMLEncoding

This requires an initial guess for ccsid, close enough to the actual character encoding to be able to read the XML prolog correctly. If the initial guess were an ASCII-based ccsid, for example, then this would only work for ASCII-based files (UTF-8, ISO 8859, etc.), but not for other files (UTF-16, EBCDIC, etc.) If that's a problem, then a more general BLOB-based solution is necessary.
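
A more general BLOB-based approach could first look at the leading bytes to pick an encoding family, and only then read the prologue. The sketch below is plain Python, not broker code. The function names are hypothetical, the byte patterns follow the autodetection appendix of the XML 1.0 specification, Python's `cp037` codec stands in for EBCDIC, and UTF-32 is deliberately left out.

```python
import re
from typing import Optional

_DECL_RE = re.compile(
    r"<\?xml\s[^>]*?encoding\s*=\s*[\x22\x27]([A-Za-z][A-Za-z0-9._-]*)[\x22\x27]"
)


def guess_encoding_family(blob: bytes) -> str:
    """Return a Python codec name that is good enough to read the prologue."""
    head = blob[:4]
    if head.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"                    # UTF-8 with byte order mark
    if head.startswith((b"\xfe\xff", b"\xff\xfe")):
        return "utf-16"                       # codec consumes the BOM itself
    if head == b"\x00<\x00?":
        return "utf-16-be"                    # "<?" in UTF-16BE, no BOM
    if head == b"<\x00?\x00":
        return "utf-16-le"                    # "<?" in UTF-16LE, no BOM
    if head == b"\x4c\x6f\xa7\x94":
        return "cp037"                        # "<?xm" in EBCDIC
    return "ascii"                            # UTF-8, ISO-8859-x and friends


def read_declared_encoding(blob: bytes, max_bytes: int = 200) -> Optional[str]:
    """Decode the head with the guessed family and extract the declared name."""
    family = guess_encoding_family(blob)
    text = blob[:max_bytes].decode(family, errors="replace")
    match = _DECL_RE.match(text)
    return match.group(1).upper() if match else None
```

The family guess plays the role of the "initial guess for ccsid" described above: it does not have to be exact, only close enough that the characters of the XML declaration decode correctly.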

With on-demand parsing to read the prolog, the FlowOrder node is not that expensive. But the FlowOrder node only propagates its input message to its first and second output terminals. Any changes made to the output message on the first output terminal are not propagated through the second output terminal. So the message flow will have to save the ccsid derived in the first part some other way (such as in the Environment tree) in order to set it in the second part.
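
To make the FlowOrder point concrete, here is a toy model in plain Python. It is not broker code: the dictionaries standing in for the message and the Environment tree, the key name `DetectedCcsid` and all function names are made up for the illustration. The CCSID values shown (1208 for UTF-8, 819 for ISO-8859-1, 923 for ISO-8859-15) are the usual IBM ones.

```python
import copy
import re

# A few well-known IBM CCSIDs for XML encoding names (not an exhaustive list).
CCSID_BY_ENCODING = {"UTF-8": 1208, "ISO-8859-1": 819, "ISO-8859-15": 923}

_DECL = re.compile(
    rb"<\?xml\s[^>]*?encoding\s*=\s*[\x22\x27]([A-Za-z0-9._-]+)[\x22\x27]"
)


def first_branch(message: dict, environment: dict) -> None:
    """Read the prolog and record the CCSID; edits to `message` stay local."""
    match = _DECL.match(message["blob"])
    name = match.group(1).decode("ascii").upper() if match else "UTF-8"
    ccsid = CCSID_BY_ENCODING.get(name, message["ccsid"])
    message["ccsid"] = ccsid                # lost when this branch ends
    environment["DetectedCcsid"] = ccsid    # survives into the second branch


def second_branch(message: dict, environment: dict) -> dict:
    """Apply the saved CCSID to the untouched input message."""
    message["ccsid"] = environment.get("DetectedCcsid", message["ccsid"])
    return message


def flow_order(message: dict, environment: dict) -> dict:
    """Each branch gets its own copy of the input; only `environment` is shared."""
    first_branch(copy.deepcopy(message), environment)
    return second_branch(copy.deepcopy(message), environment)
```

In this model the second branch still sees the original `ccsid` on its copy of the message, and picks up the corrected value only because the first branch left it in the shared `environment`.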
smdavies99
Posted: Wed Apr 10, 2013 11:04 pm

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

You also have to remember that there are a good number of variants to the ISO-8859 Character Sets.

ISO-8859-1 is Western European but does not include the 'Euro' Symbol.
etc
etc
etc.
If you can guarantee that the XML prologue is correct (and often it is not, because of lazy programmers who don't know the difference between 8859-1 and 8859-15) then by all means go ahead and determine the CCSID from the prologue.
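
A small plain-Python illustration of why a wrong prologue matters: the single byte 0xA4 is the generic currency sign in ISO-8859-1 but the Euro sign in ISO-8859-15, and it is not valid UTF-8 on its own. The helper name and the sample bytes are invented for the example.

```python
def decode_with(raw: bytes, declared_encoding: str) -> str:
    """Decode bytes exactly as the prologue claims, right or wrong."""
    return raw.decode(declared_encoding)


raw = b"Price: 10 \xa4"

as_latin1 = decode_with(raw, "iso-8859-1")    # ends with U+00A4 (currency sign)
as_latin9 = decode_with(raw, "iso-8859-15")   # ends with U+20AC (Euro sign)

try:
    decode_with(raw, "utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False                           # 0xA4 alone is not valid UTF-8
```

Both ISO-8859 decodes succeed without any error, so a mislabelled file passes silently with the wrong character, whereas mislabelling it as UTF-8 at least fails loudly.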

My 2p worth is that it would be better to try to get everything UTF-8 but I am aware that this might not be possible.