Parsing Error using ISO-8859-1 in Linux
andrewhirst
Posted: Thu Oct 06, 2011 7:04 am    Post subject: Parsing Error using ISO-8859-1 in Linux

Apprentice

Joined: 06 Jul 2004
Posts: 33
Location: UK

Hi,

This might be a silly question. If it is, I apologise in advance.

Basically I have a message containing a '¿' as data. On a Windows-based development machine the message is successfully parsed. However, on a Linux machine, a parser error occurs.

The XML file contains: <?xml version="1.0" encoding="ISO-8859-1"?>

Any ideas? I suspect that it relates to the CCSID or somesuch....

Development box is running WMQ6.0.0.1 & WMB 6.1.0.4
Linux box is running WMQ7.0.0.1 & WMB 6.1.0.4

TIA

Andrew
_________________
Andrew Hirst
mqjeff
Posted: Thu Oct 06, 2011 7:14 am    Post subject: Re: Parsing Error using ISO-8859-1 in Linux

Grand Master

Joined: 25 Jun 2008
Posts: 17447

andrewhirst wrote:
Any ideas?

Yes.
andrewhirst wrote:
I suspect that it relates to the CCSID or somesuch....

Vitor
Posted: Thu Oct 06, 2011 7:18 am    Post subject: Re: Parsing Error using ISO-8859-1 in Linux

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

andrewhirst wrote:
I have a message containing a '¿' as data.


What is its actual value? Its actual hex? Is it in the list of valid XML characters as defined by W3C? Is it in a normal tag, CDATA or what?

andrewhirst wrote:
The XML file contains: <?xml version="1.0" encoding="ISO-8859-1"?>


So what does that encoding tell the parser about the message content?

andrewhirst wrote:
I suspect that it relates to the CCSID or somesuch....


I suspect you're right. What character sets are being used on the Windows & Linux boxes, and how does that relate to what the XML claims it's encoded as?

And from a design standpoint, why have you got non-printable characters in an XML document anyway? They cause exactly this kind of portability problem.
_________________
Honesty is the best policy.
Insanity is the best defence.
andrewhirst
Posted: Fri Oct 07, 2011 2:37 am

Apprentice

Joined: 06 Jul 2004
Posts: 33
Location: UK

Actual value in the incoming XML message is 0xBF or 191 decimal and it is a valid XML character, but not with UTF-8 encoding.

The character is part of some descriptive text entered in a field in a table somewhere out of our control. It is probable that the character is added at the data entry point. It is in a normal tag by the time it gets to the broker.

I executed locale on the Linux box and got this back:
Code:

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=



Quote:
So what does that encoding tell the parser about the message content?
That it should expect a certain character set. UTF 8 is basically ASCII. ISO-8859-1 covers western European languages, including the ¿

I've checked the CCSID on the XP box and the Linux box: 850 and 1280 respectively.
Quote:
and how does that relate to what the XML claims it's encoded as?
Well, if the Linux box is UTF-8 then I guess that it won't understand. I presume that I need to change the locale info somehow? E.g.: export LANG=en_US.ISO-8859-1
Could I change the value on the QMgr in Linux? From 1280 to 819?

Quote:
And from a design standpoint, why have you got non-printable characters in an XML document anyway? They cause exactly this kind of portability problem.

The content of the tags is out of my control, besides, the ¿ is a valid character.

Thanks for the pointers - I have had to work to find the answers and that is good
_________________
Andrew Hirst
mqjeff
Posted: Fri Oct 07, 2011 4:33 am

Grand Master

Joined: 25 Jun 2008
Posts: 17447

The machine locale doesn't apply here, per se.

If the XML data says that it is in ISO-8859-1, then the container for the XML data needs to match that. If the container is an MQ message (you've not stated!), that means the CCSID of the message.

UTF-8 is *NOT* basically ASCII.

UTF-8 is Unicode in a specific set of byte encodings. It covers a significantly larger set of characters than ASCII, and almost certainly covers your ¿.

You should confirm that you are sending your ISO-8859-1 data in a container that also indicates that the data is ISO-8859-1.

You should also strongly consider converting your XML data to actually be in UTF-8. At the sender.
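
A minimal sketch of what converting at the sender could look like, written in Python purely for illustration (the thread does not say what the sending application is written in). The function name `to_utf8_xml` and the sample document are hypothetical. It decodes the ISO-8859-1 bytes, rewrites the encoding in the XML declaration, and re-encodes as UTF-8. The message would then need to be sent with CCSID 1208 so that the container matches the content.

```python
import re


def to_utf8_xml(raw: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Transcode an XML document to UTF-8 and update its declaration.

    Hypothetical helper: assumes the caller knows the real source encoding.
    """
    text = raw.decode(source_encoding)
    # Rewrite only the encoding value inside the leading XML declaration.
    text = re.sub(
        r'^(<\?xml[^>]*?encoding=)(["\'])[^"\']*\2',
        r"\g<1>\g<2>UTF-8\g<2>",
        text,
        count=1,
    )
    return text.encode("utf-8")


# Sample ISO-8859-1 document: 0xBF is the inverted question mark, 0xE9 is e-acute.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><desc>\xbfQu\xe9?</desc>'
utf8_doc = to_utf8_xml(latin1_doc)

# In UTF-8 the inverted question mark (U+00BF) becomes the two bytes C2 BF.
assert "\u00bf".encode("utf-8") == b"\xc2\xbf"
```

The declaration and the bytes are changed together on purpose: a document that claims one encoding while carrying another is the mismatch discussed in this thread.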
Vitor
Posted: Fri Oct 07, 2011 4:45 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

andrewhirst wrote:
UTF 8 is basically ASCII.


It's not basically anything. It's exactly something, and not ASCII. With this kind of problem the devil is in the details.

andrewhirst wrote:
Could I change the value on the QMgr in Linux? From 1280 to 819?


This would make you popular with anyone else using the queue manager relying on that setting, and wouldn't help.

A more important question is what is the CCSID on the inbound message, which is what is used to decode the message rather than the queue manager's CCSID? Is it 819, 1208, 1200 or something else?
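
To see why the encoding used for decoding matters more than anything set on the machine, here is a small sketch using Python's standard XML parser as a stand-in. This is an assumption for illustration only: Python's parser takes the encoding from the XML declaration, whereas the broker takes it from the message CCSID as described above. The effect of a mismatch is the same either way: identical bytes parse under one encoding and fail under another.

```python
import xml.etree.ElementTree as ET

# The same body bytes in both documents: a single 0xBF inside a normal tag.
body = b"<desc>\xbf</desc>"

latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body
utf8_doc = b'<?xml version="1.0" encoding="UTF-8"?>' + body

# Decoded as ISO-8859-1, 0xBF is the inverted question mark (U+00BF).
parsed_text = ET.fromstring(latin1_doc).text

# Decoded as UTF-8, a lone 0xBF is not a valid byte sequence, so parsing fails.
try:
    ET.fromstring(utf8_doc)
    utf8_parsed = True
except ET.ParseError:
    utf8_parsed = False
```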

andrewhirst wrote:
The content of the tags is out of my control, besides, the ¿ is a valid character.


Then bypass the whole message and have the sending application use Unicode.

andrewhirst wrote:
Thanks for the pointers - I have had to work to find the answers and that is good


You're welcome. This forum applies the principle of teaching people to fish rather than giving people fish.
_________________
Honesty is the best policy.
Insanity is the best defence.
rekarm01
Posted: Sat Oct 08, 2011 4:41 pm    Post subject: Re: Parsing Error using ISO-8859-1 in Linux

Grand Master

Joined: 25 Jun 2008
Posts: 1415

This looks like a broker issue. Perhaps a moderator could move this thread to the broker forum.

andrewhirst wrote:
Basically I have a message containing a '¿' as data. On a Windows-based development machine the message is successfully parsed. However, on a Linux machine, a parser error occurs.

A usertrace would provide a more detailed error message.

andrewhirst wrote:
Actual value in the incoming XML message is 0xBF or 191 decimal and it is a valid XML character ...

No, 0xBF is a meaningless byte value. Bytes are not characters. Bytes have a physical representation, but no inherent meaning. Characters have meaning, but no inherent physical representation. A character encoding defines a mapping between the two.

The sender uses a character encoding (ccsid) to map a sequence of characters to a sequence of bytes, when writing a message. The receiver needs to use the same character encoding (ccsid) to map the sequence of bytes back to a sequence of characters, when reading the message; otherwise, the receiver ends up with a garbled message.

Normally, the sender provides the input ccsid as part of the input message, so that the receiver can interpret it correctly.
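
As a rough illustration of the point about bytes and characters, the following Python sketch decodes the single byte 0xBF under three encodings. The codec names are Python's; they correspond to CCSID 819 (ISO-8859-1), CCSID 850 and CCSID 1208 (UTF-8). The variable names are made up for the example.

```python
raw = b"\xbf"

# Same byte, different characters, depending on the encoding the receiver assumes.
as_latin1 = raw.decode("iso-8859-1")  # CCSID 819
as_cp850 = raw.decode("cp850")        # CCSID 850

# On its own, 0xBF is only a continuation byte in UTF-8, so it cannot be decoded.
try:
    raw.decode("utf-8")               # CCSID 1208
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Going the other way, the character U+00BF maps to different byte sequences.
latin1_bytes = "\u00bf".encode("iso-8859-1")  # one byte
utf8_bytes = "\u00bf".encode("utf-8")         # two bytes
```

A receiver that assumes UTF-8 for ISO-8859-1 bytes hits the failing branch, which is consistent with the parser error reported on the Linux box.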

andrewhirst wrote:
I've checked the CCSID on the XP box and the Linux box: 850 and 1280 respectively ... I presume that I need to change the locale info somehow?

1280 is "Apple Greek" (a variant of ISO 8859-7). 1208 is "UTF-8". Typo?

After reading the input message, the message flow may choose to convert it, either to match the qmgr ccsid, locale, or some other character encoding; otherwise the qmgr ccsid and locale are irrelevant.
bruce2359
Posted: Sun Oct 09, 2011 5:26 am

Poobah

Joined: 05 Jan 2008
Posts: 9469
Location: US: west coast, almost. Otherwise, enroute.

Moved to Message Broker forum.
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
smdavies99
Posted: Sun Oct 09, 2011 7:51 am

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

Quote:

Could I change the value on the QMgr in Linux? From 1280 to 819?


Shouldn't that be 1208?

However, changing the QMGR CCSID won't fix your problem. All it will do is change the default CCSID for any newly created messages or messages converted upon receipt over a channel.

As has been said, you need to get the sender to actually send a proper message.
One with the proper CCSID and everything else that has been mentioned in this thread.
This is Kimbert's 1st law.
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995

Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.