andrewhirst |
Posted: Thu Oct 06, 2011 7:04 am Post subject: Parsing Error using ISO-8859-1 in Linux |
|
|
 Apprentice
Joined: 06 Jul 2004 Posts: 33 Location: UK
|
Hi,
This might be a silly question. If it is, I apologise in advance.
Basically I have a message containing a '¿' as data. On a Windows-based development machine the message is successfully parsed. However, on a Linux machine, a parser error occurs.
The XML file contains: <?xml version="1.0" encoding="ISO-8859-1"?>
Any ideas? I suspect that it relates to the CCSID or somesuch....
Development box is running WMQ6.0.0.1 & WMB 6.1.0.4
Linux box is running WMQ7.0.0.1 & WMB 6.1.0.4
TIA
Andrew _________________ Andrew Hirst |
|
|
 |
mqjeff |
Posted: Thu Oct 06, 2011 7:14 am Post subject: Re: Parsing Error using ISO-8859-1 in Linux |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
andrewhirst wrote: |
Any ideas? |
Yes.
andrewhirst wrote: |
I suspect that it relates to the CCSID or somesuch.... |
 |
|
|
 |
Vitor |
Posted: Thu Oct 06, 2011 7:18 am Post subject: Re: Parsing Error using ISO-8859-1 in Linux |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
andrewhirst wrote: |
I have a message containing a '¿' as data. |
What is that actual value? Its actual hex? Is it in the list of valid XML characters as defined by W3C? Is it in a normal tag, CDATA or what?
andrewhirst wrote: |
The XML file contains: <?xml version="1.0" encoding="ISO-8859-1"?> |
So what does that encoding tell the parser about the message content?
andrewhirst wrote: |
I suspect that it relates to the CCSID or somesuch.... |
I suspect you're right. What character sets are being used on the Windows & Linux boxes, and how does that relate to what the XML claims it's encoded as?
And from a design standpoint, why have you got non-printable characters in an XML document anyway? They cause exactly this kind of portability problem. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
|
 |
andrewhirst |
Posted: Fri Oct 07, 2011 2:37 am Post subject: |
|
|
 Apprentice
Joined: 06 Jul 2004 Posts: 33 Location: UK
|
Actual value in the incoming XML message is 0xBF or 191 decimal and it is a valid XML character, but not with UTF-8 encoding.
The character is part of some descriptive text entered in a field in a table somewhere out of our control. It is probable that the character is added at the data entry point. It is in a normal tag by the time it gets to the broker.
I executed locale on the Linux box and got this back:
Code: |
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
|
Quote: |
So what does that encoding tell the parser about the message content? |
That it should expect a certain character set. UTF-8 is basically ASCII. ISO-8859-1 covers western European languages, including the ¿.
I've checked the CCSID on the XP box and the Linux box: 850 and 1280 respectively.
Quote: |
and how does that relate to what the XML claims it's encoded as? |
Well, if the Linux box is UTF-8 then I guess that it won't understand. I presume that I need to change the locale info somehow? E.g.: export LANG=en_US.ISO-8859-1
Could I change the value on the QMgr in Linux? From 1280 to 819?
Quote: |
And from a design standpoint, why have you got non-printable characters in an XML document anyway? They cause exactly this kind of portability problem. |
The content of the tags is out of my control, besides, the ¿ is a valid character.
Thanks for the pointers - I have had to work to find the answers and that is good  _________________ Andrew Hirst |
|
|
 |
mqjeff |
Posted: Fri Oct 07, 2011 4:33 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
The machine locale doesn't apply here, per se.
If the XML data says that it is in ISO-8859-1, then the container for the XML data (the message - if it's an MQ message (you've not stated!), then the CCSID) needs to match that.
UTF-8 is *NOT* basically ASCII.
UTF-8 is Unicode in a specific set of byte encodings, and covers a significantly larger set of characters than ASCII, and almost certainly covers your ¿.
You should confirm that you are sending your ISO-8859-1 data in a container that also indicates that the data is ISO-8859-1.
You should also strongly consider converting your XML data to actually be in UTF-8. At the sender. |
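As a rough illustration of what converting at the sender could look like, here is a minimal Python sketch. It assumes the sender really does hold ISO-8859-1 bytes and that the XML declaration carries the encoding; the helper name `latin1_xml_to_utf8` and the sample document are made up for the example and are not from any product API.

```python
import re

# Matches the encoding pseudo-attribute inside the XML declaration.
ENCODING_ATTR = re.compile(r"""(<\?xml[^>]*?encoding=)(["'])[^"']*\2""")

def latin1_xml_to_utf8(data: bytes) -> bytes:
    """Re-encode an ISO-8859-1 XML document as UTF-8 (hypothetical helper)."""
    # Every byte value is valid in ISO-8859-1, so this decode cannot fail.
    text = data.decode("iso-8859-1")
    # The declaration has to change too, or the document would still claim
    # ISO-8859-1 while actually carrying UTF-8 bytes.
    text = ENCODING_ATTR.sub(r"\1\2UTF-8\2", text, count=1)
    return text.encode("utf-8")

# Sample payload invented for illustration; 0xBF is the byte from this thread.
sample = b'<?xml version="1.0" encoding="ISO-8859-1"?><desc>\xbfQu\xe9?</desc>'
converted = latin1_xml_to_utf8(sample)
print(converted)
```

If the converted payload travels as an MQ message, the message CCSID would then need to say 1208 rather than 819, so that the container and the declaration still agree.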
|
|
 |
Vitor |
Posted: Fri Oct 07, 2011 4:45 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
andrewhirst wrote: |
UTF-8 is basically ASCII. |
It's not basically anything. It's exactly something, and not ASCII. With this kind of problem the devil is in the details.
andrewhirst wrote: |
Could I change the value on the QMgr in Linux? From 1280 to 819? |
This would make you popular with anyone else using the queue manager relying on that setting, and wouldn't help.
A more important question is what is the CCSID on the inbound message, which is what is used to decode the message rather than the queue manager's CCSID? Is it 819, 1208, 1200 or something else?
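One way to picture that check is a small Python sketch that compares the encoding declared inside the XML with the encoding implied by the message CCSID. The CCSID-to-codec table, the function names and the sample message are assumptions for illustration only; this is not how the broker does it internally, and the byte-level regex only works for ASCII-compatible encodings.

```python
import codecs
import re

# Assumed mapping from CCSIDs mentioned in this thread to Python codec names.
CCSID_TO_CODEC = {
    819: "iso-8859-1",
    850: "cp850",
    1208: "utf-8",
}

# Finds the encoding named in the XML declaration (ASCII-compatible payloads only).
DECL_ENCODING = re.compile(rb"""<\?xml[^>]*?encoding=["']([A-Za-z0-9._-]+)["']""")

def declared_codec(payload: bytes):
    """Return the lower-cased encoding name from the XML declaration, or None."""
    m = DECL_ENCODING.match(payload)
    return m.group(1).decode("ascii").lower() if m else None

def container_matches(payload: bytes, ccsid: int) -> bool:
    """True when the declared XML encoding and the container CCSID agree."""
    declared = declared_codec(payload)
    container = CCSID_TO_CODEC.get(ccsid)
    if declared is None or container is None:
        return False
    try:
        # codecs.lookup normalises aliases such as "ISO-8859-1" and "latin-1".
        return codecs.lookup(declared).name == codecs.lookup(container).name
    except LookupError:
        return False

# Hypothetical inbound message body containing the 0xBF byte.
msg = b'<?xml version="1.0" encoding="ISO-8859-1"?><d>\xbf</d>'
print(container_matches(msg, 819), container_matches(msg, 1208))
```

With the sample body above, a message CCSID of 819 agrees with the declaration and 1208 does not, which is the kind of mismatch worth ruling out first.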
andrewhirst wrote: |
The content of the tags is out of my control, besides, the ¿ is a valid character. |
Then bypass the whole message and have the sending application use Unicode.
andrewhirst wrote: |
Thanks for the pointers - I have had to work to find the answers and that is good |
You're welcome. This forum applies the principle of teaching people to fish rather than giving people fish. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
|
 |
rekarm01 |
Posted: Sat Oct 08, 2011 4:41 pm Post subject: Re: Parsing Error using ISO-8859-1 in Linux |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
This looks like a broker issue. Perhaps a moderator could move this thread to the broker forum.
andrewhirst wrote: |
Basically I have a message containing a '¿' as data. On a Windows-based development machine the message is successfully parsed. However, on a Linux machine, a parser error occurs. |
A usertrace would provide a more detailed error message.
andrewhirst wrote: |
Actual value in the incoming XML message is 0xBF or 191 decimal and it is a valid XML character ... |
No, 0xBF is a meaningless byte value. Bytes are not characters. Bytes have a physical representation, but no inherent meaning. Characters have meaning, but no inherent physical representation. A character encoding defines a mapping between the two.
The sender uses a character encoding (ccsid) to map a sequence of characters to a sequence of bytes, when writing a message. The receiver needs to use the same character encoding (ccsid) to map the sequence of bytes back to a sequence of characters, when reading the message; otherwise, the receiver ends up with a garbled message.
Normally, the sender provides the input ccsid as part of the input message, so that the receiver can interpret it correctly.
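The byte-versus-character distinction can be seen in a few lines of Python. This is only a sketch using the 0xBF byte reported earlier in the thread; the variable names are arbitrary.

```python
# The single byte from the thread; it has no meaning until a decoder is chosen.
raw = b"\xbf"

# ISO-8859-1 maps each byte value to the code point with the same number,
# so 0xBF becomes U+00BF, the inverted question mark.
as_latin1 = raw.decode("iso-8859-1")

# In UTF-8, 0xBF is a continuation byte and is not a valid sequence on its own.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# The same character written by a UTF-8 sender occupies two bytes.
as_utf8_bytes = "\u00bf".encode("utf-8")

print(repr(as_latin1), utf8_ok, as_utf8_bytes.hex())
```

A receiver that reads the ISO-8859-1 byte as UTF-8 therefore fails outright, which is consistent with a parser error rather than a silently garbled character.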
andrewhirst wrote: |
I've checked the CCSID on the XP box and the Linux box: 850 and 1280 respectively ... I presume that I need to change the locale info somehow? |
1280 is "Apple Greek" (a variant of ISO 8859-7). 1208 is "UTF-8". Typo?
After reading the input message, the message flow may choose to convert it, either to match the qmgr ccsid, locale, or some other character encoding; otherwise the qmgr ccsid and locale are irrelevant. |
|
|
 |
bruce2359 |
Posted: Sun Oct 09, 2011 5:26 am Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
Moved to Message Broker forum. _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
|
 |
smdavies99 |
Posted: Sun Oct 09, 2011 7:51 am Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
Quote: |
Could I change the value on the QMgr in Linux? From 1280 to 819?
|
Shouldn't that be 1208?
However, changing the QMGR CCSID won't fix your problem. All it will do is change the default CCSID for any newly created messages or messages converted upon receipt over a channel.
As has been said, you need to get the sender to actually send a proper message.
One with the proper CCSID and everything else that has been mentioned in this thread.
This is Kimbert's 1st law. _________________ WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions. |
|
|
 |
|