Author |
Message
|
steve_metal |
Posted: Thu Jun 30, 2011 12:06 am Post subject: Parsing € in an XML |
|
|
 Novice
Joined: 22 Sep 2009 Posts: 17
|
Hi,
I need help parsing the euro symbol (hex 20AC) in an XML message.
The problem occurs in the publication node.
I assume someone has faced this before and that the fix is fairly straightforward, so I am not providing lengthy details.
If requested, I can share a lot more.
Thanks and regards,
Steve |
|
|
 |
mqjeff |
Posted: Thu Jun 30, 2011 1:30 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
The publication node shouldn't be parsing anything, it should at best be serializing things.
If you are not seeing output character data that matches what you expect from the logical message tree, it is because you did not specify the right CCSID. |
|
|
 |
Vanshul_MB |
Posted: Thu Jun 30, 2011 2:11 am Post subject: |
|
|
Acolyte
Joined: 09 Feb 2011 Posts: 68
|
Try CCSID 1208 and Encoding 546 |
|
|
 |
smdavies99 |
Posted: Thu Jun 30, 2011 2:40 am Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
Quote: |
Try CCSID 1208 and Encoding 546
|
That won't help if the data was incorrectly sent in the first place.
For example using ISO-8859-1(ccsid=817) when you should be using ISO-8859-15 (ccsid=923)
8859-19 has formal support for the Euro wheras 8859-1 does not.
Start rant
Problems of this type could easily be avoided by insisting that a common, generic CCSID is used throughout. UTF-8 is a good starting point. I've almost lost count of the PHBs who have said things along the lines of 'Details, don't worry me with the details. Just fix it yourself.'
end rant
As an aside, it might be nice if IBM changed the default CCSID for WMQ to 1208 on all Mid Range platforms. You might be surprised how many sites just use their OOTB default. _________________ WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions. |
|
|
 |
rekarm01 |
Posted: Thu Jun 30, 2011 10:44 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
Please watch out for typos, particularly for numbers.
smdavies99 wrote: |
For example using ISO-8859-1(ccsid=817) when you should be using ISO-8859-15 (ccsid=923)
8859-19 has formal support for the Euro wheras 8859-1 does not. |
should be:
Quote: |
For example using ISO-8859-1 (ccsid=819) when you should be using ISO-8859-15 (ccsid=923)
8859-15 has formal support for the Euro whereas 8859-1 does not. |
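For anyone who wants to check the corrected numbers, one option is to try encoding the Euro sign under each code page. This is only a minimal Python sketch, not broker code; the mapping from CCSIDs to Python codec names and the helper `encode_euro` are assumptions made for illustration.

```python
# Assumed mapping from the CCSIDs named in this thread to Python codec names.
CCSID_TO_CODEC = {
    819: "iso-8859-1",
    923: "iso-8859-15",
    1208: "utf-8",
}


def encode_euro(ccsid):
    """Return the bytes for the Euro sign under the given CCSID, or None if it has none."""
    try:
        return "\u20ac".encode(CCSID_TO_CODEC[ccsid])
    except UnicodeEncodeError:
        return None


for ccsid in sorted(CCSID_TO_CODEC):
    print(ccsid, encode_euro(ccsid))
```

CCSID 819 yields nothing because ISO-8859-1 has no Euro sign, while 923 gives a single byte and 1208 gives three.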
|
|
|
 |
smdavies99 |
Posted: Thu Jun 30, 2011 11:26 am Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
Sigh. I always seem to get 817 & 819 mixed up.
It was an example though. (Well, that's my excuse.)
All I can hope is that the OP gets what I was on about. |
|
|
 |
steve_metal |
Posted: Mon Jul 04, 2011 10:47 pm Post subject: |
|
|
 Novice
Joined: 22 Sep 2009 Posts: 17
|
Thanks guys, the issue is almost sorted out.
The Euro symbol is a valid symbol in UTF-8 (CCSID 1208) as well, except that the representation won't be € and will be something like (â <82> ¬) instead (still a valid representation of the Euro in UTF-8).
The problem we had was because of the RFH header being the immediate parser of the output body. The RFH header somewhere acquired CCSID 819, and it tries to read the characters using CCSID 819 when they are encoded using CCSID 1208 (I believe this happens only when the message is parsed as such; otherwise it would have left it alone).
I'm not sure this makes sense, but this is what is happening and it seems to be working. |
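The effect described above can be reproduced outside the broker. The following is a minimal Python sketch, assuming that CCSID 1208 corresponds to the `utf-8` codec and CCSID 819 to `iso-8859-1`; the helper `read_as` is hypothetical and only illustrates writing bytes with one code page and reading them with another.

```python
def read_as(text, written_codec, read_codec):
    """Encode text with one codec and decode the resulting bytes with another."""
    return text.encode(written_codec).decode(read_codec)


euro = "\u20ac"
garbled = read_as(euro, "utf-8", "iso-8859-1")  # written as 1208, read as 819
intact = read_as(euro, "utf-8", "utf-8")        # written and read as 1208

# Show the code points that a CCSID 819 reader sees.
print([hex(ord(ch)) for ch in garbled])
print(intact == euro)
```

The three code points seen by the 819 reader are 0xE2 (â), 0x82 (a control character) and 0xAC (¬), which matches the (â <82> ¬) sequence reported in the post.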
|
|
 |
kimbert |
Posted: Tue Jul 05, 2011 12:00 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
Euro symbol is a valid symbol in UTF-8 (1208 CCSID) as well . |
I knew that already, and you should have known too. Read this to find out why: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
If you are working as an integration developer then you really must understand the basics about Unicode and character encodings.
Quote: |
Except that the representation wont be € and will be something like (â <82> ¬ ) instead |
That statement is only true for readers who are viewing the data in the same way as you. A viewer that knows about UTF-8 (and knows that it is displaying UTF-8) will display it correctly. |
|
|
 |
steve_metal |
Posted: Tue Jul 05, 2011 12:15 am Post subject: |
|
|
 Novice
Joined: 22 Sep 2009 Posts: 17
|
Hi Kimbert,
That was a nice read. The fact was known to me; I just wanted to put it out there. The confusing part is (more technically) why the publication node throws the error and not any other node that parses the message. And I'm not even sure whether the publication node actually parses the XML.
The next question follows from your note. What happens with applications (legacy applications) that do not actually consider the encoding (whether ISO-8859-1 or UTF-8, which may have different representations) and just interpret the bytes as characters, so the data is stored differently?
This is a current issue we are having with one of the systems (with the same Euro symbol), where we send the data in UTF-8 but the application does not bother to interpret it in the encoding we sent it in. Moreover, if we actually start using ISO-8859-15, which has an actual Euro symbol representation, what other character sets are not supported in this format?
In general, which is the best encoding/CCSID/code page to use when dealing with characters from across the globe (say I'm just publishing data, not to any specific consumer)?
A long-standing question. I appreciate the responses.
Thanks guys. |
|
|
 |
kimbert |
Posted: Tue Jul 05, 2011 1:38 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
I'll answer the most important question first.
Quote: |
In general which is the best encoding/CCSID/code page to use when dealing with characters across the globe. |
One of the Unicode encodings. I do understand that there are environments/organisations out there where it is hard to *change* from some other backbone encoding. But to deliberately choose a non-Unicode encoding when Unicode is a viable option is frankly insane.
Any of the Unicode encodings will do the job: UTF-8, UTF-16 or, if you really want to, UTF-32. But your backbone should always use a Unicode encoding if possible.
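To illustrate that all three encodings carry the same character and differ only in byte layout, here is a minimal Python sketch. The helper `encoded_forms` is hypothetical, and the big-endian codec variants are chosen here only so that no byte order mark appears in the output.

```python
def encoded_forms(ch):
    """Return the byte sequence for one character in each Unicode encoding form."""
    return {name: ch.encode(name) for name in ("utf-8", "utf-16-be", "utf-32-be")}


forms = encoded_forms("\u20ac")
for name, data in forms.items():
    # Every form decodes back to the same character.
    print(name, data.hex(), data.decode(name) == "\u20ac")
```

The Euro sign takes three bytes in UTF-8, two in UTF-16 and four in UTF-32, and each form round-trips without loss.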
Quote: |
The next question on the same perspective of your note. What happens to applications (legacy applications) that do not actually consider the encoding (no matter ISO-8859-1 or UTF-8 - both may have different representations) , it just interprets it as character and the data is stored differently .
This is a current issue we are having with one of the systems (with the same Euro symbol) , where we send in the data in UTF-8 , but the application does not bother to interpret it in the encoding that we sent it to it in . |
The downstream application is badly-behaved. If it happens to work for you today, consider yourselves lucky.
Quote: |
Moreover if we actually start using ISO-8859-15 which has the actual euro symbol representation , what other character sets are not supported in this format . |
Setting the receiver's encoding to ISO-8859-15 will not necessarily make the UTF-8 Euro symbol get interpreted correctly. You *might* be lucky, and find that the bytes for the Euro symbol are the same for ISO-8859-15 and UTF-8. But that won't be true for other characters that are shared between the two encodings. And there will be UTF-8 characters that are not supported at all in ISO-8859-15. See my first answer.
In case it's not clear enough already, I would not advise you to choose your backbone encoding based on whether a test message containing the Euro symbol happens to work. |
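The points about ISO-8859-15 can be checked with a few lines of Python. This is a sketch under stated assumptions: the helper `unsupported_chars` and the sample string are hypothetical, and Python's `iso-8859-15` codec stands in for CCSID 923.

```python
def unsupported_chars(text, codec):
    """Return the characters in text that the given codec cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


euro = "\u20ac"
# Compare the Euro sign's bytes in UTF-8 and ISO-8859-15.
same_bytes = euro.encode("utf-8") == euro.encode("iso-8859-15")

# Euro, e-acute, Greek capital Omega, and one CJK character.
sample = "\u20ac \u00e9 \u03a9 \u65e5"
print(same_bytes)
print(unsupported_chars(sample, "iso-8859-15"))
```

For the Euro sign in particular the two byte sequences differ (one byte in ISO-8859-15, three in UTF-8), and the Greek and CJK characters in the sample have no ISO-8859-15 representation at all.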
|
|
 |
steve_metal |
Posted: Tue Jul 05, 2011 9:48 pm Post subject: |
|
|
 Novice
Joined: 22 Sep 2009 Posts: 17
|
Thanks again... Got the point now.
Related to the same issue: we managed to pass the Euro through the ESB layer. The problem we have now is as follows.
We have two brokers and a gateway in a cluster (everything enters or leaves the brokers through the gateway qmgr). The cluster receiver channels on all the qmgrs have the convert option set to 'yes', and the qmgr CCSID is 819.
So if we send the MQMD.Format as 'MQSTR', a conversion happens (the data is converted to be represented in 819) and the target legacy application is not able to decipher the representation of the Euro symbol. If we set the format to MQFMT_NONE, the conversion does not happen, but the legacy application expects MQSTR in the format field (a custom z/OS adapter picks the message off a queue).
Is there any way we could bypass the conversion even with MQSTR set? |
|
|
 |
fjb_saper |
Posted: Wed Jul 06, 2011 12:26 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
steve_metal wrote: |
The cluster receiver channels on all the qmgrs have the convert option set to a 'yes' and the qmgr ccsid is 819
Any way we could bypass the conversion even with MQSTR set ? |
Absolutely: DO NOT SET CONVERSION=YES ON CLUSTER CHANNELS
Especially with a WMB qmgr at the other end. It defeats the whole purpose, and it makes it impossible to ensure a decent message (including parsing and rendering) at the broker.
Example:
- The source sends a correctly formatted UTF-8 message to the broker, containing the Euro char (not present in CCSID 819).
- Channel conversion substitutes a char for the Euro sign. However, the substitution char is not part of the accepted XML charset at CCSID 819...
- Result: The broker cannot parse the message!
- The broker sends out a perfectly formatted UTF-8 message with the Euro sign.
- The channel translates to CCSID 819 and uses a substitution char for the Euro sign.
- Result: The recipient cannot parse the message.
You really want conversion turned off on all cluster channels. If you need conversion turned on, use a remote queue def with its own channel def.
For the same reason you want to make sure that conversion is not turned on, on any of your MQInput nodes.
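The failure sequence in the example above can be imitated in Python. This is a rough sketch, not what the channel actually does: it assumes the substitution character is the ASCII SUB control (0x1A), which is an assumption since the real character depends on the conversion tables in use, and the functions `convert_to_819` and `parses` are hypothetical.

```python
import xml.etree.ElementTree as ET

SUB = "\x1a"  # assumed substitution character, not confirmed by the thread


def convert_to_819(utf8_bytes):
    """Imitate a UTF-8 to ISO-8859-1 conversion that substitutes unmappable characters."""
    text = utf8_bytes.decode("utf-8")
    converted = "".join(ch if ord(ch) < 256 else SUB for ch in text)
    return converted.encode("iso-8859-1")


def parses(xml_bytes):
    """Return True if the bytes are well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


original = "<price>\u20ac 10</price>".encode("utf-8")
converted = convert_to_819(original)
print(parses(original), parses(converted))
```

Under that assumption the original message parses and the converted one does not, because a control character such as 0x1A is not a legal character in XML 1.0.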
Have fun  _________________ MQ & Broker admin |
|
|
 |
smdavies99 |
Posted: Wed Jul 06, 2011 12:28 am Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
steve_metal wrote: |
We have two brokers and a gateway in a cluster (everything enters or leaves the brokers through the gateway qmgr). The cluster receiver channels on all the qmgrs have the convert option set to a 'yes' and the qmgr ccsid is 819
|
Oh dear - up the Creek with no paddle then?
If you can't sort that out then anything you do from here on in is a bodge. Held together with Gaffer Tape.
There must have been at some point in time a perfectly good reason for setting the channel that way.
If that reason still holds, can you get another channel setup that does not do the conversion and send these messages over that? This may work provided the CCSID of these is set correctly. If the messages containing the Euro symbol don't have the right ccsid then you are well and truly heading for that iceberg.
There are ways round this but it is going to be a big fat hack/bodge. |
|
|
 |
steve_metal |
Posted: Wed Jul 06, 2011 3:34 am Post subject: |
|
|
 Novice
Joined: 22 Sep 2009 Posts: 17
|
Thanks guys...
Nobody can actually find out why conversion was turned on. But no one wants to switch it off either, believing it was put there for some reason, and since plenty of projects run on this same topology, the consequence testing and regression impact would be huge.
So we are going for the only solution left: creating a new (more or less duplicate) cluster whose cluster channels do not have conversion. Since it is pretty urgent (with this Euro symbol constantly falling over), we are proceeding with that approach unless there are other ideas. |
|
|
 |
|