Gemz |
Posted: Tue Mar 11, 2008 8:09 am Post subject: Unicode Parser Exception |
|
|
 Centurion
Joined: 14 Jan 2008 Posts: 124
|
Hi,
I am using the XMLNSC parser.
My input message contains some Unicode characters, and when the message enters MB it throws the exception "An invalid XML character (Unicode: 0xffffffff) was found in the element content of the document."
I want to pass these Unicode characters through unchanged to the end system.
Is there any way to parse these characters?
Thanks |
|
jefflowrey |
Posted: Tue Mar 11, 2008 8:26 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
Not every legal Unicode character is a legal XML character.
It seems you are trying to pass a mix of double-byte and single-byte Unicode characters in your XML document.
You likely need to examine your XML document and adjust it so that it a) complies with the XML standard and b) correctly identifies its own encoding. _________________ I am *not* the model of the modern major general. |
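
To make that distinction concrete, here is a minimal Python sketch of the `Char` production from the XML 1.0 specification. The function names are illustrative only; they are not part of Message Broker or any library. Both the pound sign (U+00A3) and the en dash (U+2013) pass this check, so a failure on those characters is more likely to be about how the bytes are encoded than about the characters themselves.

```python
def is_xml10_char(ch):
    """Return True if ch matches the XML 1.0 'Char' production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def illegal_xml_chars(text):
    """Return (index, hex code point) for every character XML 1.0 forbids."""
    return [(i, hex(ord(c))) for i, c in enumerate(text) if not is_xml10_char(c)]


print(is_xml10_char("\u00a3"))        # pound sign
print(is_xml10_char("\u2013"))        # en dash
print(illegal_xml_chars("a\x0bb"))    # vertical tab is not allowed in XML 1.0
```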
|
Gemz |
Posted: Tue Mar 11, 2008 9:20 am Post subject: |
|
|
 Centurion
Joined: 14 Jan 2008 Posts: 124
|
Hi,
I am using values like '£' and '–'. These values are valid against the schema, but Message Broker is not able to parse them.
I am using a CodedCharSetId of 1208.
How do I change the parsing options?
Thanks |
|
kimbert |
Posted: Tue Mar 11, 2008 3:03 pm Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
You can be 99% sure that if the XMLNSC parser rejects a document, the document is not well-formed XML. Does the document parse without errors in other XML parsers? (You could try opening it in Internet Explorer for a quick and easy test.) |
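
A cross-check of this kind can also be scripted, which has the advantage of testing the exact bytes rather than whatever an editor or browser re-encodes on open. The sketch below uses Python's standard `xml.etree.ElementTree` (an expat-based parser, not XMLNSC); the helper name `well_formed` and the two sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def well_formed(data):
    """Return (True, None) if the bytes parse as XML, else (False, error text)."""
    try:
        ET.fromstring(data)
        return True, None
    except ET.ParseError as exc:
        return False, str(exc)


# Pound sign encoded as UTF-8 (bytes C2 A3) under a UTF-8 declaration.
good = b'<?xml version="1.0" encoding="UTF-8"?><amount>\xc2\xa3 44</amount>'
# Pound sign as the single byte A3 (ISO-8859-1) under a UTF-8 declaration.
bad = b'<?xml version="1.0" encoding="UTF-8"?><amount>\xa3 44</amount>'

print(well_formed(good))
print(well_formed(bad))
```

If the second form is what actually arrives on the wire, any conforming parser should reject it, because the declared encoding and the bytes disagree.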
|
Gemz |
Posted: Wed Mar 12, 2008 2:42 am Post subject: |
|
|
 Centurion
Joined: 14 Jan 2008 Posts: 124
|
Hi,
I opened the input XML in IE and also checked it with XMLSpy.
I am using IBM Message Broker V6.1.
When I use a File input node, WMB parses the XML using CCSID 850. If the input file contains a Unicode character (e.g. '£'), it is parsed as a junk character ('£' comes out as 'ú') and the XML is processed.
When I use an HTTP input node, WMB parses the XML using CCSID 1208. Now if the message contains any Unicode character, WMB is not able to parse it. It throws the exception "An invalid XML character (Unicode: 0xffffffff) was found in the element content of the document."
How can we resolve this parsing issue? |
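
The two behaviours described here can be reproduced outside the broker. The sketch below assumes that Python's `cp850` and `utf-8` codecs correspond to CCSID 850 and CCSID 1208 respectively, and it only shows what each interpretation does to a pound sign; it does not establish which bytes the real message contains.

```python
POUND = "\u00a3"  # the pound sign

utf8_bytes = POUND.encode("utf-8")       # two bytes: C2 A3
latin1_bytes = POUND.encode("latin-1")   # one byte:  A3

# Interpreted as code page 850 (what the File input node was using):
as_850_from_utf8 = utf8_bytes.decode("cp850")      # two characters, the second is 'ú'
as_850_from_latin1 = latin1_bytes.decode("cp850")  # 'ú'


def decode_utf8(data):
    """Interpret bytes as UTF-8 (CCSID 1208), reporting where decoding fails."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return "invalid UTF-8 at byte offset %d" % exc.start


print(repr(as_850_from_utf8), repr(as_850_from_latin1))
print(decode_utf8(utf8_bytes))
print(decode_utf8(latin1_bytes))
```

A lone A3 byte is not a valid UTF-8 sequence, so a parser told to read UTF-8 has no character to produce for it, which may be what surfaces as the invalid-character exception.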
|
mgk |
Posted: Wed Mar 12, 2008 2:45 am Post subject: |
|
|
 Padawan
Joined: 31 Jul 2003 Posts: 1642
|
Can you paste the message here? _________________ MGK
The postings I make on this site are my own and don't necessarily represent IBM's positions, strategies or opinions. |
|
Gemz |
Posted: Wed Mar 12, 2008 3:01 am Post subject: |
|
|
 Centurion
Joined: 14 Jan 2008 Posts: 124
|
The input message looks like this:
<Message>
  <Header>
    <headerID>HeaderID</headerID>
    <createdBy>created_user</createdBy>
    <createdDateTime>2001-12-17T09:30:47.0Z</createdDateTime>
    <records>1</records>
  </Header>
  <payload>
    <activityID>2147483647</activityID>
    <status>4</status>
    <amount>£ 44</amount> <!-- WMB is not able to parse this £ -->
    <pointValue1>3.141593E0</pointValue1>
    <pointValue2>3.141593E0</pointValue2>
  </payload>
</Message>
Instead of the £ symbol, we sometimes have other Unicode characters such as '–' (an en dash, not a hyphen), which cause the same parsing problem. |
|
jefflowrey |
Posted: Wed Mar 12, 2008 3:05 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
What does the <?xml> declaration say on this message? |
|
Gemz |
Posted: Wed Mar 12, 2008 3:33 am Post subject: |
|
|
 Centurion
Joined: 14 Jan 2008 Posts: 124
|
jefflowrey wrote: |
What does the <?xml> declaration say on this message? |
In both cases, i.e. with or without the declaration <?xml version="1.0" encoding="UTF-8"?>, we get the same problem.
Note: with the HTTP input node (CCSID 1208) we get a parser exception, and with the File input node (CCSID 850) we get a junk character. |
|
jefflowrey |
Posted: Wed Mar 12, 2008 3:44 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
Is the data actually in UTF-8?
Is the pound sign legal in UTF-8?
Is the data in the file actually in CCSID 850?
Did you try searching here for other incidents of similar problems passing '£' through Broker? |
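
One way to answer the first and third questions is to look at the raw bytes instead of how a viewer renders them. The following is a small sketch (the helper name `non_ascii_runs` is made up) that lists every run of non-ASCII bytes with its offset. For reference, a pound sign is `c2a3` in UTF-8, `a3` in ISO-8859-1 and Windows-1252, and `9c` in code page 850; an en dash is `e28093` in UTF-8 and `96` in Windows-1252.

```python
import re


def non_ascii_runs(data):
    """Return (offset, hex string) for each run of bytes >= 0x80."""
    return [(m.start(), m.group().hex())
            for m in re.finditer(rb"[\x80-\xff]+", data)]


# Sample payloads for illustration; in practice, read the real message bytes,
# e.g. data = open(path, "rb").read()
print(non_ascii_runs(b"<amount>\xc2\xa3 44</amount>"))
print(non_ascii_runs(b"<amount>\xa3 44</amount>"))
print(non_ascii_runs(b"<note>a \xe2\x80\x93 b</note>"))
```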
|
kimbert |
Posted: Wed Mar 12, 2008 3:45 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
There's a very important question which nobody has asked yet: which code page should the parser be using? You should be able to determine this by asking the provider of the message.
Until you know the answer, I would advise you not to experiment with various code pages. Otherwise you might get temporary success with this message, and then fall over in production when a different message arrives. |
|
Gemz |
Posted: Wed Mar 12, 2008 4:39 am Post subject: |
|
|
 Centurion
Joined: 14 Jan 2008 Posts: 124
|
jefflowrey wrote: |
Is the data actually in UTF-8?
Is the pound sign legal in UTF-8?
Is the data in the file actually in CCSID 850?
Did you try searching here for other incidents of similar problems passing '£' through Broker? |
jeff,
Yes, the data is in UTF-8, and the pound sign is a legal UTF-8 character.
For the File input node I am using the same file that I use for the HTTP input node. If I put '£' in the input XML, MB parses it as '£' with the HTTP input node.
kimb,
You are absolutely correct.
For the file input we are getting the message from a .NET service. I don't think the .NET side sets any particular code page. |
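
The '£' pattern mentioned above is what UTF-8 bytes look like when something reads them one byte per character. A short Python sketch of that effect, assuming ISO-8859-1 and Windows-1252 as the single-byte interpretations (the thread does not say which component, if any, is doing this):

```python
POUND = "\u00a3"
EN_DASH = "\u2013"

# UTF-8 bytes misread as a single-byte code page:
pound_mojibake = POUND.encode("utf-8").decode("latin-1")   # 'Â£'
dash_mojibake = EN_DASH.encode("utf-8").decode("cp1252")   # 'â€“'

# The misreading is reversible as long as no bytes were lost on the way:
pound_repaired = pound_mojibake.encode("latin-1").decode("utf-8")
dash_repaired = dash_mojibake.encode("cp1252").decode("utf-8")

print(pound_mojibake, dash_mojibake)
print(pound_repaired, dash_repaired)
```

Seeing 'Â' in front of the pound sign is therefore a hint that the data really is UTF-8 at that point and that a later step is treating it as a single-byte encoding.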
|
fjb_saper |
Posted: Wed Mar 12, 2008 2:48 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Is the currency sign legal for the field?
Was the field supposed to be an amount field? You'd have a hard time parsing an amount field that contains the currency symbol. You should split it into two fields: one for the amount, the other for the currency.
Enjoy  _________________ MQ & Broker admin |
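
As a rough illustration of the point about amount fields, the sketch below uses Python's `decimal` module as a stand-in for a numeric parser. The helper `parse_amount` and the separate currency-code field are hypothetical; the thread does not show the real message layout.

```python
from decimal import Decimal, InvalidOperation


def parse_amount(text):
    """Return the text as a Decimal, or None if it is not a plain number."""
    try:
        return Decimal(text.strip())
    except InvalidOperation:
        return None


# Combined field, as in the sample message: not a valid number.
combined = parse_amount("\u00a3 44")

# Separate fields (hypothetical layout): the amount parses cleanly.
amount = parse_amount("44")
currency = "GBP"  # hypothetical currency-code field

print(combined, amount, currency)
```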
|
Gemz |
Posted: Thu Mar 13, 2008 2:50 am Post subject: |
|
|
 Centurion
Joined: 14 Jan 2008 Posts: 124
|
Quote: |
You'd have a hard time parsing an amount field with the currency symbol |
Yes, you are correct. I only wrote it that way here as an example. The actual message carries them in separate fields. |
|