Author |
Message
|
fszostak |
Posted: Thu Feb 17, 2011 8:58 am Post subject: UTF-16 to UTF-8 conversion |
|
|
Acolyte
Joined: 09 Feb 2011 Posts: 64 Location: Curitiba, Brazil
|
Hi All!
I have an XML file with CCSID 1200 (UTF-16) content; I need to read it and convert it to CCSID 1208 (UTF-8) to use the data during flow execution.
I tried an ESQL CAST, but it doesn't work.
Code: |
SET OutputRoot.Properties.CodedCharSetId = 1208;
SET OutputRoot.BLOB.BLOB = CAST(InputRoot.BLOB.BLOB AS BLOB CCSID 1208);
|
Any idea?
Thanks
Szostak _________________ WMB 6.1.005 |
mqjeff |
Posted: Thu Feb 17, 2011 9:25 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
Why do you need to change the codepage of the bitstream before you parse the data?
Broker can parse XML data that's in 1200 just as easily as in 1208 or any other reasonable codepage... |
Vitor |
Posted: Thu Feb 17, 2011 9:35 am Post subject: Re: UTF-16 to UTF-8 conversion |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
fszostak wrote: |
I have an XML file with CCSID 1200 (UTF-16) content; I need to read it and convert it to CCSID 1208 (UTF-8) to use the data during flow execution. |
Why? Broker uses UTF-16 internally and implicitly converts so provided the input is properly described as CCSID 1200 you should need to take no action. _________________ Honesty is the best policy.
Insanity is the best defence. |
fszostak |
Posted: Thu Feb 17, 2011 9:55 am Post subject: |
|
|
Acolyte
Joined: 09 Feb 2011 Posts: 64 Location: Curitiba, Brazil
|
mqjeff wrote: |
Why do you need to change the codepage of the bitstream before you parse the data?
Broker can parse XML data that's in 1200 just as easily as in 1208 or any other reasonable codepage... |
Because I need to read XML data in code page 1200 and write another file in 1208. The native code page is 1208, and I receive external data in 1200.
I may be wrong, but I think I need to convert the data during flow execution. |
fszostak |
Posted: Thu Feb 17, 2011 10:04 am Post subject: Re: UTF-16 to UTF-8 conversion |
|
|
Acolyte
Joined: 09 Feb 2011 Posts: 64 Location: Curitiba, Brazil
|
Vitor wrote: |
fszostak wrote: |
I have an XML file with CCSID 1200 (UTF-16) content; I need to read it and convert it to CCSID 1208 (UTF-8) to use the data during flow execution. |
Why? Broker uses UTF-16 internally and implicitly converts so provided the input is properly described as CCSID 1200 you should need to take no action. |
I use a FileInput node; in "Input Message Parsing -> Message coded character set ID" the property value is "1200".
When I receive the message, the BLOB contains a UTF-16 message. I need to parse it into XMLNSC, but then the error "XML Parsing Errors have occurred" occurs.
What's wrong? |
Vitor |
Posted: Thu Feb 17, 2011 10:13 am Post subject: Re: UTF-16 to UTF-8 conversion |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
fszostak wrote: |
What's wrong? |
Well, a number of things.
This is the first time you've mentioned parsing the XML as well as converting the code page. If the CCSID is correctly specified on the input, you don't need to convert anything; just parse it into the XMLNSC domain and do what you want. Likewise, if you want the output tree written in a specific code page, set the CCSID accordingly; you still don't need to convert anything manually.
If you're getting parsing errors from the input, the first and simplest check is to open the input with IE, XMLSpy or similar to check its validity. Code page should be irrelevant. |
mqjeff |
Posted: Thu Feb 17, 2011 10:19 am Post subject: Re: UTF-16 to UTF-8 conversion |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
fszostak wrote: |
Vitor wrote: |
fszostak wrote: |
I have an XML file with CCSID 1200 (UTF-16) content; I need to read it and convert it to CCSID 1208 (UTF-8) to use the data during flow execution. |
Why? Broker uses UTF-16 internally and implicitly converts so provided the input is properly described as CCSID 1200 you should need to take no action. |
I use a FileInput node; in "Input Message Parsing -> Message coded character set ID" the property value is "1200".
When I receive the message, the BLOB contains a UTF-16 message. I need to parse it into XMLNSC, but then the error "XML Parsing Errors have occurred" occurs.
What's wrong? |
As my esteemed colleague says, you have this configured halfway correctly.
You need to set the FileInput node to indicate that it should parse the data using the XMLNSC parser rather than the BLOB parser.
Then when you need to write the data out, you will set OutputRoot.Properties.CodedCharSetId to 1208, along with any relevant options on the Output node, and the data will be converted to 1208 when the XMLNSC tree is serialized. |
fszostak |
Posted: Thu Feb 17, 2011 10:44 am Post subject: Re: UTF-16 to UTF-8 conversion |
|
|
Acolyte
Joined: 09 Feb 2011 Posts: 64 Location: Curitiba, Brazil
|
Sorry for the confusion! I started with XMLNSC, and afterwards changed to BLOB to try to solve the XML parsing error.
I tried again with the XMLNSC parser set in FileInput, but I still get the error "XML Parsing Errors have occurred".
I used a Java program to convert it and the XML data is okay, but in WMB it is not okay! |
mqjeff |
Posted: Thu Feb 17, 2011 10:47 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
What's the parsing error?
You would need to CAST the BLOB to CHAR and then CAST that CHAR to BLOB to change the CCSID of the BLOB, if you really felt this had to be done to resolve your issue. |
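The double CAST described here amounts to decoding the bytes with the code page they are actually in, then encoding the resulting characters with the code page that is wanted. A minimal Python sketch of that idea, outside the broker; the helper name `transcode` and the sample document are hypothetical, and big-endian UTF-16 without a byte order mark is an assumption:

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes using the source charset, then encode using the target."""
    text = data.decode(source)   # bytes -> characters (the first CAST)
    return text.encode(target)   # characters -> bytes (the second CAST)


# Hypothetical sample: a small document stored as big-endian UTF-16.
utf16_bytes = '<?xml version="1.0"?><a>\u00e9</a>'.encode("utf-16-be")
utf8_bytes = transcode(utf16_bytes, "utf-16-be", "utf-8")
print(utf8_bytes)
```

The first step has to name the source code page (1200 in this thread) and the second the target (1208); naming the target in both steps would not change the characters' encoding.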
|
Back to top |
|
 |
fszostak |
Posted: Thu Feb 17, 2011 10:52 am Post subject: |
|
|
Acolyte
Joined: 09 Feb 2011 Posts: 64 Location: Curitiba, Brazil
|
When I set the BLOB parser in FileInput, the following message is received:
Code: |
003c003f0078006d006c002000760065007200730069006f006e003d.............
|
It's a UTF-16 message.
I changed the CharSet to 1208:
Code: |
SET OutputRoot.Properties.CodedCharSetId = 1208;
SET OutputRoot.BLOB.BLOB = CAST(CAST(InputRoot.BLOB.BLOB AS CHARACTER CCSID 1208 ENCODING InputRoot.Properties.Encoding) AS BLOB CCSID 1208 ENCODING InputRoot.Properties.Encoding);
|
But the output message remains:
Code: |
003c003f0078006d006c002000760065007200730069006f006e003d.............
|
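One possible explanation, not confirmed in the thread, is that both CASTs name CCSID 1208, so the bytes are read as UTF-8 and written back as UTF-8. A small Python sketch of that round trip on the leading bytes shown above (the dump is truncated in the post, so only the visible prefix is used); that the broker's CAST behaves the same way as Python's codecs is an assumption:

```python
# Leading bytes of the message as shown in the post (the dump is truncated there).
raw = bytes.fromhex("003c003f0078006d006c002000760065007200730069006f006e003d")

# Decoding as UTF-8 and re-encoding as UTF-8 is a round trip: each 0x00 byte
# is a valid UTF-8 NUL character, so the bytes come back unchanged.
round_trip = raw.decode("utf-8").encode("utf-8")
print(round_trip == raw)

# Decoding as big-endian UTF-16 and encoding as UTF-8 does change the bytes.
converted = raw.decode("utf-16-be").encode("utf-8")
print(converted)
```

In this sketch only the second variant, which names the source encoding in the decode step, yields the single-byte form of `<?xml version=`.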
fszostak |
Posted: Thu Feb 17, 2011 10:55 am Post subject: |
|
|
Acolyte
Joined: 09 Feb 2011 Posts: 64 Location: Curitiba, Brazil
|
mqjeff wrote: |
You would need to CAST the BLOB to CHAR and then CAST that CHAR to BLOB to change the CCSID of the BLOB, if you really felt this had to be done to resolve your issue. |
I already tried:
SET OutputRoot.BLOB.BLOB = CAST(CAST(InputRoot.BLOB.BLOB AS CHARACTER CCSID 1208 ENCODING InputRoot.Properties.Encoding) AS BLOB CCSID 1208 ENCODING InputRoot.Properties.Encoding); |
mqjeff |
Posted: Thu Feb 17, 2011 10:58 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
And is the resulting bitstream fundamentally different at the byte level than the same operation "that worked" in Java?
Are you sure you know which one was "correct"? |
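Comparing the first few bytes of the two bitstreams is a quick way to approach this question. A rough Python sketch that guesses the encoding family from a byte order mark or from the pattern of the leading `<?`; the function name `sniff_xml_encoding` is hypothetical, and UTF-32 and EBCDIC are deliberately ignored:

```python
def sniff_xml_encoding(data: bytes) -> str:
    """Guess an XML encoding family from the first bytes (BOM or '<?' pattern)."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8 (BOM)"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be (BOM)"
    if data.startswith(b"\xff\xfe"):
        # Note: a UTF-32LE BOM also starts with FF FE; not handled in this sketch.
        return "utf-16-le (BOM)"
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be (no BOM)"
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le (no BOM)"
    if data.startswith(b"<?xm"):
        return "utf-8 or other ASCII-compatible"
    return "unknown"


# Example: the first bytes of a big-endian UTF-16 document without a BOM.
print(sniff_xml_encoding(bytes.fromhex("003c003f0078006d")))
```

Running the same check on the file before and after conversion shows whether the bytes really moved from a two-byte to a one-byte form.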
Vitor |
Posted: Thu Feb 17, 2011 11:14 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
fszostak wrote: |
I already tried:
SET OutputRoot.BLOB.BLOB = CAST(CAST(InputRoot.BLOB.BLOB AS CHARACTER CCSID 1208 ENCODING InputRoot.Properties.Encoding) AS BLOB CCSID 1208 ENCODING InputRoot.Properties.Encoding); |
To repeat the unanswered question:
What was the parsing error?
To repeat the implied question:
What led you to believe the parsing error was related to code page?
When you say:
fszostak wrote: |
I used a Java program to convert it and the XML data is okay! |
Do you mean that you used a Java program to change the code page and WMB was able to parse it OR that the Java program was able to parse the XML?
If the former, then how exactly did the Java change the code page? If it deserialized the document and reserialized in a different code page then that's not the same XML document.
Put the original XML in IE or XMLSpy and display it. If either of those parses it then we'll talk code page. |
fszostak |
Posted: Thu Feb 17, 2011 12:41 pm Post subject: |
|
|
Acolyte
Joined: 09 Feb 2011 Posts: 64 Location: Curitiba, Brazil
|
Vitor wrote: |
fszostak wrote: |
I already tried: |
What was the parsing error? |
I only receive this in the debugger -> CHARACTER:XML Parsing Errors have occurred |
Vitor |
Posted: Thu Feb 17, 2011 12:44 pm Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
fszostak wrote: |
I only receive this in the debugger -> CHARACTER:XML Parsing Errors have occurred |
There should be a lot more information than that in the preceding errors.
Old advice but good advice - take a user trace rather than use the debugger. There's much more information available in the trace file than the debugger.
Have you loaded the XML into an external tool as I suggested? Does it parse correctly? |