Author |
Message
|
mglaiste |
Posted: Thu Apr 05, 2012 3:05 am Post subject: German Characters Problem |
|
|
Novice
Joined: 06 Jun 2003 Posts: 16 Location: TSE Port Talbot, Wales
|
The message flow receives XML encoded as UTF-8. Message tester is showing code page 1208.
I am extracting the data from the XML and writing it out as a BLOB:
outdata = CAST(indata AS BLOB CCSID 1208)
The umlauted characters do not cause a failure, but they are not displayed correctly when the message is written to an MQSeries queue or written to a file by a FileOutput node.
Any ideas? I thought CCSID 1208 would be OK. |
|
|
 |
Esa |
Posted: Thu Apr 05, 2012 3:36 am Post subject: Re: German Characters Problem |
|
|
 Grand Master
Joined: 22 May 2008 Posts: 1387 Location: Finland
|
mglaiste wrote: |
The umlauted characters do not cause a failure, but they are not displayed correctly when the message is written to an MQSeries queue or written to a file by a FileOutput node.
|
You must tell us how you tested it. With RFHUtil? Did you have GMO.convert turned off? Did you change OutputRoot.Properties.CodedCharSetId and OutputRoot.MQMD.CodedCharSetId to 1208? |
|
|
 |
saurabh867 |
Posted: Thu Apr 05, 2012 3:46 am Post subject: Re: German Characters Problem |
|
|
Voyager
Joined: 13 Jun 2010 Posts: 78
|
Also, what application are you using to view the output messages? Sometimes the characters are correct but the application you open them with does not display them correctly. |
|
|
 |
mglaiste |
Posted: Thu Apr 05, 2012 4:13 am Post subject: |
|
|
Novice
Joined: 06 Jun 2003 Posts: 16 Location: TSE Port Talbot, Wales
|
SET OutputRoot.Properties.CodedCharSetId = 1208;
SET OutputRoot.MQMD.CodedCharSetId = 1208;
DECLARE outdata CHAR ' ';
SET outdata = InputRoot.XMLNS.Envelope.Body.Data.Row[1];
SET OutputRoot.BLOB.BLOB = CAST(outdata as BLOB CCSID 1208);
The original problem was reported by the user. The data comes from SAP. I use IBM rfhutil
<Row>test ö</Row>
becomes:-
test � |
|
|
 |
Vitor |
Posted: Thu Apr 05, 2012 4:40 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
mglaiste wrote: |
The original problem was reported by the user. |
What are they using to view the data? Get them to tell you the actual hex values that are in the data.
mglaiste wrote: |
I use IBM rfhutil
<Row>test ö</Row>
becomes:-
test � |
AFAIK that uses the machine's code page.
With this sort of issue, the actual raw values in the message are key. Too many viewing tools try to help by converting data for you. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
|
 |
mglaiste |
Posted: Thu Apr 05, 2012 5:02 am Post subject: |
|
|
Novice
Joined: 06 Jun 2003 Posts: 16 Location: TSE Port Talbot, Wales
|
The data is FTP'd to a Windows server. I downloaded the data, and ö appears as ?? when viewed in Notepad++. WBI is sending it as ASCII. |
|
|
 |
saurabh867 |
Posted: Thu Apr 05, 2012 5:02 am Post subject: |
|
|
Voyager
Joined: 13 Jun 2010 Posts: 78
|
Try using a hex viewer; HEX Editor is one of many.
It lets you see the characters as well as their hex values. Compare the hex value just before the MQOutput node with the value shown in the editor.
If the hex values differ, something is changing the data.
Otherwise, it's just the viewer misrepresenting the bytes.
Regards,
Saurabh |
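For readers without a hex viewer to hand, a few lines of Python can do the same job. This is only a minimal sketch: `hex_dump` is a hypothetical helper name, not part of any tool mentioned in this thread, and the sample bytes are built in the script rather than read from the real file or queue.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Return a classic hex dump: offset, hex bytes, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is shown as-is; every other byte becomes '.'
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


# Sample payload: the text from the thread, encoded as UTF-8 (CCSID 1208)
sample = "test ö".encode("utf-8")
print(hex_dump(sample))
```

To inspect a downloaded file instead, the bytes could be read with `open(path, "rb").read()` and passed to the same function; opening in binary mode matters, because text mode would apply a decoding of its own.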
|
|
 |
mglaiste |
Posted: Thu Apr 05, 2012 5:42 am Post subject: |
|
|
Novice
Joined: 06 Jun 2003 Posts: 16 Location: TSE Port Talbot, Wales
|
The message flow is writing ö as hex c3 b6, which is not ASCII. |
|
|
 |
mapa |
Posted: Thu Apr 05, 2012 6:02 am Post subject: |
|
|
 Master
Joined: 09 Aug 2001 Posts: 257 Location: Malmö, Sweden
|
Why are you sending non-ASCII data as ASCII then?
In your case I would both upload and download the data as binary when using FTP. (I never actually use the FileOutput node in the broker, so I am no expert on how it behaves.)
c3 b6 is the correct hex for ö in UTF-8.
You first stated it was written to a queue; isn't it correct on the queue?
If not, what is the CCSID of the message on the queue, and what is the hex value?
One probable cause of failure in your code is that you are casting from 1208 instead of Properties.CodedCharSetId (it could still work, provided the incoming data really is 1208). |
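The claim that c3 b6 is the UTF-8 form of ö, and that it cannot be ASCII, is easy to check. The sketch below uses only Python's built-in codecs; the variable names are illustrative.

```python
text = "ö"  # U+00F6, LATIN SMALL LETTER O WITH DIAERESIS

# UTF-8 (CCSID 1208) encodes this character as two bytes
utf8_bytes = text.encode("utf-8")
print(utf8_bytes.hex(" "))

# ASCII only covers 0x00-0x7F, so both bytes fall outside it
all_outside_ascii = all(b > 0x7F for b in utf8_bytes)

# A strict ASCII encode of the character fails outright
try:
    text.encode("ascii")
    ascii_encodable = True
except UnicodeEncodeError:
    ascii_encodable = False

# Decoding the same bytes as UTF-8 gives the original character back
round_trip = utf8_bytes.decode("utf-8")
```

So if c3 b6 is what the flow writes, the flow's output is valid UTF-8, and any damage would have to happen in a later transfer or viewing step.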
|
|
 |
kimbert |
Posted: Sat Apr 07, 2012 11:58 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
I am extracting the data from the XML and writing it out as a BLOB |
That is an unusual thing to do in a message flow. Can you explain why you are doing this?
This line of code is very unlikely to be correct:
Code: |
outdata = CAST(indata AS BLOB CCSID 1208) |
I assume that indata points at a CHARACTER field in the message tree. All CHARACTER data in the message tree is held as UTF-16. Your CAST statement casts from CHARACTER to BLOB, so it will create a UTF-8 character stream in outdata containing the same characters as were in the message tree.
I completely agree with mapa: you need to *understand* your data and then *design* a solution to fit the data. At present you seem to be trying to *guess* the correct ccsid. |
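The difference between the in-memory characters and the bytes produced by the CAST can be illustrated outside the broker. The sketch below is an analogy in Python, not ESQL: it assumes that CCSID 1208 corresponds to UTF-8 and CCSID 1200 to UTF-16, and it shows big-endian byte order for the latter as an assumption.

```python
text = "test ö"  # the characters, independent of any byte encoding

# Roughly what CAST(... AS BLOB CCSID 1208) would be expected to yield
utf8 = text.encode("utf-8")

# Roughly what a cast to CCSID 1200 would yield (big-endian assumed here)
utf16 = text.encode("utf-16-be")

print(utf8.hex(" "))
print(utf16.hex(" "))
```

The same six characters become 7 bytes in UTF-8 and 12 bytes in UTF-16, so the target CCSID should be chosen to match what the consumer of the file expects, not the internal form of the message tree.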
|
|
 |
rekarm01 |
Posted: Sat Apr 07, 2012 2:24 pm Post subject: Re: German Characters Problem |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
mglaiste wrote: |
The data is FTP'd to a Windows server. I downloaded the data, and ö appears as ?? when viewed in Notepad++. WBI is sending it as ASCII. |
FTP itself may be converting the data; is it transferring files in "ascii" mode or "binary" mode? Notepad++ may not be displaying the data correctly; make sure that it's configured for the correct encoding. Inspect the message at each point in the interface, (a tool that can also display data in hex is often useful), and try to narrow down where the corruption occurs.
mglaiste wrote: |
The original problem was reported by the user. The data comes from SAP. I use IBM rfhutil
<Row>test ö</Row>
becomes:-
test � |
rfhutil does not necessarily display non-ASCII characters correctly; check the hex values for the characters in question. X'EFBFBD' looks like the UTF-8 encoding for the Unicode REPLACEMENT CHARACTER (U+FFFD, '�'). This might indicate some data loss occurred before converting to UTF-8. |
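One way X'EFBFBD' can end up in a message is sketched below. It is an assumed scenario for illustration, not a diagnosis of this flow: a single-byte ö (as in ISO-8859-1) is wrongly treated as UTF-8, the decoder substitutes U+FFFD, and the result is then written out as UTF-8.

```python
# ö in a single-byte code page such as ISO-8859-1 is the byte 0xF6
latin1_bytes = "ö".encode("latin-1")

# 0xF6 is not a valid UTF-8 sequence, so a lenient decoder substitutes
# the REPLACEMENT CHARACTER U+FFFD for it
decoded = latin1_bytes.decode("utf-8", errors="replace")

# Writing that character out as UTF-8 produces the bytes ef bf bd
re_encoded = decoded.encode("utf-8")
print(re_encoded.hex(" "))
```

Once U+FFFD has been substituted, the original character cannot be recovered from the output, which is why the hex values need checking at each step upstream.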
|
|
 |
mqsiuser |
Posted: Sun Apr 08, 2012 12:45 am Post subject: Re: German Characters Problem |
|
|
 Yatiri
Joined: 15 Apr 2008 Posts: 637 Location: Germany
|
rekarm01 wrote: |
mglaiste wrote: |
The data is FTP'd to a Windows server. I downloaded the data, and ö appears as ?? when viewed in Notepad++. WBI is sending it as ASCII. |
FTP itself may be converting the data; is it transferring files in "ascii" mode or "binary" mode? Notepad++ may not be displaying the data correctly; make sure that it's configured for the correct encoding. |
Could it be that you are using "text" ("ascii") mode? Try switching to "binary" mode in FTP. Binary mode transfers the bytes unchanged, while "text" mode may alter them (I only know of the Unix-to-Windows line-break change, but it may also change your "ö"). That can be handy, but with the broker, let the broker do the code page handling and conversion!
"??" means (imho) that the information about your special character is already lost; there is no way to tell whether it was "ä", "ö" or "ü". For example, the WBI adapter may have put in "??" (two question marks in ASCII) because it could not convert the UTF-8 character "ö" (which consists of two bytes) into the "ö" of the target code page.
rekarm01 wrote: |
Inspect the message at each point in the interface, (a tool that can also display data in hex is often useful), and try to narrow down where the corruption occurs. |
That's your only way to find out where the problem is, have fun :-) ... You also have to interpret correctly what you see at each point :-) and use several appropriate tools to look at the message. You should probably note down the bits of the characters in question, e.g. "01000001", and the corresponding number, e.g. 65, and look them up in all the relevant code page tables :-)
rekarm01 wrote: |
mglaiste wrote: |
The original problem was reported by the user. The data comes from SAP. I use IBM rfhutil
<Row>test ö</Row>
becomes:-
test � |
rfhutil does not necessarily display non-ASCII characters correctly; check the hex values for the characters in question. X'EFBFBD' looks like the UTF-8 encoding for the Unicode REPLACEMENT CHARACTER (U+FFFD, '�'). This might indicate some data loss occurred before converting to UTF-8. |
RFHUtil is a great choice! I have the impression that RFHUtil always displays ASCII (in the message payload tab), which is interesting to look at since it also reveals your bits (as hex does, but hex does so more explicitly!). _________________ Just use REFERENCEs |
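The "??" symptom described above, two question marks for one character, can be reproduced in a few lines. This is a sketch of one plausible mechanism, assumed for illustration: the two UTF-8 bytes of ö are misread as two single-byte characters, and each is then replaced when converted to ASCII.

```python
utf8_bytes = "ö".encode("utf-8")  # the two bytes c3 b6

# A converter that wrongly assumes a single-byte code page sees two characters
misread = utf8_bytes.decode("latin-1")

# Neither of those characters exists in ASCII, so each becomes '?'
lossy = misread.encode("ascii", errors="replace")

# For comparison: converting the real character directly gives a single '?'
direct = "ö".encode("ascii", errors="replace")

# The same conversion applied to ä or ü gives an identical result,
# so the original character cannot be recovered from the output
lossy_a = "ä".encode("utf-8").decode("latin-1").encode("ascii", errors="replace")
```

Two question marks per umlaut would therefore hint that the bytes were treated as single-byte text somewhere before the ASCII conversion, though only the hex values at each hop can confirm where.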
|
|
 |
mglaiste |
Posted: Tue Apr 10, 2012 2:35 am Post subject: |
|
|
Novice
Joined: 06 Jun 2003 Posts: 16 Location: TSE Port Talbot, Wales
|
I have changed the FTP to binary. The purpose of the message flow is to extract the data payload from the XML and create a text file for use with another application.
Rfhutil displays the XML header as UTF-8 and the CCSID as 1208.
I use the ResetContentDescriptor node to reset the domain to XMLNS so that I can reference the payload (Row):-
Body.Data.Row
I have set the outgoing MQMD thus:-
SET OutputRoot.MQMD.CodedCharSetId = 1208;
I declare a variable outdata as CHAR, then
outdata = ....Row[I]
Then I cast it as BLOB and send it to the FileOutput node:
SET OutputRoot.BLOB.BLOB = CAST(outdata AS BLOB CCSID 1208)
After the domain reset, is the message now UTF-16, so should I cast to CCSID 1200?
Thanks all in advance |
|
|
 |
kimbert |
Posted: Tue Apr 10, 2012 2:53 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
I use the ResetContentDescriptor node to reset the domain to XMLNS |
Did you mean 'XMLNSC'? I cannot think of a good reason to use XMLNS in this situation. |
|
|
 |
mglaiste |
Posted: Tue Apr 10, 2012 2:57 am Post subject: |
|
|
Novice
Joined: 06 Jun 2003 Posts: 16 Location: TSE Port Talbot, Wales
|
I use XMLNS (I don't want to validate the XML, just be able to reference the payload):-
SET outdata = InputRoot.XMLNS.Envelope.Body.Data.Row[I]; |
|
|
 |
|