mhd_zabi (Newbie | Joined: 25 Sep 2016 | Posts: 7 | Location: Mangalore)
Posted: Sun Sep 25, 2016 8:37 am    Post subject: Special characters getting converted to different characters
Hi All,
I am using IIB9 on AIX 7.1. I am facing an issue while reading a CSV file.
The flow doesn't do much. It reads the input CSV file record by record, skipping the header record, and then appends each record to the output file. Basically, the flow just creates the output with the header record removed.
Now the problem comes with the special characters. Characters like ä and ö are getting converted to Ã¤ and Ã¶ respectively. I checked the input file and it seems to be a UTF-8 encoded file (I checked by opening it in Notepad++), but the output file with the changed characters seems to be ANSI encoded.
Can anyone suggest how this is getting changed?
Thanks.
fjb_saper (Grand High Poobah | Joined: 18 Nov 2003 | Posts: 20756 | Location: LI, NY)
Posted: Sun Sep 25, 2016 11:27 am
What is the CCSID on OutputRoot.Properties when writing the file?
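You can see it with a Trace node (pattern ${Root}), or with a couple of lines of ESQL in a Compute node. A minimal sketch, assuming you just want to inspect it and, if the bytes genuinely are UTF-8, force it on the outgoing message:

Code:
    -- read the character set currently flagged on the outgoing message
    DECLARE outCcsid INTEGER OutputRoot.Properties.CodedCharSetId;
    -- 1208 = UTF-8; only set this if the bytes really are UTF-8
    SET OutputRoot.Properties.CodedCharSetId = 1208;

_________________
MQ & Broker admin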
mhd_zabi (Newbie | Joined: 25 Sep 2016 | Posts: 7 | Location: Mangalore)
Posted: Sun Sep 25, 2016 8:32 pm
CCSID is 1208 and Encoding is 273 in both the input and output properties.
The strange thing is that yesterday, when I tried to read the input as Whole File and just copied it to the output, the special characters were passed through as-is. It's only when I try to transform the file that the special characters get changed.
adubya (Partisan | Joined: 25 Aug 2011 | Posts: 377 | Location: GU12, UK)
Posted: Sun Sep 25, 2016 11:54 pm
How are you performing the transformation? Java, ESQL, or a built-in node?
If you are using Java, check that any methods which handle byte arrays are using the correct encoding parameters.
_________________
Independent Middleware Consultant
andy@knownentity.com
mhd_zabi (Newbie | Joined: 25 Sep 2016 | Posts: 7 | Location: Mangalore)
Posted: Mon Sep 26, 2016 2:38 am
I am using ESQL, but all it does is:
Call CopyMessageHeaders
Call CopyEntireMessage
Set the output file name
Check whether the IsEmpty flag is TRUE or FALSE, based on which I send the data to the Finish File terminal to complete the file.
The transformation of removing the header record from the input is done by the Records and Elements properties on the FileInput node, where I check the box to skip the first record. The Compute node is sketched below.
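A simplified sketch of the Compute module (the copy procedures are the toolkit-generated ones; the output file name is a placeholder, and the IsEmpty / Finish File handling is omitted):

Code:
    CREATE COMPUTE MODULE RemoveHeader_Compute
        CREATE FUNCTION Main() RETURNS BOOLEAN
        BEGIN
            -- toolkit-generated copy procedures
            CALL CopyMessageHeaders();
            CALL CopyEntireMessage();
            -- placeholder file name; the Compute node's Compute mode must
            -- include LocalEnvironment for this override to take effect
            SET OutputLocalEnvironment.Destination.File.Name = 'output.csv';
            RETURN TRUE;
        END;
    END MODULE;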
smdavies99 (Jedi Council | Joined: 10 Feb 2003 | Posts: 6076 | Location: Somewhere over the Rainbow this side of Never-never land.)
Posted: Mon Sep 26, 2016 2:51 am
This:

mhd_zabi wrote:
    Call CopyMessageHeaders
    Call CopyEntireMessage
    Set the output filename

is incorrect. You do ONE or the other of
    Call CopyMessageHeaders
    Call CopyEntireMessage
not both. (CopyEntireMessage copies the whole tree, headers included, so calling CopyMessageHeaders first is redundant.) In your case, as you are not doing anything to the message body,

Code:
    CALL CopyEntireMessage();

is the right one to use.
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt, think and investigate before you ask silly questions.
timber (Grand Master | Joined: 25 Aug 2015 | Posts: 1292)
Posted: Mon Sep 26, 2016 2:53 am
mhd_zabi wrote:
    The strange thing is that yesterday, when I tried to read the input as Whole File and just copied it to the output, the special characters were passed through as-is. It's only when I try to transform the file that the special characters get changed.

That's not strange at all. If the flow does not change the message tree, then the output is copied straight from the input bit stream.

mhd_zabi wrote:
    I checked the input file and it seems to be a UTF-8 encoded file (I checked by opening it in Notepad++), but the output file with the changed characters seems to be ANSI encoded.

Seems to be? On what basis are you asserting this? Most character encodings will look reasonably OK when interpreted as UTF-8 if you are viewing mostly ASCII characters, so the fact that Notepad++ displayed the file correctly doesn't prove very much.
You cannot make assumptions about the character encoding. You absolutely must find out what encoding the sender used when the file was written, then specify that encoding in your message flow.
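For example, once the sender confirms the file really is UTF-8, you can make that explicit when you convert the bit stream instead of relying on a default. A minimal sketch, assuming the body arrives in the BLOB domain:

Code:
    -- convert the raw bytes using an explicit character set
    -- 1208 = UTF-8; substitute whatever the sender actually confirms
    DECLARE raw     BLOB      InputRoot.BLOB.BLOB;
    DECLARE rawText CHARACTER CAST(raw AS CHARACTER CCSID 1208);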
rekarm01 (Grand Master | Joined: 25 Jun 2008 | Posts: 1415)
Posted: Mon Sep 26, 2016 8:23 am
mhd_zabi wrote:
    CCSID is 1208 and Encoding is 273 in both the input and output properties.

There is also the possibility that the message flow works, but something is wrong with whatever application interprets/renders/displays the output message. For example, if Notepad++ requires a BOM in order to correctly detect UTF-8, but the message flow does not preserve the BOM when transforming the message, then Notepad++ would garble the message. Applications such as rfhutil and amqsbcg0, which can display message bytes in hexadecimal, are much more useful for unambiguously examining messages.
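You can also test for the BOM inside the flow itself. A sketch, assuming the body is in the BLOB domain:

Code:
    -- the UTF-8 BOM is the three-byte sequence X'EFBBBF'
    DECLARE utf8Bom BLOB X'EFBBBF';
    DECLARE body    BLOB InputRoot.BLOB.BLOB;
    DECLARE hasBom  BOOLEAN FALSE;
    IF LENGTH(body) >= 3 AND SUBSTRING(body FROM 1 FOR 3) = utf8Bom THEN
        SET hasBom = TRUE;    -- the input carries a BOM
    END IF;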
mhd_zabi (Newbie | Joined: 25 Sep 2016 | Posts: 7 | Location: Mangalore)
Posted: Mon Sep 26, 2016 11:27 pm
Thank you all for your inputs, especially rekarm01.
I explored some more based on what rekarm01 said, and below is what I found.
The input we received from the source was UTF-8 with a BOM, as I could see the BOM hex code in the input. The output file that I generated did not have it. Notepad++ still detects the encoding without the BOM and shows it as "UTF-8 Without BOM", but Excel and our end application (which in this case is Siebel) are not able to detect the encoding as UTF-8, so they take it as ASCII and corrupt the special characters. The BOM seems to be getting removed when I remove the header record from the input.
I was wondering whether there is some way to retain this BOM even after removing the header record.
timber (Grand Master | Joined: 25 Aug 2015 | Posts: 1292)
Posted: Tue Sep 27, 2016 1:49 am
The BOM is just a sequence of 3 bytes. You could write those 3 bytes to the file before appending the other records, as sketched below.
Alternatively, you could un-select 'Skip first record' and put in some logic to truncate all except the first character of the first record (rather than skipping it completely).
Before you do either of those, you may want to ask yourself whether the BOM is actually required by the receiver.
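If the receiver does need it, prepending the bytes is straightforward. A sketch, assuming the records pass through the BLOB domain and that your flow can tell when it is handling the first record:

Code:
    -- the UTF-8 BOM, X'EFBBBF', written once ahead of the first record
    DECLARE utf8Bom BLOB X'EFBBBF';
    -- first-record detection is flow-specific and omitted here
    SET OutputRoot.BLOB.BLOB = utf8Bom || InputRoot.BLOB.BLOB;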
mhd_zabi (Newbie | Joined: 25 Sep 2016 | Posts: 7 | Location: Mangalore)
Posted: Thu Sep 29, 2016 1:46 am
Thanks, timber.
I was able to append the BOM to the beginning of the file by converting it to a bit stream, and it worked fine. The end application was able to read the file as UTF-8 and processed the characters correctly.
Thank you all for your help.
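In case it helps anyone else, the core of what I ended up with, simplified to the BLOB domain (my real flow differs; the LocalEnvironment.File.Record test may need adjusting if the skipped header record still counts toward the record number):

Code:
    CREATE COMPUTE MODULE AddBom_Compute
        CREATE FUNCTION Main() RETURNS BOOLEAN
        BEGIN
            -- toolkit-generated copy procedure
            CALL CopyEntireMessage();
            -- FileInput numbers the records it reads from the current file
            IF InputLocalEnvironment.File.Record = 1 THEN
                -- prefix the UTF-8 BOM to the first record only
                SET OutputRoot.BLOB.BLOB = X'EFBBBF' || InputRoot.BLOB.BLOB;
            END IF;
            RETURN TRUE;
        END;
    END MODULE;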