mhd_zabi (Newbie | Joined: 25 Sep 2016 | Posts: 7 | Location: Mangalore)
Posted: Sun Sep 25, 2016 8:37 am    Post subject: Special characters getting converted to different characters
Hi All,
I am using IIB9 on AIX 7.1. I am facing an issue while reading a CSV file.
The flow doesn't do much. It reads the input CSV file record by record, skipping the header record, and then appends each record to the output file. Basically, the flow just creates the output with the header record removed.
Now the problem comes with the special characters. Characters like ä and ö are getting converted to Ã¤ and Ã¶ respectively. I checked the input file and it seems to be a UTF-8 encoded file (I checked by opening it in Notepad++), but the output file with the changed characters seems to be ANSI encoded.
Can anyone suggest how this is getting changed?
Thanks.
fjb_saper (Grand High Poobah | Joined: 18 Nov 2003 | Posts: 20756 | Location: LI, NY)
Posted: Sun Sep 25, 2016 11:27 am
What is the CCSID on OutputRoot.Properties when writing the file?
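You can see it with a Trace node (pattern ${Root}), or with a couple of lines of ESQL in a Compute node. A minimal sketch, assuming you just want to inspect it and, if the bytes genuinely are UTF-8, force it on the outgoing message:

Code:
    -- read the character set currently flagged on the outgoing message
    DECLARE outCcsid INTEGER OutputRoot.Properties.CodedCharSetId;
    -- 1208 = UTF-8; only set this if the bytes really are UTF-8
    SET OutputRoot.Properties.CodedCharSetId = 1208;

_________________
MQ & Broker admin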
mhd_zabi (Newbie | Joined: 25 Sep 2016 | Posts: 7 | Location: Mangalore)
Posted: Sun Sep 25, 2016 8:32 pm
CCSID is 1208 and Encoding is 273 in both the input and output properties.
The strange thing is that yesterday, when I tried to read the input as Whole File and just copied it to the output, the special characters were passed through as-is. It's only when I try to transform the file that the special characters get changed.
adubya (Partisan | Joined: 25 Aug 2011 | Posts: 377 | Location: GU12, UK)
Posted: Sun Sep 25, 2016 11:54 pm
How are you performing the transformation? Java, ESQL, or a built-in node?
If you are using Java, check that any methods which handle byte arrays are using the correct encoding parameters.
_________________
Independent Middleware Consultant
andy@knownentity.com
mhd_zabi (Newbie | Joined: 25 Sep 2016 | Posts: 7 | Location: Mangalore)
Posted: Mon Sep 26, 2016 2:38 am
I am using ESQL, but all it does is:
Call CopyMessageHeaders
Call CopyEntireMessage
Set the output file name
Check whether the IsEmpty flag is TRUE or FALSE, based on which I send the data to the Finish File terminal to complete the file.
The transformation of removing the header record from the input is done by the Records and Elements properties on the FileInput node, where I check the box to skip the first record. The Compute node is sketched below.
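A simplified sketch of the Compute module (the copy procedures are the toolkit-generated ones; the output file name is a placeholder, and the IsEmpty / Finish File handling is omitted):

Code:
    CREATE COMPUTE MODULE RemoveHeader_Compute
        CREATE FUNCTION Main() RETURNS BOOLEAN
        BEGIN
            -- toolkit-generated copy procedures
            CALL CopyMessageHeaders();
            CALL CopyEntireMessage();
            -- placeholder file name; the Compute node's Compute mode must
            -- include LocalEnvironment for this override to take effect
            SET OutputLocalEnvironment.Destination.File.Name = 'output.csv';
            RETURN TRUE;
        END;
    END MODULE;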
smdavies99 (Jedi Council | Joined: 10 Feb 2003 | Posts: 6076 | Location: Somewhere over the Rainbow this side of Never-never land.)
Posted: Mon Sep 26, 2016 2:51 am
This:

mhd_zabi wrote:
    Call CopyMessageHeaders
    Call CopyEntireMessage
    Set the output filename

is incorrect. You do ONE or the other of
    Call CopyMessageHeaders
    Call CopyEntireMessage
not both. (CopyEntireMessage copies the whole tree, headers included, so calling CopyMessageHeaders first is redundant.) In your case, as you are not doing anything to the message body,

Code:
    CALL CopyEntireMessage();

is the right one to use.
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt, think and investigate before you ask silly questions.
timber (Grand Master | Joined: 25 Aug 2015 | Posts: 1292)
Posted: Mon Sep 26, 2016 2:53 am
mhd_zabi wrote:
    The strange thing is that yesterday, when I tried to read the input as Whole File and just copied it to the output, the special characters were passed through as-is. It's only when I try to transform the file that the special characters get changed.

That's not strange at all. If the flow does not change the message tree, then the output is copied straight from the input bit stream.

mhd_zabi wrote:
    I checked the input file and it seems to be a UTF-8 encoded file (I checked by opening it in Notepad++), but the output file with the changed characters seems to be ANSI encoded.

Seems to be? On what basis are you asserting this? Most character encodings will look reasonably OK when interpreted as UTF-8 if you are viewing mostly ASCII characters, so the fact that Notepad++ displayed the file correctly doesn't prove very much.
You cannot make assumptions about the character encoding. You absolutely must find out what encoding the sender used when the file was written, then specify that encoding in your message flow.
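For example, once the sender confirms the file really is UTF-8, you can make that explicit when you convert the bit stream instead of relying on a default. A minimal sketch, assuming the body arrives in the BLOB domain:

Code:
    -- convert the raw bytes using an explicit character set
    -- 1208 = UTF-8; substitute whatever the sender actually confirms
    DECLARE raw     BLOB      InputRoot.BLOB.BLOB;
    DECLARE rawText CHARACTER CAST(raw AS CHARACTER CCSID 1208);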
rekarm01 (Grand Master | Joined: 25 Jun 2008 | Posts: 1415)
Posted: Mon Sep 26, 2016 8:23 am
mhd_zabi wrote:
    CCSID is 1208 and Encoding is 273 in both the input and output properties.

There is also the possibility that the message flow works, but something is wrong with whatever application interprets/renders/displays the output message. For example, if Notepad++ requires a BOM in order to correctly detect UTF-8, but the message flow does not preserve the BOM when transforming the message, then Notepad++ would garble the message. Applications such as rfhutil and amqsbcg0, which can display message bytes in hexadecimal, are much more useful for unambiguously examining messages.
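You can also test for the BOM inside the flow itself. A sketch, assuming the body is in the BLOB domain:

Code:
    -- the UTF-8 BOM is the three-byte sequence X'EFBBBF'
    DECLARE utf8Bom BLOB X'EFBBBF';
    DECLARE body    BLOB InputRoot.BLOB.BLOB;
    DECLARE hasBom  BOOLEAN FALSE;
    IF LENGTH(body) >= 3 AND SUBSTRING(body FROM 1 FOR 3) = utf8Bom THEN
        SET hasBom = TRUE;    -- the input carries a BOM
    END IF;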
mhd_zabi (Newbie | Joined: 25 Sep 2016 | Posts: 7 | Location: Mangalore)
Posted: Mon Sep 26, 2016 11:27 pm
Thank you all for your inputs, especially rekarm01.
I explored some more based on what rekarm01 said, and below is what I found.
The input we received from the source was UTF-8 with a BOM, as I could see the BOM hex code in the input. The output file that I generated did not have it. Notepad++ still detects the encoding without the BOM and shows it as "UTF-8 Without BOM", but Excel and our end application (which in this case is Siebel) are not able to detect the encoding as UTF-8, so they take it as ASCII and corrupt the special characters. The BOM seems to be getting removed when I remove the header record from the input.
I was wondering whether there is some way to retain this BOM even after removing the header record.
timber (Grand Master | Joined: 25 Aug 2015 | Posts: 1292)
Posted: Tue Sep 27, 2016 1:49 am
The BOM is just a sequence of 3 bytes. You could write those 3 bytes to the file before appending the other records, as sketched below.
Alternatively, you could un-select 'Skip first record' and put in some logic to truncate all except the first character of the first record (rather than skipping it completely).
Before you do either of those, you may want to ask yourself whether the BOM is actually required by the receiver.
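If the receiver does need it, prepending the bytes is straightforward. A sketch, assuming the records pass through the BLOB domain and that your flow can tell when it is handling the first record:

Code:
    -- the UTF-8 BOM, X'EFBBBF', written once ahead of the first record
    DECLARE utf8Bom BLOB X'EFBBBF';
    -- first-record detection is flow-specific and omitted here
    SET OutputRoot.BLOB.BLOB = utf8Bom || InputRoot.BLOB.BLOB;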
mhd_zabi (Newbie | Joined: 25 Sep 2016 | Posts: 7 | Location: Mangalore)
Posted: Thu Sep 29, 2016 1:46 am
Thanks, timber.
I was able to append the BOM to the beginning of the file by converting it to a bit stream, and it worked fine. The end application was able to read the file as UTF-8 and processed the characters correctly.
Thank you all for your help.
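In case it helps anyone else, the core of what I ended up with, simplified to the BLOB domain (my real flow differs; the LocalEnvironment.File.Record test may need adjusting if the skipped header record still counts toward the record number):

Code:
    CREATE COMPUTE MODULE AddBom_Compute
        CREATE FUNCTION Main() RETURNS BOOLEAN
        BEGIN
            -- toolkit-generated copy procedure
            CALL CopyEntireMessage();
            -- FileInput numbers the records it reads from the current file
            IF InputLocalEnvironment.File.Record = 1 THEN
                -- prefix the UTF-8 BOM to the first record only
                SET OutputRoot.BLOB.BLOB = X'EFBBBF' || InputRoot.BLOB.BLOB;
            END IF;
            RETURN TRUE;
        END;
    END MODULE;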