Author |
Message
|
rekarm01 |
Posted: Thu Sep 06, 2012 12:52 am Post subject: Re: Encoding French |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
Prasanth wrote: |
Could you please let me know what the ESQL would be to replace (æ, é, ù) with a space? Basically, I do not know the hex representation of the specified French characters, like X'0D'. What is it for French characters? |
How about:
Code: |
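-- One replacement space per search character; with a shorter replace string, the unmatched characters are deleted rather than replaced.
SET targetString = TRANSLATE(sourceString, 'æéù', '   '); |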
?
Characters don't have a hex representation; bytes do. The hex representation depends on whatever ccsid the message flow might use to convert characters to bytes.
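To make the point about bytes versus characters concrete, here is a minimal sketch in Python (used only as a convenient tool outside the broker, not as anything the flow itself runs). The helper name hex_bytes is hypothetical. The codec names are assumed stand-ins for CCSIDs: "utf-8" for CCSID 1208, "iso-8859-1" for CCSID 819, and "cp037" for EBCDIC CCSID 37.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Return the bytes of `text` in the given encoding as uppercase hex."""
    return text.encode(encoding).hex().upper()


if __name__ == "__main__":
    # The same three characters produce different bytes under each encoding.
    for ch in "æéù":
        for enc in ("utf-8", "iso-8859-1", "cp037"):
            print(f"U+{ord(ch):04X} {enc}: {hex_bytes(ch, enc)}")
```

So a question like "what is the hex for é?" only has an answer once the CCSID is fixed: two bytes in UTF-8, one byte in ISO-8859-1.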
fjb_saper wrote: |
Make sure you send the stuff UTF-8 through the send mail plugin. |
The current poster re-opened an old thread; there is no send mail plugin for WMBv7. |
|
Prasanth |
Posted: Thu Sep 06, 2012 6:32 am Post subject: |
|
|
Newbie
Joined: 05 Sep 2012 Posts: 7
|
I had something like the below in my compute node
Declare DeleteChar CHAR;
SET DeleteChar=CAST(X'0D0A' AS CHAR CCSID InputRoot.MQMD.CodedCharSetId);
SET TEMP=REPLACE(TEMP, DeleteChar , '');
I have a Trace node immediately after the FileInput node, and there the special character I had (é) is displayed incorrectly. So I am sure Message Broker itself is altering it. So I assume I should have some code in my Compute node, just after the FileInput node, to convert those characters. |
|
kimbert |
Posted: Thu Sep 06, 2012 6:50 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
I suggest that you go back to basics. Check that the input CSV file is correct. Check the actual bytes of the file if necessary. Check that it really does contain valid utf-8 characters.
Then check the output of the XSL transform node. Check the actual byte values. Check that everything was parsed correctly. Test with escaped fields and empty fields.
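One way to carry out the "check that it really is valid UTF-8" step outside the broker is a strict decode of the raw bytes. This is only a sketch; the function name first_invalid_utf8_offset is hypothetical, and it says nothing about whether the CSV structure itself is correct.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as err:
        # err.start is the index of the first offending byte.
        return err.start
    return None
```

For a real file, the bytes could be read with pathlib.Path("input.csv").read_bytes(), where input.csv is a placeholder name. A returned offset tells you where to look in a hex editor.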
Quote: |
I had something like the below in my compute node
Declare DeleteChar CHAR;
SET DeleteChar=CAST(X'0D0A' AS CHAR CCSID InputRoot.MQMD.CodedCharSetId);
SET TEMP=REPLACE(TEMP, DeleteChar , ''); |
If you have correctly parsed the CSV message then you should not have any cr/lfs in your message tree. Unless they occur in the middle of a field value, in which case you probably should not be deleting them.
Quote: |
I have a Trace node immediately after the FileInput node, and there the special character I had (é) is displayed incorrectly. So I am sure Message Broker itself is altering it. |
You may be right, but I don't follow your logic. How do you know that your stylesheet is not the culprit? |
|
Prasanth |
Posted: Thu Sep 06, 2012 7:17 am Post subject: |
|
|
Newbie
Joined: 05 Sep 2012 Posts: 7
|
Hi,
I am new to Message Broker.
The context is that I get a CSV file that can contain any special characters (including French characters). This CSV file needs to be transformed to an XML file; all the special characters need to be converted to spaces, and French characters like (é, æ, Æ) should be converted to a lowercase e in the output XML.
So I have a message set defining the elements of the CSV. Then, in the actual flow, I have a FileInput node (to accept the CSV file), a Compute node (where I transform to XML), an XSL translate node (to transform special characters), a French translate node (to transform French characters), and a FileOutput node.
So the result was that the CSV gets transformed to XML, and special characters ('.0-) etc. get transformed to spaces. I have used Unicode representations in my XSL translate nodes (for both special characters and French characters),
but my problem is that the French characters appear weird. I need to get rid of this.
So, based on some people's advice, I thought I should include some code in the Compute node to get rid of the French characters.
Since this needs to go to deployment by Monday, people are really waiting on me.
So any help is really appreciated. |
|
smdavies99 |
Posted: Thu Sep 06, 2012 7:48 am Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
Prasanth wrote: |
but my problem is that the French characters appear weird. I need to get rid of this.
|
How do they appear to be weird? Is that when you are using some tool to view them? Does this tool handle the CCSID that the characters are using?
Consider this: getting rid of the French characters may cause data corruption.
Go back to the beginning and verify that the characters and the CCSID for their representation match, i.e. the hex values for these characters match what the CCSID character map expects them to be. If they are not correct then you have the not uncommon case of GIGO: Garbage In, Garbage Out.
There are many posts on this forum where the answer to a character conversion problem was that the supplied CCSID and the actual data didn't match. If this is the case, your only course of action is to go back to the source of the data and get them to fix their problem. Sadly, this is sometimes easier said than done.
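A small illustration of what a CCSID mismatch looks like, again as a Python sketch outside the broker. The helper name reinterpret is hypothetical; it encodes text with one codec and decodes the resulting bytes with another, which is the effect of reading data with the wrong CCSID.

```python
def reinterpret(text: str, written_as: str, read_as: str) -> str:
    """Encode `text` with one codec, then decode those bytes with another."""
    return text.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    # UTF-8 bytes read as ISO-8859-1: one character turns into two.
    print(ascii(reinterpret("é", "utf-8", "iso-8859-1")))
    # ISO-8859-1 bytes read as UTF-8: the byte is invalid and gets replaced.
    print(ascii(reinterpret("é", "iso-8859-1", "utf-8")))
```

If the "weird" characters look like a capital A with a tilde followed by another symbol, UTF-8 data is probably being read or displayed as a single-byte code page. If they show as a replacement character, the reverse is likely.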
Just be careful if the CCSID is one of the ISO-8859 types. ISO-8859-1 does not map every character in the same way as ISO-8859-15. The latter has the Euro symbol; the -1 variant does not. _________________ WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions. |
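The ISO-8859 caveat in the post above can be checked directly. The sketch below (Python, hypothetical function name differing_bytes) lists the byte values that decode to different characters under two single-byte codecs.

```python
from typing import List


def differing_bytes(codec_a: str, codec_b: str) -> List[int]:
    """Return the byte values that decode differently under the two codecs."""
    return [
        b
        for b in range(256)
        if bytes([b]).decode(codec_a) != bytes([b]).decode(codec_b)
    ]


if __name__ == "__main__":
    for b in differing_bytes("iso-8859-1", "iso-8859-15"):
        one = bytes([b]).decode("iso-8859-1")
        fifteen = bytes([b]).decode("iso-8859-15")
        print(f"0x{b:02X}: U+{ord(one):04X} vs U+{ord(fifteen):04X}")
```

Only eight byte values differ. The characters asked about in this thread (æ, é, ù) have the same byte in both, but œ, Œ and Ÿ exist only in ISO-8859-15, which matters for French text.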
|
Prasanth |
Posted: Thu Sep 06, 2012 8:33 am Post subject: |
|
|
Newbie
Joined: 05 Sep 2012 Posts: 7
|
I had
"SET OutputRoot.Properties.CodedCharSetId = 1208;" in my Compute node.
I am opening the input CSV and the output XML file in EmEditor.
I have "<?xml version="1.0" encoding="UTF-8"?>" in all my XML files. |
|
Vitor |
Posted: Thu Sep 06, 2012 8:39 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
Prasanth wrote: |
I had
"SET OutputRoot.Properties.CodedCharSetId = 1208;" in my Compute node.
I am opening the input CSV and the output XML file in EmEditor.
I have "<?xml version="1.0" encoding="UTF-8"?>" in all my XML files. |
That's all very nice and nothing to do with what my associate meant. He was talking about the input that you're getting, not what you're outputting. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
Vitor |
Posted: Thu Sep 06, 2012 8:40 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
Prasanth wrote: |
Since this needs to go to deployment by Monday, people are really waiting on me. |
Following from the earlier posts:
a) If you have an unrealistic deadline you should push back
and more importantly
b) If it's a problem with the input data you can sit there all weekend and potentially not be able to fix it. |
|
smdavies99 |
Posted: Thu Sep 06, 2012 9:44 am Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
Prasanth wrote: |
I have "<?xml version="1.0" encoding="UTF-8"?>" in all my XML files. |
That is all well and good. However if the CCSID that the actual data was written in was not equivalent to UTF-8 (or UTF-16 etc) then no matter what this says, the data is Garbage.
I've often had to ask a developer, "Why did you put encoding="UTF-8" at the start of a message when you wrote the data using ISO-8859-1 (or -15)?"
The common answer was 'Everybody does it so I did it as well'.
Getting them to change the encoding to represent the CCSID of the actual data worked wonders with the usability of the data.
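A rough way to catch a declaration that does not match the data is to read the encoding named in the XML declaration and try to decode the whole document with it. The Python sketch below uses hypothetical names (declared_encoding, declaration_matches_data) and assumes an ASCII-compatible encoding. It only catches one direction: data written as ISO-8859-1 but labelled UTF-8 will usually fail, whereas any byte sequence "decodes" as ISO-8859-1, so the opposite mistake passes silently.

```python
import re
from typing import Optional

# Matches the encoding pseudo-attribute in a leading XML declaration.
_DECLARATION = re.compile(rb'^<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']')


def declared_encoding(xml_bytes: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, if there is one."""
    match = _DECLARATION.match(xml_bytes)
    return match.group(1).decode("ascii") if match else None


def declaration_matches_data(xml_bytes: bytes) -> bool:
    """True if the document decodes cleanly with its declared encoding."""
    encoding = declared_encoding(xml_bytes) or "utf-8"  # XML default
    try:
        xml_bytes.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        return False
    return True
```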
Understanding this stuff takes even the best of us a while. |
|
rekarm01 |
Posted: Fri Sep 07, 2012 12:37 am Post subject: Re: Encoding French |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
Prasanth wrote: |
Code: |
SET DeleteChar=CAST(X'0D0A' AS CHAR CCSID InputRoot.MQMD.CodedCharSetId);
SET TEMP=REPLACE(TEMP, DeleteChar , ''); |
|
X'0D0A' is the ASCII carriage-return/line-feed. It's not particularly French. Where's the code that handles the French characters?
Prasanth wrote: |
I have a trace node immediately after the fileinput node ... |
If the message flow uses a FileInput node, where does the MQMD in the code above come from?
Before the FileInput node, use a hex editor or some other tool that can display the file contents in hex, to confirm the correct encoding of the input file. Make sure that it matches the Message coded character set ID property in the FileInput node.
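If no hex editor is to hand, a few lines of Python can show the file contents in hex. This is a minimal sketch with a hypothetical function name, hex_dump; it prints an offset, the bytes in hex, and printable ASCII, with a dot for everything else.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex values, and printable ASCII per line."""
    pad = width * 3 - 1  # width of the hex column for a full line
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(lines)
```

An é stored as C3 A9 points to UTF-8 (CCSID 1208). A single E9 points to a Latin-1 style code page such as ISO-8859-1 (CCSID 819). Whichever it is, the FileInput node's coded character set ID has to agree with it.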
The Trace node output itself is subject to character conversion, depending on how the Trace node is configured, environment variables, or other factors. It's entirely possible that the message flow is converting the characters correctly, but the Trace node is corrupting them upon output, or that the display tool is not displaying the Trace node output correctly.
Prasanth wrote: |
So I assume I should have some code in my Compute node, just after the FileInput node, to convert those characters. |
First make sure that the FileInput node is reading the file contents correctly. If it isn't, then a compute node isn't likely to help. |
|
|