Author |
Message
|
Avocado's number |
Posted: Fri Apr 12, 2013 4:00 am Post subject: Issue in parsing characters such as °, ³ (superscript) |
|
|
Newbie
Joined: 12 Apr 2013 Posts: 4
|
I am using a CSV file as an input to a FileInput node in my flow. The records present in the file are inserted into the database (DB2 9.7).
One of the records present in the CSV carries an entry °C. Initially, I used the MRM parser at the FileInput node and the record inserted into the database looked like -C without the °. Later, I switched to the DFDL parser for the same file and used a FileOutput node to verify the output. This time, the output generated was °C, however, an insert into the database changed to C.
Is there a way to represent special characters such as ° or ³, as is, in the database via a message flow?
Thanks for your help! |
|
Back to top |
|
 |
McueMart |
Posted: Fri Apr 12, 2013 4:28 am Post subject: |
|
|
 Chevalier
Joined: 29 Nov 2011 Posts: 490 Location: UK...somewhere
|
Sounds like your database can't deal with the ° character to me! What Character set do you have set for the field which you are storing the data in? |
|
Back to top |
|
 |
fjb_saper |
Posted: Fri Apr 12, 2013 4:30 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
you'd have to store the message in UTF-8 to be sure that all special characters are conserved...
Have fun  _________________ MQ & Broker admin |
|
Back to top |
|
 |
rekarm01 |
Posted: Sun Apr 14, 2013 11:26 am Post subject: Re: Issue in parsing characters such as °, ³ (superscript) |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
Avocado's number wrote: |
Is there a way to represent special characters such as ° or ³, as is, in the database via a message flow? |
That depends on the DB2 code set used when creating the database, and the database column type. Check the DB2 documentation for more information.
fjb_saper wrote: |
you'd have to store the message in UTF-8 to be sure that all special characters are conserved |
It's true that UTF-8 will preserve non-ASCII characters, but DB2 string functions use byte indexing, not character indexing, which can affect position and length calculations in certain circumstances. The InfoCenter has more information, here and here. |
|
Back to top |
|
 |
fjb_saper |
Posted: Sun Apr 14, 2013 8:56 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
I was being purposefully being generic there. In the particular case of DB2 you can store the text as BLOB in UTF-8. This way no risk to do a positional byte instead of character as you'd have to import to WMB before going positional on it (DFDL?)
Have fun  _________________ MQ & Broker admin |
|
Back to top |
|
 |
Avocado's number |
Posted: Sun Apr 14, 2013 10:25 pm Post subject: |
|
|
Newbie
Joined: 12 Apr 2013 Posts: 4
|
The database was using a code page of 1252. This has been changed to 1208 and now the representation of °C has become ┬░C. Also, the data type of the column in DB2 is CHARACTER. Any pointers, please? |
|
Back to top |
|
 |
smdavies99 |
Posted: Sun Apr 14, 2013 11:31 pm Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
Did you delete the row and re-insert it with the new DB CCSID?
Just changing the DB CCSID won't fix the problem. (AFAIK) _________________ WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions. |
|
Back to top |
|
 |
Avocado's number |
Posted: Mon Apr 15, 2013 1:34 am Post subject: |
|
|
Newbie
Joined: 12 Apr 2013 Posts: 4
|
I dropped and re-created the database to change the codepage, then created the tables and tried the insert via my flow.
Here is a summary of what I have tried so far-
1. Parsed the input file correctly against the DFDL test client parser.
2. Attached trace node after the FileInput node- Does not show the °C character correctly.
3. Flow wired to a FileOutput node- The output file has the correct character for °C.
4. Flow wired to a Compute node for database insert- Does not show the °C character correctly.
5. Changed the database codepage to 1208- Does not show the °C character correctly.
6. Changed ODBC settings to configure 'ConnectCodepage' for DSN to use 1208- Does not show the °C character correctly.
7. A direct insert of values into DB2 from control centre- Value is represented correctly in database.
8. Hard- coded '°C' against the column value in insert statement in ESQL- Value is represented correctly in database.
9. Passed '°C' in a character variable to the insert statement in ESQL- Value is represented correctly in database. |
|
Back to top |
|
 |
kimbert |
Posted: Mon Apr 15, 2013 2:20 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
1. Parsed the input file correctly against the DFDL test client parser.
2. Attached trace node after the FileInput node- Does not show the °C character correctly. |
There's your problem. It looks as if the DFDL parser has been told to use the wrong text encoding when parsing this field.
That's one theory, anyway. A debug-level user trace should reveal what the DFDL parser is doing.
The most important question at this point is: what is the text encoding of the input file? |
|
Back to top |
|
 |
Avocado's number |
Posted: Mon Apr 15, 2013 3:58 am Post subject: |
|
|
Newbie
Joined: 12 Apr 2013 Posts: 4
|
The text encoding for the input file is UTF-8. However, the text encoding for this particular field was set to <dynamically set>. I changed this to UTF-8 and I am able to receive a correct representation of all special characters now (both in the flow and in the database).
Thanks much for your inputs. |
|
Back to top |
|
 |
kimbert |
Posted: Mon Apr 15, 2013 4:55 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
I'm glad you got it working.
'Dynamically set' causes the DFDL property 'encoding' to be set to '$dfdl:encoding'. That is a reference to a predefined variable. In WMB, the predefined variable $dfdl:encoding is set to the 'transport encoding'. In MQ that means the CCSID from the MQMD header. For a FileInput node it is probably the platform default encoding. |
|
Back to top |
|
 |
|