Author |
Message
|
BCBS |
Posted: Sun Jul 18, 2010 12:58 am Post subject: [RESOLVED] Unable to store/retrieve Chinese char's in db |
|
|
 Apprentice
Joined: 12 Jul 2006 Posts: 37
|
I am trying to work with a message that contains Chinese characters. All I want to do is insert an xml message (whole) in a database as CLOB.
Process for insert into database, retrieval from database is happening without exceptions. But after I retrieve the message and examine, the Chinese characters seem to be damaged.
Environment:
Message Broker 6.0.10
Oracle 10G (support for UTF-8 characters enabled)
Insert:
DECLARE msgBitStream BLOB InputRoot.BLOB.BLOB;
INSERT INTO Database.AGGR_DATA_T (
....
AGGR_MSG_BODY_IMG,
..........
) VALUES (
........
msgBitStream,
.........
); _________________ _________________________________
Last edited by BCBS on Thu Jul 22, 2010 1:16 pm; edited 1 time in total |
|
Back to top |
|
 |
fjb_saper |
Posted: Sun Jul 18, 2010 5:18 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Well,
Before you insert your BLOB into the CLOB field of the DB which CCSID ist it in? Hint: if the CCSID of the BLOB is not 1208, you may want to make it so before inserting it into the DB.  _________________ MQ & Broker admin |
|
Back to top |
|
 |
smdavies99 |
Posted: Sun Jul 18, 2010 5:27 am Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
You need to trace/dump your message (in hex format) from inside broker.
This will tell you if the data you are receiving is correctly formatted.
As other posts on this topic in this forum will show, you need to make sure that the data your receive is really correctly formatted before it gets into broker.
For example,
A Webapp (portal) portled gets a character field from the user. The default CCSID for that is ISO 8859-1 ( aka 817). This is wrapped in a SOAP UTF-8 envelope but the data in the character is not converted from 817 to 1208 before the SOAP message is sent.
Most of the time, this is not a problem but there are cases (eg GBP sign & Euro) that make the receiving flow throw its toys out of the pram.
So, please check that the Chinese Characters are correctly escaped in the incoming UTF-8 message ( the Endian also matters!). It does mean decoding the Hex though but that is the only way to be sure. _________________ WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions. |
|
Back to top |
|
 |
rekarm01 |
Posted: Sun Jul 18, 2010 5:55 am Post subject: Re: Unable to store/retrieve Chinese char's in database |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
BCBS wrote: |
But after I retrieve the message and examine, the Chinese characters seem to be damaged. |
How are they damaged? Be more specific.
BCBS wrote: |
Environment:
Message Broker 6.0.10
Oracle 10G (support for UTF-8 characters enabled) |
How was support for UTF-8 enabled in Oracle 10G? Be more specific.
Maybe this topic will help. |
|
Back to top |
|
 |
rekarm01 |
Posted: Sun Jul 18, 2010 6:11 am Post subject: Re: Unable to store/retrieve Chinese char's in database |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
smdavies99 wrote: |
For example,
A Webapp (portal) portled gets a character field from the user. The default CCSID for that is ISO 8859-1 (aka 817). |
aka 819.
smdavies99 wrote: |
So, please check that the Chinese Characters are correctly escaped in the incoming UTF-8 message (the Endian also matters!). |
UTF-8 doesn't have endianness. |
|
Back to top |
|
 |
smdavies99 |
Posted: Sun Jul 18, 2010 6:19 am Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
indeed utf-8 does not have endianess. But once you escape into double byte (or 4 byte) you do. _________________ WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions. |
|
Back to top |
|
 |
BCBS |
Posted: Sun Jul 18, 2010 11:06 am Post subject: |
|
|
 Apprentice
Joined: 12 Jul 2006 Posts: 37
|
Thanks all for your responses.
fjb_saper wrote: |
Well,
Before you insert your BLOB into the CLOB field of the DB which CCSID ist it in? Hint: if the CCSID of the BLOB is not 1208, you may want to make it so before inserting it into the DB.  |
Yes the CCSID is 1208, and ENCODING is 273. To make sure these values are correct, I just ran the message thru simple flow including a trace node, and it clearly showed these values in InputRoot.Properties.
Also, I am wondering, here in the forum, I am trying to post the big response (wanted to post ESQL for retrieval), and every time I am getting webpage error (service may be temporarily down...). Any idea? Its been long time I visited this site, wondering if there is any limit on the size of the posting. _________________ _________________________________ |
|
Back to top |
|
 |
smdavies99 |
Posted: Sun Jul 18, 2010 12:01 pm Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
Have you checked that the HEX coding of the message (BLOB.BLOB) is correct?
I've experience the same issue as you when I've tried to post code/trace output etc.
When I hit the submit button the error occurs.
Perhaps this is an issue for the site admins?
I normally post from Firefox.(in case it is browser specific) _________________ WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions. |
|
Back to top |
|
 |
rekarm01 |
Posted: Sun Jul 18, 2010 3:57 pm Post subject: Re: Unable to store/retrieve Chinese char's in database |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
BCBS wrote: |
Yes the CCSID is 1208, and ENCODING is 273. |
ENCODING is for binary numeric data. It's not relevant for character data (such as XML).
BCBS wrote: |
To make sure these values are correct, I just ran the message thru simple flow including a trace node, and it clearly showed these values in InputRoot.Properties. |
To make sure these values are correct, it's necessary to check them against the actual bytes in InputRoot.BLOB.BLOB, (as smdavies99 suggests).
BCBS wrote: |
I am trying to post the big response (wanted to post ESQL for retrieval), and every time I am getting webpage error (service may be temporarily down...). Any idea? Its been long time I visited this site, wondering if there is any limit on the size of the posting. |
Whether there is a site-imposed limit or not, try to pare down the posts to just the relevant bits, as a courtesy to the other participants whose help you're asking for.
Enclose any ESQL, usertrace data, or other formatted data in [code] tags to make it easier to read.
Try not to post the entire contents of Root.BLOB.BLOB; the portions that represent the XML declaration and any non-ASCII data are usually sufficient.
Add line breaks to the posted output as needed, to minimize horizontal scrolling. Use the 'Preview' button before posting.
rekarm01 wrote: |
How was support for UTF-8 enabled in Oracle 10G? Be more specific.
Maybe this topic will help. |
Did that topic help? |
|
Back to top |
|
 |
kimbert |
Posted: Mon Jul 19, 2010 12:12 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
ENCODING is for binary numeric data. It's not relevant for character data (such as XML). |
I hesitate to disagree with someone who clearly knows more about this subject than I do, but...
http://en.wikipedia.org/wiki/UTF-16/UCS-2#Byte_order_encoding_schemes
I'm fairly sure that endianness does need to be factored into the equation when encoding and decoding streams of UTF-16 and UTF-32 data. Sometimes the encoding implies the endianness ( UTF-16BE, UTF-16LE) and sometimes it gets supplied by a BOM. But in the absence of either of those, the endianness must be supplied in some other way. |
|
Back to top |
|
 |
rekarm01 |
Posted: Mon Jul 19, 2010 3:37 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
kimbert wrote: |
Quote: |
ENCODING is for binary numeric data. It's not relevant for character data (such as XML). |
I hesitate to disagree with someone who clearly knows more about this subject than I do, but... |
To be more precise, for the WMB Root.Properties header (and other related headers):- the Encoding field describes the byte encoding for numeric data in a message (binary integers, packed-decimal integers, and floating-point numbers)
- the CodedCharSetId field describes the byte encoding for character data in a message (similar to the XML encoding declaration, or MIME/SGML charset parameter, or the Unicode character map)
Related headers should have a Format field, to identify which bytes represent numeric data, and which message bytes represent character data. Messages with Format=MQFMT_STRING contain only character data, so the Encoding field is not relevant. This should include any incoming (or outgoing) XML messages; XML parsers encode any numeric or other binary data as characters.
kimbert wrote: |
http://en.wikipedia.org/wiki/UTF-16/UCS-2#Byte_order_encoding_schemes
I'm fairly sure that endianness does need to be factored into the equation when encoding and decoding streams of UTF-16 and UTF-32 data. Sometimes the encoding implies the endianness ( UTF-16BE, UTF-16LE) and sometimes it gets supplied by a BOM. But in the absence of either of those, the endianness must be supplied in some other way. |
The CCSID describes the character encoding scheme (including endianness):- ccsid=1208: UTF-8
- ccsid=1200: UTF-16BE
- ccsid=1202: UTF-16LE
- ccsid=1204: UTF-16 with BOM
- ccsid=1232: UTF-32BE
- ccsid=1234: UTF-32LE
- ccsid=1236: UTF-32 with BOM
|
|
Back to top |
|
 |
BCBS |
Posted: Mon Jul 19, 2010 9:42 am Post subject: |
|
|
 Apprentice
Joined: 12 Jul 2006 Posts: 37
|
Hello ALL,
I have been little late in responding or trying out all the valuable suggestions provided byyou guys. Its Monday morning and I am fresh to work on this.
I will update you all with my findings. Thanks again for great help on this so far. _________________ _________________________________ |
|
Back to top |
|
 |
BCBS |
Posted: Thu Jul 22, 2010 1:30 pm Post subject: |
|
|
 Apprentice
Joined: 12 Jul 2006 Posts: 37
|
The issue resolved after we changed the settings on MB server.
rekarm01 wrote: |
Did that topic help? |
Yes. Its the root cause. Thanks much 'rekarm01'
Solution:
1) MB’s ODBC data source definition updated - ColumnSizeAsCharacter=1.
2) Set the variable NLS_LANG in the broker environment to NLS_LANG=AMERICAN_AMERICA.AL32UTF8
3) No changes on DB except characterset for the database is set to UTF8.
smdavies99 wrote: |
Have you checked that the HEX coding of the message (BLOB.BLOB) is correct? |
Before we made the changes, the hex coding was not matching for store with retrieval ' '
After the change, it matched ' '
Once again, thank you all for your valuable input and suggestions. _________________ _________________________________ |
|
Back to top |
|
 |
|