Author |
Message
|
golam1983 |
Posted: Sat Mar 06, 2010 6:29 pm Post subject: Broker Unicode support for Oracle database |
|
|
 Apprentice
Joined: 27 Jun 2008 Posts: 35
|
Currently I have requirement to handle/manipulate unicode data(UTF8) from oracle database. Here is my current environments
Broker: v6.0.0.3 in Linux X86 platform
Oracle: 9i nls_characterset UTF8
However the "Support for Unicode and DBCS data in databases" topic in the broker documentation says that -
Code: |
If you are using Oracle:
v Your database must be created with NLS_CHARACTERSET of AL32UTF8.
v Your ODBC data source definition must include the setting
ColumnSizeAsCharacter=1.
On UNIX and Linux platforms, this setting must be included in the appropriate
stanza in the ODBC ini files.
On Windows platforms, this string value must be added to the ODBC data
source key in the registry.
See Enabling ODBC connections to the databases for further information.
v For 32-bit connections, you must set the variable NLS_LANG in the broker
environment to the value <yourlanguage>.<yourterritory>.AL32UTF8. |
My oracle database have NLS_CHARACTERSET UTF8 and my DBA team is not agreed at all to shift this to AL32UTF8.
My locale(LC_ALL) set to en_us.UTF-8
NLG_LANG=american_america.UTF8/AL32UTF8 (tried both of these)
Set ColumnSizeAsCharacter=1 in proper ODBC datasource stanza.
Datadirect version is 5.0
These settings are not working properly while inserting/selecting data from database.
Would be interested to know if anybody handled this kind of situation or any idea/insight to enable unicode support.
[/code] |
|
Back to top |
|
 |
Vitor |
Posted: Sat Mar 06, 2010 7:45 pm Post subject: Re: Broker Unicode support for Oracle database |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
golam1983 wrote: |
These settings are not working properly while inserting/selecting data from database. |
Given that these are not the settings given in the documentation it's not all that surprising.
golam1983 wrote: |
Would be interested to know if anybody handled this kind of situation or any idea/insight to enable unicode support.
|
Perhaps get the DBA team to shift their position, or at least justify it? _________________ Honesty is the best policy.
Insanity is the best defence. |
|
Back to top |
|
 |
golam1983 |
Posted: Sat Mar 06, 2010 9:30 pm Post subject: |
|
|
 Apprentice
Joined: 27 Jun 2008 Posts: 35
|
Thanks Vitor.
Just to repeat, the only difference between my database setting and ibm documented database setting is utf8(my current database charset) and al32utf8(ibm must condition charset)
I have checked the hex stream in broker before insert. They are in proper utf8 (ccsid 1208) format. As per oracle documentation if the client's NLS_LANG set to same charset like database (here utf8), there will be no data conversion will be happen. Thats what i want, no conversion. But in my case my utf8 data is getting further converted. For an example "ª" (C2B6 in hex UTF8) got further converted into some (xxxxxxxx). I just do not want this conversion.
So I am checking here if somebody have faced same type of scenario before taking any further step toward this issue(IBM support)
This clearly a incapability of broker to do a simple task simply(just forwarding utf8 data) |
|
Back to top |
|
 |
Vitor |
Posted: Sat Mar 06, 2010 9:43 pm Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
golam1983 wrote: |
This clearly a incapability of broker to do a simple task simply(just forwarding utf8 data) |
There's nothing simple about code pages, as the number of posts on the subject here will attest to.
I suspect something interesting is happening on the interface between Oracle & WMB. This was the basis of my question about why the DBA team won't change the Oracle setting; if both code pages are UTF8 why not alter the database? Clearly it's not that simple or a) your DBAs would alter the database & b) there's would be those instructions in the broker documentation. So that's one point.
I'm also interested by your statement:
golam1983 wrote: |
I have checked the hex stream in broker before insert. They are in proper utf8 (ccsid 1208) format |
and
golam1983 wrote: |
But in my case my utf8 data is getting further converted |
How have you determined that WMB is doing the additional conversion? If the stream in broker is correct immediately prior to insert then why couldn't Oracle be remapping the characters? Likewise how are you determining this additional conversion (the 'xxxxxx' you talk about) is happening? If my assumption is correct and you're doing this with a SQL SELECT statement, could not the conversion be happening at this point?
Just 2 points that might help shine a light on this. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
Back to top |
|
 |
fjb_saper |
Posted: Sun Mar 07, 2010 12:24 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
As well you have to realize that UTF-8 is not the same as Unicode which the broker uses internally. As you did not provide us with any code, how can we know that you are not doing something funky in the code?
Have fun  _________________ MQ & Broker admin |
|
Back to top |
|
 |
rekarm01 |
Posted: Sun Mar 07, 2010 5:45 pm Post subject: Re: Broker Unicode support for Oracle database |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
golam1983 wrote: |
... the "Support for Unicode and DBCS data in databases" topic in the broker documentation says that -
Quote: |
If you are using Oracle:
- Your database must be created with NLS_CHARACTERSET of AL32UTF8.
- For 32-bit connections, you must set the variable NLS_LANG in the broker environment to the value <yourlanguage>.<yourterritory>.AL32UTF8. |
|
That seems fairly self-explanatory.
golam1983 wrote: |
My oracle database have NLS_CHARACTERSET UTF8 and my DBA team is not agreed at all to shift this to AL32UTF8. |
Oracle also recommends switching from "UTF8" to "AL32UTF8", as of Oracle 9i. Push back on the DBA team to switch to "AL32UTF8".
golam1983 wrote: |
NLG_LANG=american_america.UTF8/AL32UTF8 (tried both of these) |
"NLG_LANG" should be "NLS_LANG". "american_america" should be "AMERICAN_AMERICA".
golam1983 wrote: |
I have checked the hex stream in broker before insert. ... As per oracle documentation if the client's NLS_LANG set to same charset like database (here utf8), there will be no data conversion will be happen. |
There are at least three possible opportunities for conversion:- from bitstream to ESQL CHARACTER (UCS-2)
- from ESQL CHARACTER (UCS-2) to ODBC client (NLS_LANG)
- from ODBC client (NLS_LANG) to Oracle server (NLS_CHARACTERSET)
The Oracle documentation only addresses the third conversion, (ODBC client->Oracle server).
It's the second one (ESQL->ODBC client) that requires a supported NLS_LANG characterset, (such as "ALT32UTF8", but not "UTF8" - nor "utf8").
golam1983 wrote: |
For an example "ª" (C2B6 in hex UTF8) got further converted into some (xxxxxxxx). |
"Some (xxxxxxxx)"? Which (xxxxxxxx)?
golam1983 wrote: |
Would be interested to know if anybody handled this kind of situation or any idea/insight to enable unicode support. |
Push back on the DBA team to switch to "AL32UTF8".
Or, change the relevant database columns from character types to BLOB, and insert bytes instead of characters.
Or, add a JavaCompute node, and insert using JDBC, instead of ODBC.
Or, ... push back on the DBA team to switch to "AL32UTF8". |
|
Back to top |
|
 |
golam1983 |
Posted: Mon Mar 08, 2010 1:07 pm Post subject: |
|
|
 Apprentice
Joined: 27 Jun 2008 Posts: 35
|
Thanks rekarm01...I will check with my DBA team, if anyway the database could be switched to AL32UTF8.
I have set the following -
LANG=en_us.UTF-8
NLS_LANG=AMERICAN_AMERICA.UTF8
This seems like working when I am Inserting data.
But its failing when I am trying to select 4 byte charecter data from oracle.
Here is the data stream I am using for testing - ed9fbfee8080efbfbdf48fbfbff48fbfbe
The last 8 byte represents 2 charecter each 4 bytes.
Insert/Update is working fine.
But when I am selecting the same data, the return is -
ed9fbfee8080efbfbdefbfbdefbfbdefbfbdefbfbdefbfbdefbfbdefbfbdefbfbd
Both 4 bytes char seems to be screwed while selecting from oracle. Any possible solution for this(Except switching the database to AL32UTF8) |
|
Back to top |
|
 |
rekarm01 |
Posted: Tue Mar 09, 2010 8:52 pm Post subject: Re: Broker Unicode support for Oracle database |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
golam1983 wrote: |
I have set the following -
LANG=en_us.UTF-8
NLS_LANG=AMERICAN_AMERICA.UTF8
But its failing when I am trying to select 4 byte character data from Oracle. |
supplementary characters require more than 3 bytes for UTF-8.
For "AL32UTF8", Oracle stores and retrieves a supplementary character as a 4-byte UTF-8 character.
For "UTF8", Oracle stores and retrieves a supplementary character as two 3-byte CESU-8 characters.
That's the critical difference between them.
Check the Oracle9i Database Globalization Support Guide or the Oracle Unicode database support white paper for more details.
golam1983 wrote: |
Here is the data stream I am using for testing - ed9fbf ee8080 efbfbd f48fbfbf f48fbfbe
The last 8 byte represents 2 charecter each 4 bytes.
Insert/Update is working fine. |
Really? That's an unusual UTF-8 data stream:ed9fbf = U+D7FF = contact admin code point (reserved for future use)
ee8080 = U+EE00 = private-use character
efbfbd = U+FFFD = <REPLACEMENT_CHARACTER> = '�'
f48fbfbf = U+10FFFF = private-use character
f48fbfbe = U+10FFFE = private-use character Is that what the data stream looks like before or after inserting into the database?
golam1983 wrote: |
But when I am selecting the same data, the return is -
ed9fbf ee8080 efbfbd efbfbd efbfbd efbfbd efbfbd efbfbd efbfbd efbfbd efbfbd
Both 4 bytes char seems to be screwed while selecting from oracle. |
efbfbd = U+FFFD = <REPLACEMENT_CHARACTER> = '�'; these indicate byte sequences that don't map to Unicode characters, (probably due to the incompatibility between UTF-8 and CESU-8 formats).
Is the standard Oracle client any better at selecting the data with the same configuration?
golam1983 wrote: |
Any possible solution for this (Except switching the database to AL32UTF8) |
In the interim, Oracle's suggested work-around is to set the client-side NLS_LANG character set to "AL32UTF8", and let Oracle convert data between "AL32UTF8" client and "UTF8" database. |
|
Back to top |
|
 |
golam1983 |
Posted: Sun Mar 14, 2010 10:01 am Post subject: |
|
|
 Apprentice
Joined: 27 Jun 2008 Posts: 35
|
Thanks rekarm01, for your valuable comments on this.
Yes the Insert/Update look perfect. I have used rawhex function for oracle to verify it.
You are correct, I will use AL32UTF8 and Let oracle to convert it in UTF8.
Again, thank you very much  |
|
Back to top |
|
 |
shatvani |
Posted: Tue Sep 07, 2010 6:29 am Post subject: |
|
|
Newbie
Joined: 08 Mar 2010 Posts: 6
|
Could you help me in the same issue.
My NLS_CHARACTERSET=UTF8,
NLS_NCHAR_CHARACTERSET=AL16UTF16.
I call a stored procedure which returns a CLOB,
I'd like to put it into an XMLNSC domain but insted of my accentuated character I get mismatch character:
Quote: |
<KEP_NEV>"A" kategóriás hajóvezető</KEP_NEV> |
=>
Quote: |
<KEP_NEV>&quot;A&quot; kategĂÂłriĂ¡s hajĂÂłvezetĂ…â€</KEP_NEV> |
I tried smdavies99 suggestion:
Code: |
CREATE PROCEDURE DoStuff() BEGIN
SET OutputRoot.MQMD = NULL;
SET OutputRoot.MQRFH2 = NULL;
CREATE NEXTSIBLING of OutputRoot.Properties DOMAIN 'MQMD';
set OutputRoot.MQMD.Version = MQMD_CURRENT_VERSION;
SET OutputRoot.MQMD.CodedCharSetId = 1208;
SET OutputRoot.MQMD.Format='MQSTR'; |
but unfortunatelly it was useless.
Thank you in anticipation. |
|
Back to top |
|
 |
Vitor |
Posted: Tue Sep 07, 2010 6:44 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
shatvani wrote: |
Could you help me in the same issue. |
How do you know it's the same issue? The same problem, on the same software level, and the same platform? This is why we prefer people to open new threads and refer to possibly connected earlier ones.
shatvani wrote: |
My NLS_CHARACTERSET=UTF8,
NLS_NCHAR_CHARACTERSET=AL16UTF16. |
Much of this thread discusses setting these to values you're not using. Why have you chosen these values in the face of direct instructions from the broker documentation and the advice on this thread, then been surprised when it doesn't work properly?
shatvani wrote: |
I tried smdavies99 suggestion: |
What suggestion? Where? I don't see it in this thread.
shatvani wrote: |
but unfortunatelly it was useless. |
Of course it was. If you look at what the code is doing, then trying reading both this thread and the broker documentation you'll see why it won't help with this problem. Which is why I suspect it wasn't suggested on this thread. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
Back to top |
|
 |
shatvani |
Posted: Tue Sep 07, 2010 10:31 am Post subject: |
|
|
Newbie
Joined: 08 Mar 2010 Posts: 6
|
Dear Vitor,
NLS_NCHAR_CHARACTERSET is used by NCLOB, NCHAR, etc. The procedure what I use returns with CLOB, so I don't understand why it would be effective to convert AL16UTF16 to UTF8?
I have read through a lot of posts, for example smdavies99 's, and I think so the problem is around the character encoding, but I don't know where.
Thank you anyway |
|
Back to top |
|
 |
Vitor |
Posted: Tue Sep 07, 2010 10:51 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
shatvani wrote: |
I don't understand why it would be effective to convert AL16UTF16 to UTF8? |
I don't think I or anyone in this thread has suggested that. Rather the reverse in fact.
shatvani wrote: |
I have read through a lot of posts, for example smdavies99 's |
I think there's more than one by that poster, I hold by my assertion that there's no such post on this thread for the good & sufficient reason that it won't help with this problem.
shatvani wrote: |
I think so the problem is around the character encoding |
You think?
shatvani wrote: |
I don't know where. |
Try re-reading this post & using some of the suggestions. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
Back to top |
|
 |
shatvani |
Posted: Tue Sep 07, 2010 10:57 am Post subject: |
|
|
Newbie
Joined: 08 Mar 2010 Posts: 6
|
|
Back to top |
|
 |
smdavies99 |
Posted: Tue Sep 07, 2010 9:23 pm Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
Regular posters here (Like Vitor, myself and the IBM'ers) have commented on literally thousands of topics.
If you reference a post by someone it really would help to put a link to in when you reference it.
This way we can see if the thing you have tried is actually relevant to your problem or has been taken by mistake out of context.
As the person's post that is being referenced even I can't remember the exact relationship of my suggestion to this one.
Add to this, Vitors suggestion to start a new thread and reference the old one rather than re-opening it, I hope this might help you in the future. _________________ WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions. |
|
Back to top |
|
 |
|