MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » Question On Encoding

Question On Encoding
nmaddisetti
Posted: Mon Sep 21, 2009 8:00 am    Post subject: Question On Encoding

Centurion

Joined: 06 Oct 2004
Posts: 145

Hi All,

As I understood from posts in this forum, I am setting the Encoding and CCSID in the message flow to the source application's values. With that I am able to parse the XML successfully, but I am facing an issue when inserting into the DB.

Here is the tag in the source XML:
<Parameter Name="iDtPos">2710A?¹669A</Parameter>

I am using following Encoding and CCSID.

Encoding:'UTF-8'
CCSID:819

With these values I am able to parse, and if I write to the file system by creating another XML from the source XML, it is also written correctly, like this:
<Data>2710A?¹669A</Data>

But when I insert it into the DB, it is stored like this:

2710A?¿669A

And when I extract this value from the DB and create XML, it looks like this:

<Data>2710A??669A</Data>

I am using a PASSTHRU statement for this test, and in the debugger the query string looks correct:

INSERT INTO hca_batch_process_dtl
(hca_batch_process_dtl_id, program_nm, comments )
VALUES
(batch_process_dtl_id_seq.NEXTVAL,'XDataError123','2710A?¹669A')

And when I insert from SQL*Plus, it is inserted properly:

SQL> Insert into HCA_BATCH_PROCESS_DTL (HCA_BATCH_PROCESS_DTL_ID,PROGRAM_NM,COMMENTS,CREATED_DT) values ('9999999','Test','2710A?¹669A',sysdate);

1 row created.

SQL> select comments from hca_batch_process_dtl where HCA_BATCH_PROCESS_DTL_id='9999999';

COMMENTS
----------------------------------------------------------------------------------------------------
2710A?¹669A

So it looks like something is going wrong while executing the PASSTHRU statement.

All these tests were done on my Windows box:
Windows XP, Message Broker 6.1.0.3, Oracle 10g

Am I missing something?
Can someone suggest what is wrong with this?

Thanks in Advance,
Venkat.

mqjeff
Posted: Mon Sep 21, 2009 9:12 am

Grand Master

Joined: 25 Jun 2008
Posts: 17447

You probably need to ensure the database is set up to run with a national language environment in its shell at startup.

nmaddisetti
Posted: Mon Sep 21, 2009 9:40 am

Centurion

Joined: 06 Oct 2004
Posts: 145

Hi mqjeff,

I am able to insert correctly from SQL*Plus, which means there are no issues on the DB side, if I understood correctly.

Thanks,
Venkat.

paranoid221
Posted: Tue Sep 22, 2009 1:24 am

Centurion

Joined: 03 Apr 2006
Posts: 101
Location: USA

I suspect your DB is not set up to understand/support UTF-8!


I could be wrong though.
_________________
LIFE is a series of complex calculations, somewhere multiplied by ZERO.

Luke
Posted: Tue Sep 22, 2009 4:31 am    Post subject: Re: Question On Encoding

Centurion

Joined: 10 Nov 2008
Posts: 128
Location: UK

nmaddisetti wrote:

I am using following Encoding and CCSID.

Encoding:'UTF-8'
CCSID:819

Hi Venkat, when you say you are setting these values, where are you setting them? It looks a bit strange to me, unless the Encoding you're referring to is in the XML declaration? If that's the case, then it still seems strange, as CCSID 819 is ISO-8859-1 (Latin-1), I think; UTF-8 would be CCSID 1208.

How do you receive the data? Also, what's the NLS_CHARACTERSET of your database?

rekarm01
Posted: Tue Sep 22, 2009 9:52 pm    Post subject: Re: Question On Encoding

Grand Master

Joined: 25 Jun 2008
Posts: 1415

For the WMB Root.Properties header (and other related headers):
  • Encoding is an integer field that specifies the numeric encoding of numeric data in the message (binary integers, packed-decimal integers, and floating-point numbers)
  • CodedCharSetId is an integer field that specifies the character encoding of character data in the message (similar to the XML encoding declaration, or MIME/SGML charset parameter)
Related headers usually have a Format field, to identify which message bytes are numeric data, and which message bytes are character data.

XML is usually all character data, and no numeric data; the message flow does not use the Encoding field, and often won't complain when it's populated with garbage.

The character encoding of an XML message may change throughout the message flow:
  • incoming message: the sending application encodes the character data, and describes the encoding in its message header(s)
  • input bitstream: the input node optionally converts the character data, and describes the resulting encoding in InputRoot.Properties.CodedCharSetId
  • parsed message elements: the standard parsers convert character data from InputRoot.Properties.CodedCharSetId to UCS-2
  • output bitstream: the standard parsers convert parsed character data from UCS-2 to OutputRoot.Properties.CodedCharSetId
  • outgoing message: the output message describes the resulting encoding in its message headers
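
The stages above can be sketched outside the broker. The following Python fragment is only an illustration, not broker code: the `CCSID_TO_CODEC` table and the `convert` helper are hypothetical names, the sample value is a simplified version of the one in the original post, and a Python `str` stands in for the parser's internal Unicode form.

```python
# Hypothetical mapping from two IBM CCSIDs to Python codec names (illustration only).
CCSID_TO_CODEC = {819: "iso-8859-1", 1208: "utf-8"}


def convert(payload: bytes, in_ccsid: int, out_ccsid: int) -> bytes:
    """Decode with the input CCSID, then re-encode with the output CCSID."""
    text = payload.decode(CCSID_TO_CODEC[in_ccsid])   # input bitstream -> parsed characters
    return text.encode(CCSID_TO_CODEC[out_ccsid])     # parsed characters -> output bitstream


# Simplified sample: SUPERSCRIPT ONE (U+00B9) inside an element.
incoming = "<Data>2710A\u00b9669A</Data>".encode("utf-8")
outgoing = convert(incoming, 1208, 819)

print(incoming.hex(" "))  # U+00B9 occupies two bytes (c2 b9) in UTF-8
print(outgoing.hex(" "))  # and a single byte (b9) in ISO-8859-1
```

The same character has a different byte representation at each stage, which is why the CCSID recorded in the headers has to describe the bytes that are actually there.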

nmaddisetti wrote:
As I understood from the posts in the forum I am setting the Encoding and CCSID as the source application values in the message flow

... which is only necessary when the output encoding is different from the input encoding.

nmaddisetti wrote:
Here is the tag in the source XML:
<Parameter Name="iDtPos">2710A?¹669A</Parameter>

I am using following Encoding and CCSID.

Encoding:'UTF-8'
CCSID:819

The Encoding is garbage here, but (fortunately?) it's not used anyway. Setting CCSID=819 will convert the character data in the output message to ISO-8859-1.

nmaddisetti wrote:
But When I inserted into DB it is inserting like this:

2710A?¿669A

And When I extract this value from DB and create XML it is like this:

<Data>2710A??669A</Data>

I am using PASSTHRU statement for this test and in the debug the query string looks correct and it is:

INSERT INTO hca_batch_process_dtl
(hca_batch_process_dtl_id, program_nm, comments )
VALUES
(batch_process_dtl_id_seq.NEXTVAL,'XDataError123','2710A?¹669A')

Within a message flow, parsed character data and string literals are always UCS-2, regardless of CCSID (or Encoding).

Proper conversion of character data between the broker and database depends on the broker code page, ODBC data source definition, the database character set, and the broker environment. Check the Info Center for details regarding SQL statements and support for Unicode in databases.

nmaddisetti
Posted: Tue Oct 06, 2009 11:35 am

Centurion

Joined: 06 Oct 2004
Posts: 145

Hi,

We were busy with a few other tasks, and now we have been caught out by 2 more files with this kind of new character, so the issue is hot now.
Here is the new stuff from the two files:

1)
<Parameter Name="iDtPos">PO BOX 1037 P4CASH †¡‰Å†©‰„†©‰[A</Parameter>

2)
<Parameter Name="iDtPos">ØFOWLER</Parameter>

With these values I am not able to parse the XML file.
It gives the following exception:

An XML parsing error has occurred while parsing the XML document1504. 2. 1. 1. An invalid XML character (Unicode: 0xef) was found in the prolog of the document.. /Root/XMLNSC. Caught exception and rethrowing. XML Parsing Errors have occurred. An XML parsing error has occurred while parsing the XML document.




Quote:
The Encoding is garbage here, but (fortunately?) it's not used anyway. Setting CCSID=819 will convert the character data in the output message to ISO-8859-1.


I agree, and I know that the Encoding value should be an integer. I just tried the string 'UTF-8' because the black box that produces these XML messages uses this string 'UTF-8', and without this setting the black box produces replacement-character boxes. When I set this string in the MQMD Encoding in ESQL code, I see the encoding value as 546 in both the MQMD and Properties Encoding.

Can someone suggest what to do to parse these kinds of characters?

Thanks & Regards,
Venkat.

fjb_saper
Posted: Tue Oct 06, 2009 7:52 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

@ Venkat

First rule: You need to seriously consider the CCSID of origin of your document.
Apparently for you it is 1208 (UTF-8).
Second rule: you should NEVER downgrade a CCSID. Not all legal characters in UTF-8 exist in CCSID 819, and those that don't will get replaced with a "dummy" character that might not be a legal character for XML parsing.

Third rule: Consider the destination and destination CCSID.
Sometimes it pays to convert. Sometimes it pays to downgrade if the target CCSID will understand 98% or more of the characters in the CCSID you use (for example, UTF-8 to CCSID 500, as in EBCDIC International).
Most of the time you'll be good with CCSID 1200 or 1208 (the Unicode CCSIDs), or 1252 (Windows Latin-1), etc.
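
To see what a downgrade does, here is a small Python sketch. It is an illustration under assumptions rather than broker behavior: `fits` and `downgrade` are hypothetical helper names, Python's `iso-8859-1` codec stands in for CCSID 819, and Python substitutes `?` where the broker or database may use a different dummy character.

```python
def fits(text: str, codec: str = "iso-8859-1") -> bool:
    """Return True if every character of text exists in the target codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def downgrade(text: str, codec: str = "iso-8859-1") -> bytes:
    """Encode text, substituting '?' for characters the codec cannot represent."""
    return text.encode(codec, errors="replace")


print(fits("\u00d8FOWLER"))        # LATIN CAPITAL LETTER O WITH STROKE exists in ISO-8859-1
print(fits("P4CASH \u2020"))       # DAGGER (U+2020) does not
print(downgrade("P4CASH \u2020"))  # the dagger is lost and cannot be recovered later
```

A check like `fits` before converting shows whether a given document would lose characters in the target code page.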

Have fun
_________________
MQ & Broker admin

rekarm01
Posted: Wed Oct 07, 2009 1:45 am    Post subject: Re: Question On Encoding

Grand Master

Joined: 25 Jun 2008
Posts: 1415

nmaddisetti wrote:
here is the new stuff from two files:

1) <Parameter Name="iDtPos">PO BOX 1037 P4CASH †¡‰Å†©‰„†©‰[A</Parameter>

2) <Parameter Name="iDtPos">ØFOWLER</Parameter>

That looks a bit like gibberish; is that what it's supposed to look like?

nmaddisetti wrote:
It is giving following exception:... An invalid XML character (Unicode: 0xef) was found in the prolog of the document ...

This is not the same error as before. The most likely guess is that the incoming message contains a UTF-8 byte order mark (U+FEFF, encoded as the bytes 0xEF 0xBB 0xBF) before the XML data. It shouldn't.

nmaddisetti wrote:
rekarm01 wrote:
The Encoding is garbage here, but (fortunately?) it's not used anyway. Setting CCSID=819 will convert the character data in the output message to ISO-8859-1.
nmaddisetti wrote:
I agree and I know that Encoding value should be integer but I just tried with 'UTF-8' string because the black box which is producing these xml messages using this string 'UTF-8' and with out this setting blackbox producing replacement character boxes. and by setting this string in MQMD Encoding in esql code i see encoding value as 546 both in MQMD and Properties Encoding.

What? You lost me after "because". Don't set the MQMD.Encoding to 'UTF-8'. Ever. That's not what it's for.

nmaddisetti wrote:
Can some one suggest me what to do to parse this kind of characters.

Fix the message before it reaches the message flow:
  • Verify that the incoming message data is valid XML; it should not have a byte order mark (unless the ccsid calls for one).
  • Verify that the MQMD.CodedCharSetId and MQMD.Format accurately represent the message data.
It's much more difficult to reliably fix a bad message inside the message flow.
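
A quick way to check a file for a UTF-8 byte order mark before it reaches the flow is sketched below in Python. The helper names and the sample document are made up for illustration. The last line shows one possible reading of the "0xef" in the error text: if the first BOM byte is treated as a single-byte character, it becomes U+00EF. That reading is a guess, not something confirmed from the trace.

```python
import codecs


def has_utf8_bom(payload: bytes) -> bool:
    """Return True if the payload starts with the bytes EF BB BF."""
    return payload.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(payload: bytes) -> bytes:
    """Return the payload without a leading UTF-8 byte order mark, if any."""
    return payload[len(codecs.BOM_UTF8):] if has_utf8_bom(payload) else payload


# Made-up sample: an XML document preceded by a BOM.
sample = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="UTF-8"?><Data/>'

print(has_utf8_bom(sample))                      # the BOM is detected
print(strip_utf8_bom(sample)[:5])                # the document now starts with the XML declaration
print(hex(ord(sample.decode("iso-8859-1")[0])))  # first BOM byte read as one ISO-8859-1 character
```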

Once that's done, if the actual message flow behavior doesn't match the expected message flow behavior, add Trace nodes as needed, and post the results. If an Exception occurs, run a usertrace as needed, and post the relevant error messages.

nmaddisetti
Posted: Wed Oct 07, 2009 1:26 pm

Centurion

Joined: 06 Oct 2004
Posts: 145

Hi,

Now I am able to parse and store into the DB on the test box, which is Sun Solaris with WMB 6.1.0.3.
Previously I was testing on my Windows box, where it was failing as explained in my previous post.
Quote:
First rule: You need to seriously consider the CCSID of origin of your document.
Apparently for you it is 1208 (UTF-8 ).

Yes it is 1208
Quote:
Second rule: you should NEVER downgrade a CCSID. All legal characters in UTF-8 do not exist in CCSID 819 and as such will get replaced with a "dummy" character that might not be a legal character for XML parsing.

But it is working only with 819, and there is no luck with any of 1200, 1208, or 1252.
It is working for these 3 files with 819, but tomorrow there might be another character that is outside the scope of CCSID 819.
So am I doing something wrong in trying to make it work with the source CCSID?

Quote:
What? You lost me after "because". Don't set the MQMD.Encoding to 'UTF-8'. Ever. That's not what it's for.

Now I am not setting any Encoding; I am setting only the CCSID.
Quote:
Verify that the incoming message data is valid XML; it should not have a byte order mark (unless the ccsid calls for one).
Verify that the MQMD.CodedCharSetId and MQMD.Format accurately represent the message data.

When I open these XML messages in IE, it says:
An invalid character was found in text content. Error processing resource 'file:///C:/app/inbound/abc3.xml'. Line 1, Posit...

CONNECTICUT GENERAL LIFE INSURANCE COMPANY</E-93></S-N1><S-N3><E-166>P. O. BOX 182223</E-166>&...

Here I am confused: if the characters in the XML are not valid, how is the broker parsing it?

Can you please suggest what to do about this issue?

Or can I proceed with 819?

Thanks & Regards,
Venkat.

Vitor
Posted: Wed Oct 07, 2009 1:59 pm

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

nmaddisetti wrote:
When I am opening these xml messages using IE it is saying
An invalid character was found in text content. Error processing resource 'file:///C:/app/inbound/abc3.xml'. Line 1, Posit...

CONNECTICUT GENERAL LIFE INSURANCE COMPANY</E-93></S-N1><S-N3><E-166>P. O. BOX 182223</E-166>&...

Here I confused if the chracters in xml are not valid how broker is parsing .


Why? You've already said it works on SunSolaris but not on Windows.

nmaddisetti wrote:
Can you please suggest me what to do for this issue.


You've already had a number of suggestions.

nmaddisetti wrote:
Can I proceeed with 819.


That wasn't one of them.
_________________
Honesty is the best policy.
Insanity is the best defence.

fjb_saper
Posted: Wed Oct 07, 2009 7:20 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

fjb_saper wrote:
@ Venkat

First rule: You need to seriously consider the CCSID of origin of your document.
Apparently for you it is 1208 (UTF-8 ).


Note that you did not confirm this.
You need to use a tool like rfhutil and check the content on a character basis AND on a hex basis.

This will show you the ccsid (see mqmd or rfh) header on the message and the message content. If the content does not match the CCSID there is no way you can correctly parse the message.
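
If rfhutil is not at hand, the same hex-level check can be approximated with a few lines of Python. This is only a sketch: `is_valid_utf8` and `hex_view` are hypothetical helpers, and the two byte strings are constructed here for illustration; they are not the bytes of the actual files, which were never posted.

```python
def is_valid_utf8(payload: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 sequence."""
    try:
        payload.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def hex_view(payload: bytes) -> str:
    """Return the bytes as space-separated hex pairs for eyeballing."""
    return payload.hex(" ")


# The same visible text, encoded two different ways.
latin1_bytes = "\u00d8FOWLER".encode("iso-8859-1")
utf8_bytes = "\u00d8FOWLER".encode("utf-8")

print(hex_view(latin1_bytes), is_valid_utf8(latin1_bytes))  # a lone d8 byte is not valid UTF-8
print(hex_view(utf8_bytes), is_valid_utf8(utf8_bytes))      # c3 98 is the UTF-8 form of the same character
```

If a message labelled 1208 fails a check like this, the label and the content do not match. The reverse check is weaker, because every byte sequence decodes under ISO-8859-1.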
_________________
MQ & Broker admin

rekarm01
Posted: Thu Oct 08, 2009 1:47 am    Post subject: Re: Question On Encoding

Grand Master

Joined: 25 Jun 2008
Posts: 1415

Unfortunately, the problem description is not very clear, and it keeps changing. It will take much longer to fix the problem that way.

nmaddisetti wrote:
Now I am able to parse and store into DB on test box which is SunSolaris with WMB 6.1.0.3

Did you move some components from the Windows box to the test box? Which ones? Sending application? Message flow? Database? Is the original issue with the DB insert corrupting characters still a problem?

nmaddisetti wrote:
Previously I am testing on my windows box where it is failing as explained in previous post.

And why was that? The previous error suggested a bad input message. Did you follow up on that? What happened?

nmaddisetti wrote:
fjb_saper wrote:
First rule: You need to seriously consider the CCSID of origin of your document.
Apparently for you it is 1208 (UTF-8 ).

Yes it is 1208

Is it? Did you confirm that? How?

fjb_saper wrote:
Note that you did not confirm this.
You need to use a tool like rfhutil and check the content on a character basis AND on a hex basis.

Unfortunately, no amount of code-tweaking on the receiving side is going to fix a bad input message.

nmaddisetti wrote:
But it is working with 819 only and there is no luck with any of 1200,1208,1252

How, and where, exactly, are you converting the message data from one CCSID to another? And how, exactly, is it failing?

nmaddisetti wrote:
When I am opening these xml messages using IE it is saying
An invalid character was found in text content. Error processing resource 'file:///C:/app/inbound/abc3.xml'. Line 1, Posit...

CONNECTICUT GENERAL LIFE INSURANCE COMPANY</E-93></S-N1><S-N3><E-166>P. O. BOX 182223</E-166>&...

... which would tend to confirm a bad input message. Since the error message is truncated, it's not possible to say more than that.

nmaddisetti wrote:
Can you please suggest me what to do for this issue.

Not without more information.

nmaddisetti
Posted: Thu Oct 08, 2009 6:44 am

Centurion

Joined: 06 Oct 2004
Posts: 145

Quote:
First rule: You need to seriously consider the CCSID of origin of your document.
Apparently for you it is 1208 (UTF-8 ).


Yes, the CCSID used by the document originator (the black box, i.e. the EDIFECS server) is 1208. Previously I confirmed this based on a discussion with the EDIFECS team, and now I have opened the document with RFHUtil: on the RFH tab, at the bottom-left corner, the CCSID shows as 1208 but is not highlighted.
Quote:
Did you move some components from the Windows box to the test box? Which ones? Sending application? Message flow? Database? Is the original issue with the DB insert corrupting characters still a problem?

The actual failure happened in production, which is Sun Solaris.
I took the XML file and started testing on my Windows box with different CCSIDs.
Then I thought of giving it a try on the test box, which is Sun Solaris, and it started working with CCSID 819.
The database is on Sun Solaris. The sending application is also on Sun Solaris and writes the XML file to the file system, so I kept the sending application out of my testing with the test flow. I took the XML file from the sending application's box, moved it to the WMB test box (Sun Solaris), and deployed the test message flow to that box.
The original database issue was with the first file on my Windows box; there has been no issue with the database since I started testing on the Sun Solaris box.
When I started testing the second and third files, I had a problem with the parsing itself on my Windows box, but I don't see any issue on the Sun Solaris box with CCSID 819.
Quote:

How, and where, exactly, are you converting the message data from one CCSID to another? And how, exactly, is it failing?


Let me explain my test flow:

MQInput-->Compute1-->JavaCompute-->RCD-->Compute2-->FileOutput

Here I trigger the flow by putting a dummy message on the queue.
In Compute1 I copy the message headers and change the CCSID to 819.
In JavaCompute I read the file from the file system as a bitstream and pass it to the RCD as a BLOB.
In the RCD I use the message set info to parse the file; the parsing option I use is Complete.
In Compute2 I access the tag that contains these special/bad characters and insert it into the DB.
Then I select the inserted data again, create a small XML, and write it to the file system.

Now it is working with CCSID 819.
My struggle now is: how can I make sure I am good with 819?
With the first file (<Data>2710A?¹669A</Data>) I don't see a parsing issue, only the database issue on the Windows box.
With the second (<Parameter Name="iDtPos">PO BOX 1037 P4CASH †¡‰Å†©‰„†©‰[A</Parameter>) and third (<Parameter Name="iDtPos">ØFOWLER</Parameter>) files
I see a parsing issue on the Windows box.
And all three files work fine on the Sun Solaris box.

fjb_saper
Posted: Thu Oct 08, 2009 8:29 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

  1. You are starting off from files. You do not describe how you received the files or for that matter if the encoding / CCSID has been changed due to the fact that you read/received a message and put it to file or received a file.

    When in doubt use a qcf that has a ccsid of 1208 set on the qcf. This way you request any text message that would be translated to the platform's ccsid to be delivered in UTF-8...

    Also think about what you do when using a Java reader/writer. Better to specify the CCSID explicitly, because Java uses Unicode (UTF-16), not UTF-8, as its internal representation. Now would you not expect a writer without a Charset specification to write in the platform's charset (i.e. 819)?

  2. you parse using CCSID 819... This may well be because of the way you pick up / receive your data. The platform's native CCSID is 819 (Solaris).

    Does this mean that on Windows you should be using CCSID 437? Remember what I said earlier about downgrading the CCSID!

  3. If you want to make sure that everything works as designed, you need to take care of the following things:
    - verify the MQ Input node. Make sure you do not turn convert on. It may have the adverse effect of downgrading the message to the qmgr's CCSID before allowing you to parse. Instead use the InputRoot.Properties.CodedCharSetId value.

    - make sure that the message content and the CCSID for the message content match on the input message.
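
The reader/writer point in item 1 applies in any language. Below is a minimal Python analogue, not the Java code from the flow: the temporary file and the sample text are made up, and Python's `encoding=` argument plays the role of the Charset passed to a Java reader or writer.

```python
import os
import tempfile

text = "<Data>2710A\u00b9669A</Data>"  # simplified sample containing U+00B9

fd, path = tempfile.mkstemp(suffix=".xml")
os.close(fd)
try:
    # Writer with an explicit charset: the bytes on disk are UTF-8 on every platform.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)

    # Read the raw bytes, as a flow would when it picks up the file as a bitstream.
    with open(path, "rb") as f:
        raw = f.read()

    # Reader with the matching charset recovers the original text.
    with open(path, "r", encoding="utf-8", newline="") as f:
        explicit = f.read()

    # Decoding the same bytes with a mismatched charset silently changes the content.
    wrong = raw.decode("iso-8859-1")
finally:
    os.remove(path)

print(explicit == text)  # matching charset: content preserved
print(wrong == text)     # mismatched charset: U+00B9 turns into two characters
```

Leaving the encoding out would make the result depend on the platform default, which is one way the same file can behave differently on Windows and Solaris.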


Have fun
_________________
MQ & Broker admin