It is impossible to set correct CCSID
Jenum
Posted: Thu Sep 27, 2018 6:11 am

Novice

Joined: 13 Nov 2012
Posts: 24

Yes, UTF-8 is the most universal character set, but some systems work only with their own character set, and the message body MUST be encoded in that character set.
Also, some systems (IBM Integration Bus, for example) use the MQMD.characterSet field to parse incoming messages.
Finally, assume that a single message must be received twice: first by a system that works with its own character set, and second by Integration Bus. In that case my problem is critical, because I can't send a message whose body is encoded with CCSID 866 and whose characterSet field contains that same value.
Vitor
Posted: Thu Sep 27, 2018 7:59 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

Jenum wrote:
Yes, UTF-8 is the most universal character set, but some systems work only with their own character set, and the message body MUST be encoded in that character set.
Also, some systems (IBM Integration Bus, for example) use the MQMD.characterSet field to parse incoming messages.
Finally, assume that a single message must be received twice: first by a system that works with its own character set, and second by Integration Bus. In that case my problem is critical, because I can't send a message whose body is encoded with CCSID 866 and whose characterSet field contains that same value.


For the record and the benefit of future readers, all character data within IIB is held as UTF-16 to ensure that IIB can handle any kind of character data. It is converted on parsing of the incoming message (as the OP points out) and on serialization (e.g. by a FileOutput or MQOutput node).
_________________
Honesty is the best policy.
Insanity is the best defence.
tczielke
Posted: Thu Sep 27, 2018 9:47 am

Guardian

Joined: 08 Jul 2010
Posts: 939
Location: Illinois, USA

I can't say I understand all your requirements, but when I look at your Java code I see you working with a Java String. Just note that if you have your data in a Java String, it is in UTF-16 (CCSID 1200).
_________________
Working with MQ since 2010.
rekarm01
Posted: Sun Sep 30, 2018 1:58 pm

Grand Master

Joined: 25 Jun 2008
Posts: 1415

Jenum wrote:
After some research, I found that the problem is in the methods MQMessage#writePropertiesRfh2 and MQHeaderList#write (which are called from MQDestination#put): the first sets CCSID=1208 in the MQMD, and the second sets the same characterSet in all headers.

So, MQDestination#put calls the other two methods? Does MQMessage.setStringProperty() change the value of MQMessage.characterSet (or MQMessage.format) at all?

Jenum wrote:
Quote:
Setting message.format could help

Sad, but no.

Even so, downstream applications might find it useful.

Jenum wrote:
Quote:
... and writeString() might be easier to use here; it has conversion built in.

No, resulting CCSID will be 1208.

Regardless of the resulting CCSID, does it convert the string correctly? If so, then 'writeString("Some text");' certainly seems a lot simpler than 'write(Charsets.convert("Some text", message.characterSet));'.
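
For readers who want to see what "converting the string" means at the byte level, here is a minimal sketch using only the standard java.nio.charset classes (no MQ classes involved; the class name CyrillicBytes and the sample word are made up for illustration). It encodes the same six Cyrillic characters with the Java charsets that, as far as I know, correspond to CCSIDs 1251, 866 and 1208, and prints the resulting bytes.

```java
import java.nio.charset.Charset;

final class CyrillicBytes {
    // Encode the characters of 'text' using the named Java charset (character map).
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    // Render bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Russian "Privet" (hello), written with Unicode escapes so the
        // source file encoding does not matter.
        String text = "\u041F\u0440\u0438\u0432\u0435\u0442";

        System.out.println("windows-1251: " + hex(encode(text, "windows-1251")));
        System.out.println("IBM866:       " + hex(encode(text, "IBM866")));
        System.out.println("UTF-8:        " + hex(encode(text, "UTF-8")));
    }
}
```

The characters are identical in all three cases; only the bytes differ, which is why the CCSID recorded with the message has to match the charset that actually produced the body bytes.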

Jenum wrote:
So, tczielke is right: it should be a PMR to IBM.

While waiting for a response from IBM, maybe try changing the order of statements around. For example, does setting the properties first help?

Code:
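// Assumes an open MQQueue 'queue' and an MQPutMessageOptions 'pmo'.
MQMessage message = new MQMessage();

// Set the message properties first ...
message.setStringProperty("first", "1");
message.setStringProperty("second", "2");
message.setStringProperty("third", "3");

// ... then set the body's CCSID and format, and write the body.
message.characterSet = 1251;
message.format = MQConstants.MQFMT_STRING;
message.writeString("Some text");

queue.put(message, pmo);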

Or possibly adding one or more well-placed seek() calls, to reposition the message cursor as needed?

tczielke wrote:
Just note that if you have your data in a Java String, it is in UTF-16 (CCSID 1200).

CCSIDs describe physical bytes, not logical characters. A Java String may use UTF-16 code units, but it does not have a CCSID.


Last edited by rekarm01 on Mon Oct 01, 2018 4:14 am; edited 1 time in total
fjb_saper
Posted: Sun Sep 30, 2018 7:27 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20695
Location: LI,NY

Note also that the number of CCSIDs allowed in the RFH2 for the key/value pairs is very restricted. UTF-8 (CCSID 1208) is one of the allowed ones.
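
As a rough sketch of what "very restricted" means: as far as I recall from the MQRFH2 documentation, the NameValueCCSID field accepts only the Unicode CCSIDs 1200, 1208, 13488 and 17584. Please check that list against the documentation for your MQ version; it is an assumption here, and the NameValueCcsid helper below is hypothetical, not part of the MQ classes.

```java
final class NameValueCcsid {
    // CCSIDs assumed to be valid for the RFH2 name/value data
    // (verify against the MQRFH2 documentation for your MQ version).
    static boolean isAllowed(int ccsid) {
        switch (ccsid) {
            case 1200:   // UTF-16
            case 1208:   // UTF-8
            case 13488:  // UCS-2 subset
            case 17584:  // UCS-2 subset
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("1208 allowed? " + isAllowed(1208));
        System.out.println("866 allowed?  " + isAllowed(866));
    }
}
```

So a single-byte CCSID such as 866 or 1251 can describe the payload, but not the properties.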

Have fun
_________________
MQ & Broker admin
tczielke
Posted: Tue Oct 02, 2018 10:27 am

Guardian

Joined: 08 Jul 2010
Posts: 939
Location: Illinois, USA

rekarm01 wrote:

tczielke wrote:
Just note that if you have your data in a Java String, it is in UTF-16 (CCSID 1200).

CCSIDs describe physical bytes, not logical characters. A Java String may use UTF-16 code units, but it does not have a CCSID.


When I read this comment, it sounds like you are saying that the Java String should be viewed as something opaque, where the underlying physical byte encoding cannot be known for sure. If I am understanding you correctly, that is not the case for the Java String, based on everything I have read. The Java String is always physically encoded internally, in bytes, as UTF-16. This understanding is important for the Java developer because it has data conversion implications both for building the Java String and for putting the Java String to a queue.
_________________
Working with MQ since 2010.
rekarm01
Posted: Tue Oct 09, 2018 4:45 pm

Grand Master

Joined: 25 Jun 2008
Posts: 1415

The Unicode Character Encoding Model describes three levels of mappings:
  • A Coded Character Set maps between abstract characters and integers (code points)
  • A Character Encoding Form maps between code points and fixed-width integers (code units)
  • A Character Encoding Scheme maps between code units and serialized bytes
It also describes a Character Map, which combines the above into a single mapping between abstract characters and serialized bytes. IBM uses CCSIDs, and Java uses Charsets, to identify character maps.

The difference between a Java String and a serialized byte string encoded using ccsid=1200, is the lack of a character encoding scheme. UTF-16 code unit char values are not serialized bytes; they don't have a defined byte order, nor any use for a byte order mark. There is no CCSID that describes a mapping between abstract characters and UTF-16 code units.

For character maps, Java Strings represent the abstract characters, not the serialized bytes. Every (non-deprecated) String constructor and method that converts between characters and bytes needs to specify a Charset (character map), to perform the conversion. Even when ccsid=1200.
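
A small sketch of that last point, using only the standard java.nio.charset classes (the class name Utf16ByteOrder is made up for this example). The char value in a String is just an integer code unit; a byte order, and possibly a byte order mark, only appears once a Charset is named for the conversion to bytes.

```java
import java.nio.charset.StandardCharsets;

final class Utf16ByteOrder {
    // Render bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String s = "A";

        // The code unit itself: an integer (65), with no byte order attached.
        System.out.println("code unit: " + (int) s.charAt(0));

        // Three character encoding schemes for the same UTF-16 code unit.
        System.out.println("UTF-16BE: " + hex(s.getBytes(StandardCharsets.UTF_16BE))); // 00 41
        System.out.println("UTF-16LE: " + hex(s.getBytes(StandardCharsets.UTF_16LE))); // 41 00
        System.out.println("UTF-16:   " + hex(s.getBytes(StandardCharsets.UTF_16)));   // FE FF 00 41
    }
}
```

Java's "UTF-16" charset writes a big-endian byte order mark when encoding, which is why the third line is two bytes longer than the other two.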

tczielke wrote:
it sounds like you are saying that the Java String should be viewed as something that is opaque where the underlying physical byte encoding can not be known for sure.

All I was saying before is that CCSIDs don't describe Java Strings. However, it is also true that the Java String class has no public instance variables, so an application cannot directly read or write any of the underlying physical bytes; instead, it has to interact with the available public constructors and methods, which are free to hide important details about the nature of the underlying bytes. That's one of the benefits of encapsulation.

tczielke wrote:
this has data conversion implications in both building the Java String and putting the Java String to a queue.

Character maps may view characters abstractly, but that doesn't mean that the rest of the application has to.


[Edit: Or, a much simpler explanation: Although UTF-16 byte strings are very similar to UTF-16 character strings, the MQ routines that convert between bytes and characters only use the CCSID to describe the bytes, not the characters.]


Last edited by rekarm01 on Tue Oct 23, 2018 4:20 pm; edited 1 time in total
tczielke
Posted: Tue Oct 09, 2018 6:11 pm

Guardian

Joined: 08 Jul 2010
Posts: 939
Location: Illinois, USA

rekarm01 wrote:
The Unicode Character Encoding Model describes three levels of mappings:
  • A Coded Character Set maps between abstract characters and integers (code points)
  • A Character Encoding Form maps between code points and fixed-width integers (code units)
  • A Character Encoding Scheme maps between code units and serialized bytes
It also describes a Character Map, which combines the above into a single mapping between abstract characters and serialized bytes. IBM uses CCSIDs, and Java uses Charsets, to identify character maps.

The difference between a Java String and a serialized byte string encoded using ccsid=1200, is the lack of a character encoding scheme. UTF-16 code unit char values are not serialized bytes; they don't have a defined byte order, nor any use for a byte order mark. There is no CCSID that describes a mapping between abstract characters and UTF-16 code units.

For character maps, Java Strings represent the abstract characters, not the serialized bytes. Every (non-deprecated) String constructor and method that converts between characters and bytes needs to specify a Charset (character map), to perform the conversion. Even when ccsid=1200.

tczielke wrote:
it sounds like you are saying that the Java String should be viewed as something that is opaque where the underlying physical byte encoding can not be known for sure.

All I was saying before is that CCSIDs don't describe Java Strings. However, it is also true that the Java String class has no public instance variables, so an application cannot directly read or write any of the underlying physical bytes; instead, it has to interact with the available public constructors and methods, which are free to hide important details about the nature of the underlying bytes. That's one of the benefits of encapsulation.

tczielke wrote:
this has data conversion implications in both building the Java String and putting the Java String to a queue.

Character maps may view characters abstractly, but that doesn't mean that the rest of the application has to.


Thanks, that helps clarify it for me. I understand the point you are making, but personally I think this is getting to a level of detail beyond what an MQ admin/programmer needs to understand regarding code pages. I think it is fine to use language like a Java String being UTF-16 (CCSID 1200), from a practical standpoint.
_________________
Working with MQ since 2010.
rekarm01
Posted: Mon Oct 22, 2018 7:07 pm

Grand Master

Joined: 25 Jun 2008
Posts: 1415

tczielke wrote:
personally I think this is getting to a level of detail that is beyond what an MQ admin/programmer needs to understand regarding code pages.

MQ admins/programmers don't need to know the difference between an IBM "code page" (CPGID) and a "CCSID" either, but does that mean nobody should let them know when they are misusing the terminology?

tczielke wrote:
I think it is fine to use language like a Java String being UTF-16 (CCSID 1200), from a practical stand point.

Without getting too deep then, how is it practical? Java applications cannot actually use it anywhere. In practice.

Instead, applications use other ccsids to read and write strings, real ccsids associated with the physical bytes of an MQ message.

Why not just "UTF-16" Java Strings? Or, if that truly is not enough, then why not add a qualifier, such as "like CCSID 1200", or "CCSID 1200-ish", or some other clue to indicate that it is not a real ccsid?
tczielke
Posted: Tue Oct 23, 2018 5:18 am

Guardian

Joined: 08 Jul 2010
Posts: 939
Location: Illinois, USA

To complicate things further, I don't consider the md.CCSID a real CDRA CCSID either. Examples are how it violates the CDRA CCSID definition for 1200 by having it represent little endian UTF-16 message data and how tools like iconv that don't have to conform to the CDRA are used for MQ data conversion. But this is really getting into the weeds.

Going back to my original statement "Just note that if you have your data in a Java String, it is in UTF-16 (CCSID 1200)", all I was really trying to say there is that a Java String is UTF-16 encoded, and CCSID 1200 maps to UTF-16. But I can be clearer in the future.
_________________
Working with MQ since 2010.
rekarm01
Posted: Tue Oct 23, 2018 6:54 pm

Grand Master

Joined: 25 Jun 2008
Posts: 1415

tczielke wrote:
To complicate things further, I don't consider the md.CCSID a real CDRA CCSID either. Examples are how it violates the CDRA CCSID definition for 1200 ...

Yet another reason not to draw undue attention to CCSID 1200 ...

tczielke wrote:
But I can be clearer in the future.

I probably could, too. Sometimes simpler is better.
fjb_saper
Posted: Wed Oct 24, 2018 1:56 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20695
Location: LI,NY

So let's get back to the OP's problem.
The message CCSID should be 866, but he needs the properties.
The properties CCSID is restricted to a certain list; 1208 is allowed.
Setting the properties CCSID should not set the payload's CCSID. However, as he is looking at the message CCSID and the RFH2 header, he needs to understand that the 1208 in the message CCSID now applies to the RFH2 header.
The RFH2 has 2 CCSID fields: one for the properties and one for the message payload. If the one for the message payload still says 866, then all is as it should be, and the OP needs to look up and understand the header chaining concept.
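
The chaining idea can be modelled in a few lines of plain Java. This is only a toy sketch, not the MQ classes: HeaderChain and Link are hypothetical names, and the values assume the OP's target layout (MQMD, then one RFH2, then a string payload in CCSID 866). Each header in the chain carries the format and CCSID of whatever follows it, so the CCSID in the MQMD describes the RFH2, and the CCSID in the RFH2 describes the payload.

```java
import java.util.ArrayList;
import java.util.List;

final class HeaderChain {
    // One link in the chain: a header that describes the part following it.
    static final class Link {
        final String header;      // which header this is
        final String nextFormat;  // format of the part that follows
        final int nextCcsid;      // CCSID of the part that follows

        Link(String header, String nextFormat, int nextCcsid) {
            this.header = header;
            this.nextFormat = nextFormat;
            this.nextCcsid = nextCcsid;
        }
    }

    // Assumed layout for the OP's case: MQMD -> MQRFH2 -> string payload.
    static List<Link> example() {
        List<Link> chain = new ArrayList<>();
        chain.add(new Link("MQMD", "MQHRF2", 1208));  // describes the RFH2
        chain.add(new Link("MQRFH2", "MQSTR", 866));  // describes the payload
        return chain;
    }

    // The payload's CCSID is the one carried by the last header in the chain.
    static int payloadCcsid(List<Link> chain) {
        return chain.get(chain.size() - 1).nextCcsid;
    }

    public static void main(String[] args) {
        for (Link link : example()) {
            System.out.println(link.header + " says: next is " + link.nextFormat
                    + " in CCSID " + link.nextCcsid);
        }
        System.out.println("payload CCSID = " + payloadCcsid(example()));
    }
}
```

The CCSID of the properties themselves (the second RFH2 field mentioned above) is left out of the model because it does not take part in the chain.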

Hope it helps

_________________
MQ & Broker admin
rekarm01
Posted: Sun Oct 28, 2018 1:27 pm

Grand Master

Joined: 25 Jun 2008
Posts: 1415

Jenum wrote:
So, tczielke is right: it should be a PMR to IBM.

Any update on that?

Jenum wrote:
my workaround for this problem is using reflection (it is a very, very bad solution, but I can't find anything else)

Another workaround is to manually create the MQRFH2 header, and to set any properties the old-fashioned way. For example:

Code:
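// Build the MQRFH2 by hand instead of using message properties.
MQRFH2 rfh2 = new MQRFH2();
rfh2.nextEncoding(CMQC.MQENC_NATIVE);
rfh2.nextCharacterSet(1251);         // CCSID of the data that follows the header (MS-WIN Cyrillic)
rfh2.nextFormat(CMQC.MQFMT_STRING);  // format of the data that follows the header
rfh2.setNameValueCCSID(1208);        // CCSID of the name/value pairs (UTF-8)

// Set the properties as fields in the "usr" folder.
rfh2.setFieldValue("usr", "first", "1");
rfh2.setFieldValue("usr", "second", "2");
rfh2.setFieldValue("usr", "third", "3");

// Write the header into the message, ahead of the payload.
rfh2.write(message, message.encoding, message.characterSet);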