MQ Conversion Question Time - Hexadecimal 'FD'
PeterPotkay
Posted: Tue Nov 25, 2014 7:35 pm

Poobah

Joined: 15 May 2001
Posts: 7717

Message is on the queue with CCSID 819 and format MQSTR. The message is there after DataPower complained about an invalid character during parsing, saying that ‘FD’ is not valid. I am not sure yet what the CCSID was of the message when it was sent into DataPower, but DataPower is doing a get with convert and the default CCSID for DataPower is 819, so I suspect the incoming CCSID might have been something other than 819.


I'm using this page as my starting point to look up what this character is:
http://www-03.ibm.com/systems/i/software/globalization/codepages.html


In code page 819, that says ‘FD’ is this: ý

Is that valid or not? I dunno, it's a valid character apparently, because it's described in the CCSID 819 chart in the slot for ‘FD’. Seems valid to me. Is it not valid for XML docs maybe?

When I look at this message on the queue with various tools, I get results all over the place.

• With rfhutilc, it seems to work fine. It shows the four bytes as LTYý and in hex it shows ‘4C5459FD’. Click on the XML button shows the LTYý just fine in the XML doc.
• If I use BMC’s BMTM, it shows LTYý .
• If I tell BMTM to show that message in XML, Internet Explorer pops open and LTY? is displayed.
• If I use MO71, LTY. is displayed, whether I ask MO71 to convert (to 437) or not.

If I make a copy of the MQ message and change the CCSID to 1208, then look at it on the queue, all the results are the same as with 819, except MO71 shows the . by default and throws an error “DBCS error” if I ask for conversion to 437.
If I make a copy of the MQ message and change the CCSID to 437, then look at it on the queue, all the results are the same as with 819, except BMTM shows LTY² in its default browse view. In this CCSID 437 case, rfhutilc still shows the LTYý, which seems wrong to me. Hexadecimal ‘FD’ in the 437 code page is supposed to be a ² (SUPERSCRIPT TWO), according to this page from IBM: http://www-03.ibm.com/systems/resources/systems_i_software_globalization_pdf_cp00437z.pdf So rfhutilc should have displayed a superscript 2 character if I labeled the incoming message as CCSID 437 and threw a ‘FD’ at it.

If I copy the XML view out of rfhutilc (the one where it shows LTYý) and paste it into a notepad file, save it as a .xml file and open it in Internet Explorer, IE throws this error:
An invalid character was found in text content. Error processing resource 'file:///C:/temp/XML_Hexa_FD.xml'. Line 16, Posi...
<doc_attr attr_name="name" attr_value="LTY


Hey, at least IE seems to be barfing on this kinda like DataPower is, finally something consistent. I have reason to believe the message was sent into DataPower with CCSID 437, researching that to confirm.


1. Why are the results all over the place? If the MQ message has hexadecimal ‘FD’ and CCSID 819 or 1208, why does rfhutilc show the ý character while other tools on my Windows PC show all sorts of different things, everything but what the 437 code table says should be shown for ‘FD’, or the ý for the CCSID 819 code page?
2. And if I start the message as CCSID 437, why doesn’t anything, not even rfhutilc, just display the ² character and move on?
3. It seems there is no way to send a ý in the 437 code page. OK, I’ll buy that. But shouldn’t it convert the incoming ‘FD’ to a ² and at least display that consistently?
4. Do XML Parsers refuse to deal with ‘FD’?
5. If senders and receivers are going to be trading XML docs in the UTF-8 format, should they just both agree to use CCSID 1208 exclusively on both ends?

I am reallý confused, confused to the second power, or you could say confused².
_________________
Peter Potkay
Keep Calm and MQ On
fjb_saper
Posted: Wed Nov 26, 2014 12:31 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20696
Location: LI,NY

Hi Peter,

Most XML parsers assume, if not otherwise specified in your XML declaration, that the content of your bitstream is UTF-8. So what happens if you do send the message in UTF-8?

Have fun
_________________
MQ & Broker admin
PeterPotkay
Posted: Wed Nov 26, 2014 5:23 am

Poobah

Joined: 15 May 2001
Posts: 7717

fjb_saper wrote:
Hi Peter,

Most XML parsers assume, if not otherwise specified in your XML declaration, that the content of your bitstream is UTF-8. So what happens if you do send the message in UTF-8?

Have fun


I have asked the DataPower developer to take a copy of the message, ensure its MQMD CCSID is set to CCSID 1208, with no RFH2 headers to remove any variables, and try again. I am betting this will work, at least through the XML parser. I hope it will. Hexadecimal 'FD' means ý in Unicode, and I have to imagine it's a valid character to send inside a UTF-8 XML doc. What this means for downstream systems like our mainframe and Windows environments, which have no representation of ý, is another matter. One thing at a time: first to get over the parser issue where it says 'FD' is not a valid character. I don't understand that.


Then the follow up is why I see such a variety of characters (glyphs) when I look at this message in various tools. I took a copy of that message and changed just the MQMD CCSID so I had a copy with 1208, a copy with 819 and a copy with 437, then used various tools to look at all 3 messages. I don't yet understand why I see such a variety. So far I've gotten the following:
ý
.
ý
²
But have yet to see the ²
_________________
Peter Potkay
Keep Calm and MQ On
fjb_saper
Posted: Wed Nov 26, 2014 6:16 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20696
Location: LI,NY

Forget the tool's display of characters and strings and check if you have the right hex for the CCSID in question. If what you are looking for is not there, then you have a problem creating the message.
_________________
MQ & Broker admin
rekarm01
Posted: Fri Nov 28, 2014 2:17 am

Grand Master

Joined: 25 Jun 2008
Posts: 1415

A character map (aka IBM CCSID, XML encoding, IANA charset, etc.) converts a sequence of characters to a sequence of bytes, and vice versa. Different character maps can convert the same sequence of characters to a different sequence of bytes, or convert the same sequence of bytes to a different sequence of characters. Many character conversion problems are due to one of two issues:
  • the character map used to write a sequence of bytes does not match the character map used to read the sequence of bytes
  • a given character map does not support a mapping for a given sequence of characters or bytes
Generally, the application that writes a sequence of bytes needs to provide the character map it used, (in a message header or xml declaration, for example), so that the application that reads the sequence of bytes can convert it to the original sequence of characters.
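
Both issues can be reproduced in a few lines of Python. This is only an illustrative sketch: it assumes Python's `latin-1`, `cp437`, and `utf-8` codecs are reasonable stand-ins for CCSIDs 819, 437, and 1208, and MQ's own conversion tables may differ in detail.

```python
# Assumed stand-ins for the CCSIDs discussed in this thread.
CODECS = {819: "latin-1", 437: "cp437", 1208: "utf-8"}

raw = b"LTY\xfd"  # the four bytes seen on the queue (hex 4C5459FD)

# Issue 1: the same bytes read with two different character maps.
as_819 = raw.decode(CODECS[819])  # X'FD' read as Latin-1
as_437 = raw.decode(CODECS[437])  # X'FD' read as code page 437

# Issue 2a: code page 437 has no mapping for the character y-acute.
try:
    "\u00fd".encode(CODECS[437])
    y_acute_in_437 = True
except UnicodeEncodeError:
    y_acute_in_437 = False

# Issue 2b: UTF-8 has no mapping for the byte sequence X'FD'.
try:
    raw.decode(CODECS[1208])
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(as_819, as_437, y_acute_in_437, valid_utf8)
```

The bytes never change in this sketch; only the character map used to read them does.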

PeterPotkay wrote:
Message is on the queue with CCSID 819 and format MQSTR. The message is there after DataPower complained about an invalid character during parsing, saying that 'FD' is not valid. I am not sure yet what the CCSID was of the message when it was sent into DataPower, but DataPower is doing a get with convert and the default CCSID for DataPower is 819, so I suspect the incoming CCSID might have been something other than 819.

The sequence of events here is a bit confusing. DataPower gets a message off a queue with convert option, parses it, complains about invalid character, and then does what with the message? Does it back out the transaction, leaving the original message on the input queue, does it put a possibly modified message on a different queue? Or something else?

What character is the X'FD' byte supposed to represent? A '²' or a 'ý'? What is the exact error? Is it an MQ error or an XML parsing error?

One possible scenario: the sending application used ccsid=437 to generate an xml message, but put the wrong ccsid in the MQMD (ccsid=819), so what started out as a '²' ends up as a 'ý'. The get with convert succeeds, but the conversion from ccsid=819 to ccsid=819 has no effect. The xml message is missing an XML declaration, so the parser assumes (incorrectly) that it's UTF-8. X'FD' is not a valid byte sequence for UTF-8, so the XML parser generates an error.
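
A rough simulation of that scenario, using Python's standard ElementTree (expat) parser as a stand-in for the DataPower parser. The element and attribute names are borrowed from the fragment in the IE error message; the rest of the document is made up for illustration, and DataPower's actual behaviour is not shown here.

```python
import xml.etree.ElementTree as ET

# The sender builds the document in code page 437, intending a superscript two.
body = '<doc_attr attr_name="name" attr_value="LTY\u00b2"/>'.encode("cp437")

# With no XML declaration the parser assumes UTF-8 and rejects the lone X'FD'.
try:
    ET.fromstring(body)
    parsed_without_declaration = True
except ET.ParseError:
    parsed_without_declaration = False

# Labelled as ISO-8859-1 (CCSID 819), the same bytes parse, but to the wrong character.
mislabelled = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body
value = ET.fromstring(mislabelled).get("attr_value")

print(parsed_without_declaration, repr(value))
```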

PeterPotkay wrote:
When I look at this message on the queue with various tools, I get results all over the place.

Different tools may use a default character map to display a message (usually as a configurable option or environment setting), rather than use the message headers/declarations. So, they may not display some characters correctly; it's better to examine the hex bytes directly in those cases. Different tools may also choose to substitute invalid, unprintable, or non-ASCII characters with some other characters (such as a '.' or '?') when displaying them, rather than generate an error.

For example: rfhutilc may use default ccsid=819 to display the message, rather than attempting to convert from UTF-8; BMC's BMTM may attempt to convert the message from ccsid=819 to UTF-8 (X'FD' -> X'C3BD', or X'B2' -> X'C2B2'), but display it with default ccsid=819 ('ý', or '²'), or substitute a '?' for problem characters before sending them to IE; MO71 may substitute a '.' when displaying problem characters.
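
The conversions and substitutions mentioned above can be checked directly. This is a minimal sketch: the helper names are made up, and the '.' and '?' substitutions only imitate what MO71 and BMTM appear to do rather than reproduce their actual logic.

```python
def to_utf8_from_819(data: bytes) -> bytes:
    """Convert bytes labelled CCSID 819 (ISO-8859-1) to UTF-8."""
    return data.decode("latin-1").encode("utf-8")


def printable_dump(data: bytes, filler: str = ".") -> str:
    """Show printable ASCII as-is and replace every other byte with a filler."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else filler for b in data)


raw = b"LTY\xfd"

# The hex view does not depend on any display setting.
print(raw.hex().upper())

# X'FD' -> X'C3BD' and X'B2' -> X'C2B2' when converting 819 to UTF-8.
print(to_utf8_from_819(b"\xfd").hex().upper())
print(to_utf8_from_819(b"\xb2").hex().upper())

# Substituting '.' for anything outside printable ASCII.
print(printable_dump(raw))

# Substituting '?' for characters the target character map cannot represent.
question_marked = raw.decode("latin-1").encode("ascii", errors="replace").decode("ascii")
print(question_marked)
```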

PeterPotkay wrote:
Hexadecimal 'FD' means ý in Unicode, and I have to imagine it's a valid character to send inside a UTF-8 XML doc.

Not quite. Hexadecimal 'FD' as an unsigned integer (U+00FD) does mean 'ý' in Unicode, but exactly how the unsigned integer maps to a sequence of bytes depends on the UTF encoding: for UTF-8, it's X'C3BD'; for UTF-16BE, it's X'00FD'; for UTF-16LE, it's X'FD00'; etc. X'FD' is not a valid byte to send in a UTF-8 XML doc.
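
For anyone who wants to verify those byte sequences, here is a short check using Python's built-in codecs (an illustration only; the variable names are arbitrary):

```python
CODE_POINT = 0xFD
char = chr(CODE_POINT)  # U+00FD, LATIN SMALL LETTER Y WITH ACUTE

# The same code point serialised with three different UTF encodings.
encodings = {
    name: char.encode(name).hex().upper()
    for name in ("utf-8", "utf-16-be", "utf-16-le")
}
print(encodings)

# A lone X'FD' byte is not the UTF-8 encoding of U+00FD, or of anything else.
try:
    bytes([CODE_POINT]).decode("utf-8")
    lone_fd_is_valid_utf8 = True
except UnicodeDecodeError:
    lone_fd_is_valid_utf8 = False
print(lone_fd_is_valid_utf8)
```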
PeterPotkay
Posted: Fri Nov 28, 2014 6:44 am

Poobah

Joined: 15 May 2001
Posts: 7717

rekarm01 wrote:
DataPower gets a message off a queue with convert option, parses it, complains about invalid character, and then does what with the message? Does it back out the transaction, leaving the original message on the input queue, does it put a possibly modified message on a different queue? Or something else?

Puts a possibly modified message to a different queue, an ERROR queue.


rekarm01 wrote:

What character is the X'FD' byte supposed to represent? A '²' or a 'ý'?

Good question. I don't have the original message (yet) or access to the original sender to determine what they intended to send versus what they produced as an input message. In conversations I heard "Sometimes the customer copy and pastes data into the app and it picks up the garbage characters." Looking at where the occurrences of this x'FD' occur in the XML, I can't come up with any potential character for x'FD' that would make sense in that position of the data.

rekarm01 wrote:

What is the exact error? Is it an MQ error or an XML parsing error?

Parser error.


The XML doc does have a UTF-8 declaration at the top. Question: if an app is going to produce a UTF-8 XML doc, is it a good practice or even a hard requirement to use CCSID 1208 on the put? Same for the receiver: if they know they are going to be ingesting UTF-8 XML docs, should they specify CCSID 1208 on the MQGET with Convert as a standard in case the doc has non-standard ASCII characters?


Looking at the Wikipedia article on UTF-8, the chart shows FD in the red section, with an explanation that says 'FD' is not a valid single-byte character. So to me this, along with looking at the data and seeing no logical explanation for any character other than a space being where I see this 'FD', says this is just a garbage in / garbage out situation. In a UTF-8 doc, it's not valid to have a single byte x'FD'. The bytes before and after it in this doc are legitimate plain ASCII characters that complete or start plain English words.
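
A small helper along these lines can locate such bytes in a captured message body without relying on any tool's display. It is a sketch only: the function name and the sample bytes are made up, and the real message is not reproduced here.

```python
def invalid_utf8_offsets(data: bytes) -> list:
    """Return the offset of every byte at which UTF-8 decoding fails."""
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the remainder is valid UTF-8
        except UnicodeDecodeError as err:
            offsets.append(pos + err.start)
            pos += err.start + 1  # skip the offending byte and keep scanning
    return offsets


# Hypothetical fragment with a stray X'FD' between plain ASCII words.
sample = b'attr_value="LTY\xfd and more"'

for off in invalid_utf8_offsets(sample):
    context = sample[max(0, off - 3): off + 4]
    print(off, f"X'{sample[off]:02X}'", context)
```

Printing a few bytes of context on either side makes it easier to judge whether any real character was ever intended at that position.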

A lot of this is speculation without having the unaltered input message with an explanation from the sender on what they intended to send. I just have the aftermath crime scene to mop up and determine what went wrong.
_________________
Peter Potkay
Keep Calm and MQ On
rekarm01
Posted: Sat Nov 29, 2014 12:30 am

Grand Master

Joined: 25 Jun 2008
Posts: 1415

PeterPotkay wrote:
The XML doc does have a UTF-8 declaration at the top. Question, if an app is going to produce a UTF-8 XML doc, is it a good practice or even a hard requirement to use CCSID 1208 on the put? Same for the receiver, if they know they are going to be ingesting UTF-8 XML docs, should they specify CCSID 1208 on the MQGET with Convert as a standard in case the doc has non standard ASCII characters?

It's a hard requirement that the advertised ccsid/encoding match the actual encoding of the physical bytes, at every hop. If the sending app misrepresents the data, then the receiver cannot reliably interpret it, or fix the message. So, if an app produces a UTF-8 XML doc, then the message header ccsid and XML encoding should say so.
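
One way a sender or receiver could sanity-check that requirement before putting or parsing a message is sketched below. The CCSID-to-encoding table and the function names are assumptions for illustration, not part of any MQ API, and the check only covers ASCII-compatible encodings with no byte order mark.

```python
import re

# Assumed mapping from MQMD CCSID to the matching XML encoding name.
CCSID_TO_XML_ENCODING = {819: "ISO-8859-1", 1208: "UTF-8"}


def declared_xml_encoding(payload: bytes) -> str:
    """Return the encoding named in the XML declaration, or UTF-8 if none is given."""
    match = re.match(rb"""<\?xml[^>]*?encoding=["']([A-Za-z0-9._-]+)["']""", payload)
    return match.group(1).decode("ascii").upper() if match else "UTF-8"


def labels_agree(ccsid: int, payload: bytes) -> bool:
    """True when the header CCSID, the XML declaration, and the bytes all agree."""
    expected = CCSID_TO_XML_ENCODING.get(ccsid)
    if expected is None or declared_xml_encoding(payload) != expected:
        return False
    try:
        payload.decode(expected)
    except UnicodeDecodeError:
        return False
    return True


good = '<?xml version="1.0" encoding="UTF-8"?><a v="LTY\u00fd"/>'.encode("utf-8")
bad = b'<?xml version="1.0" encoding="UTF-8"?><a v="LTY\xfd"/>'

print(labels_agree(1208, good), labels_agree(1208, bad), labels_agree(819, bad))
```

Note that a single-byte code page such as ISO-8859-1 accepts every byte value, so the decode step can only catch mislabelled UTF-8; it cannot prove that bytes labelled 819 were really written as 819.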

MQGET with Convert does not modify the XML declaration, so unless the receiving app is prepared to do that, it's probably better to MQGET without Convert. DataPower already converts XML to UTF-8 internally, so there's no need for MQ to convert it too.
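
The stale-declaration problem can be illustrated as follows. The sketch simulates an 819-to-1208 conversion of the whole message body with Python codecs rather than a real MQGET, and the document itself is a made-up example.

```python
import xml.etree.ElementTree as ET

original = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>'
    b'<doc_attr attr_value="LTY\xfd"/>'
)

# Simulated conversion of the message body from CCSID 819 to CCSID 1208.
# The declaration is plain ASCII, so it passes through unchanged and still
# claims ISO-8859-1 even though the bytes are now UTF-8.
converted = original.decode("latin-1").encode("utf-8")

before = ET.fromstring(original).get("attr_value")
after = ET.fromstring(converted).get("attr_value")

print(repr(before), repr(after))
```

After conversion the parser still trusts the declaration, so the two UTF-8 bytes X'C3BD' are read as two separate Latin-1 characters instead of one 'ý'.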

PeterPotkay wrote:
A lot of this is speculation without having the unaltered input message with an explanation from the sender on what they intended to send. I just have the aftermath crime scene to mop up and determine what went wrong.

That does make it harder to troubleshoot. If it is a garbage in / garbage out problem, then maybe it's time to delegate more responsibility to the sender ...