MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » IIB Transform Node UTF-8 issue

IIB Transform Node UTF-8 issue
xaviercoble
Posted: Tue May 01, 2018 6:48 am    Post subject: IIB Transform Node UTF-8 issue

Newbie

Joined: 01 May 2018
Posts: 3

Hi All,

I ran into an issue where UTF-8 characters like "–" are being transformed into " ". In another test I verified that the corruption seems to happen before it gets to the Transform Node (or during the process of providing input to the node). I did this by writing XPaths to match on the UTF-8 character in question, and they never fired. I wrote a simple flow to isolate the problem:

IIB 10.0.0.3 on Linux 2.6.32-696.23.1.el6.x86_64

(1)Http Input -> (2)Trace -> (3)XSL Transform -> (4)Reply Node

1) Message Domain is set as XMLNSC
2) {Root} logged to a file
3) Identity transform (with UTF-8 output encoding), Message Domain set as XMLNSC, Character Set set as 1208
4) reply node

What's also interesting is that the Trace Node actually throws an error when printing out the Root. Below is the printout up to the point where it errors out (maybe this is also a hint about the overall problem :/)

( ['WSRoot' : 0x7fb6b80a3a80]
(0x01000000:Name ):Properties = ( ['WSPROPERTYPARSER' : 0x7fb6b808b810]
(0x03000000:NameValue):MessageSet = '' (CHARACTER)
(0x03000000:NameValue):MessageType = '' (CHARACTER)
(0x03000000:NameValue):MessageFormat = '' (CHARACTER)
(0x03000000:NameValue):Encoding = 546 (INTEGER)
(0x03000000:NameValue):CodedCharSetId = 1208 (INTEGER)
(0x03000000:NameValue):Transactional = FALSE (BOOLEAN)
(0x03000000:NameValue):Persistence = FALSE (BOOLEAN)
(0x03000000:NameValue):CreationTime = GMTTIMESTAMP '2018-05-01 14:15:59.727124' (GMTTIMESTAMP)
(0x03000000:NameValue):ExpirationTime = -1 (INTEGER)
(0x03000000:NameValue):Priority = 0 (INTEGER)
(0x03000000:NameValue):ReplyIdentifier = X'000000000000000000000000000000000000000000000000' (BLOB)
(0x03000000:NameValue):ReplyProtocol = 'SOAP-HTTP' (CHARACTER)
(0x03000000:NameValue):Topic = NULL
(0x03000000:NameValue):ContentType = 'application/xml; charset=UTF-8' (CHARACTER)
(0x03000000:NameValue):IdentitySourceType = '' (CHARACTER)
(0x03000000:NameValue):IdentitySourceToken = '' (CHARACTER)
(0x03000000:NameValue):IdentitySourcePassword = '' (CHARACTER)
(0x03000000:NameValue):IdentitySourceIssuedBy = '' (CHARACTER)
(0x03000000:NameValue):IdentityMappedType = '' (CHARACTER)
(0x03000000:NameValue):IdentityMappedToken = '' (CHARACTER)
(0x03000000:NameValue):IdentityMappedPassword = '' (CHARACTER)
(0x03000000:NameValue):IdentityMappedIssuedBy = '' (CHARACTER)
)
(0x01000000:Name ):HTTPInputHeader = **excluded**
(0x01000000:Folder):XMLNSC =

Sample Input:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<test>
<foo a="Filing – Response to Information Request" />
</test>

Response:
<SOAP-ENV:Body>
<SOAP-ENV:Fault>
<faultcode>SOAP-ENV:Server</faultcode>
<faultstring>BIP3113E: Exception detected in message flow pies_test_xsl_1208 (integration node panorama)</faultstring>
<faultactor>removed</faultactor>
<detail>
<text>Exception. BIP2230E: Error detected whilst processing a message in node 'pies_test_xsl_1208.Trace'. : /build/S1000_slot1/S1000_P/src/DataFlowEngine/SQLNodeLibrary/ImbTraceNode.cpp: 355: ImbTraceNode::evaluate: ComIbmTraceNode: pies_test_xsl_1208#FCMComposite_1_4
BIP5009E: XML Parsing Errors have occurred. : /build/S1000_slot1/S1000_P/src/MTI/MTIforBroker/GenXmlParser4/ImbXMLNSCParser.cpp: 1037: ImbXMLNSCParser::parseFirstChild: :
BIP5004E: An XML parsing error ''An invalid XML character (Unicode: 0xffffffff) was found in the value of attribute "a".'' occurred on line 3 column 16 when parsing element ''/Root/XMLNSC/test''. Internal error codes are '1521' and '2'. : /build/S1000_slot1/S1000_P/src/MTI/MTIforBroker/GenXmlParser4/ImbXMLNSCDocHandler.cpp: 768: ImbXMLNSCDocHandler::handleParseErrors: ComIbmWSInputNode: pies_test_xsl_1208#FCMComposite_1_2</text>
</detail>
</SOAP-ENV:Fault>
</SOAP-ENV:Body>


Thanks for your help!
timber
Posted: Tue May 01, 2018 2:17 pm

Grand Master

Joined: 25 Aug 2015
Posts: 1280

There is a high probability that your input data is not UTF-8. It is claiming to be UTF-8, but that doesn't prove anything.
You should change the domain to BLOB and log the raw bytes into a file with extension ".xml". Then point your favourite browser at the file and see what happens.
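
As a rough sketch of that check done outside the broker (Python here, not an IIB feature): once the raw bytes are in a file, a strict UTF-8 decode reports the offset of the first offending byte. The function names and the sample string are illustrative assumptions; the string only mimics the attribute from the sample input and is not a capture of the real payload.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte sequence that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def hex_context(data: bytes, offset: int, width: int = 4) -> str:
    """Hex dump of the bytes around an offset, for eyeballing the bad sequence."""
    start = max(0, offset - width)
    return data[start:offset + width].hex(" ")


if __name__ == "__main__":
    # Hypothetical stand-in for the logged bytes; in practice they would be
    # read with open(path, "rb").read().
    sample = '<foo a="Filing \u2013 Response to Information Request" />'
    for label, raw in (("utf-8", sample.encode("utf-8")),
                       ("cp1252", sample.encode("cp1252"))):
        bad = first_invalid_utf8_offset(raw)
        if bad is None:
            print(label, "-> valid UTF-8")
        else:
            print(label, "-> invalid at offset", bad, ":", hex_context(raw, bad))
```

If the function returns an offset, the bytes at that position are the ones to compare against what the sender believes it sent.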
Vitor
Posted: Tue May 01, 2018 2:18 pm

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

What code page is the input message in?

I know the encoding says "UTF-8" - every XML document I've ever seen says that, including the ones in EBCDIC and ISO Latin 7.

Also, why exactly are you using an XSLT stylesheet? There are cheaper ways to run one of those than IIB.

I'd also be interested to know how a transformation using IIB native features (ESQL or a Mapping node) reacts to the same XML.
_________________
Honesty is the best policy.
Insanity is the best defence.
rekarm01
Posted: Tue May 01, 2018 6:26 pm    Post subject: Re: IIB Transform Node UTF-8 issue

Grand Master

Joined: 25 Jun 2008
Posts: 1415

xaviercoble wrote:
I ran into an issue where UTF-8 characters like "–" are being transformed into " ".

Or transformed into something, and then displayed as " ".

xaviercoble wrote:
In another test I verified that the corruption seems to happen before it gets to the Transform Node ...

The error "invalid XML character (Unicode: 0xffffffff)" suggests that the corruption occurs before the HTTP Input node. The input data might have started out as UTF-8, (or not), but somewhere along the way, something also seems to have replaced whatever bytes it had with X'FF' bytes.

xaviercoble wrote:
(1)Http Input -> (2)Trace -> (3)XSL Transform -> (4)Reply Node

1) Message Domain is set as XMLNSC ...

If the Message Domain is set as BLOB instead, then the Trace node could display the actual input bytes that the message flow receives.

xaviercoble wrote:
What's also interesting is that the Trace Node actually throws an error when printing out the Root ...

Yes, because the Trace node relies on the underlying (XMLNSC) parser, and the parser throws an error, due to an invalid Unicode character. The BLOB parser would be much less likely to throw an error.
xaviercoble
Posted: Tue May 01, 2018 6:58 pm

Newbie

Joined: 01 May 2018
Posts: 3

Thanks for both of your responses!

I'm not sure how to get the code page for the input message - if I take out the dash, there is no error.
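
One way to narrow down the code page, sketched in Python outside the broker: decode the captured bytes with a few candidate code pages and keep the ones that reproduce the text you expected to send. The candidate list and the function name are assumptions for illustration; a match is a hint, not proof.

```python
from typing import List, Sequence

# Candidate code pages to try; extend as needed (cp037 is one EBCDIC variant).
CANDIDATES = ("utf-8", "cp1252", "iso-8859-1", "cp037")


def plausible_encodings(data: bytes, expected_text: str,
                        candidates: Sequence[str] = CANDIDATES) -> List[str]:
    """Return the candidate encodings under which data contains expected_text."""
    matches = []
    for name in candidates:
        try:
            text = data.decode(name)
        except UnicodeDecodeError:
            continue  # bytes are not even legal in this code page
        if expected_text in text:
            matches.append(name)
    return matches


if __name__ == "__main__":
    expected = "Filing \u2013 Response"
    # Hypothetical capture: the same text as a Cp1252 client would send it.
    captured = expected.encode("cp1252")
    print(plausible_encodings(captured, expected))
```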

I'm currently using XSLTs to translate any XML to another XML model. I can probably switch it over to ESQL - I'm just a lot more familiar with XSLT, and the performance impact is minimal.

I went ahead and wrote a BLOB to XMLNSC flow and got the same result (the dash is converted into a square). I'm not sure what that means...I copied the UTF-8 dash from: https://www.fileformat.info/info/unicode/char/2014/browsertest.htm

Here is the code for the echo:
Code:

        SET OutputRoot.Properties.CodedCharSetId = 1208;
        SET OutputRoot.Properties.Encoding = 546;
        CREATE LASTCHILD OF OutputRoot DOMAIN('XMLNSC') PARSE(InputRoot.BLOB.BLOB, 546, 1208);


Also... I tested these flows on my Windows 7 64-bit machine running IIB 10.0.0.6 and they work fine (BLOB echo, XSLT flows, Trace node). Is there any configuration on Linux that could cause this issue? That's the only thing I can think of now.
xaviercoble
Posted: Tue May 01, 2018 7:21 pm

Newbie

Joined: 01 May 2018
Posts: 3

Thank you all for your help; you all pointed me in the right direction.

Quote:
The error "invalid XML character (Unicode: 0xffffffff)" suggests that the corruption occurs before the HTTP Input node. The input data might have started out as UTF-8, (or not), but somewhere along the way, something also seems to have replaced whatever bytes it had with X'FF' bytes.


Yep! For the testing on Linux, I was using Soap UI 4.5.1. There is an Encoding field in the Request properties, and it was blank. It looks like it was falling back to this default setting: file.encoding=Cp1252. Once I set it to UTF-8, all the flows worked.
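
The mismatch can be illustrated with a small Python sketch (an illustration only; it does not reproduce Soap UI's internals): the same dash encodes to different bytes under Cp1252 and under UTF-8, and the single Cp1252 byte is not a valid UTF-8 sequence. The helper name read_as_utf8 is made up for this example.

```python
EN_DASH = "\u2013"  # the dash from the sample input

utf8_bytes = EN_DASH.encode("utf-8")      # what a UTF-8 client puts on the wire
cp1252_bytes = EN_DASH.encode("cp1252")   # what a Cp1252 client puts on the wire


def read_as_utf8(data: bytes) -> str:
    """Decode like a lenient UTF-8 reader: invalid bytes become U+FFFD."""
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print("UTF-8 bytes: ", utf8_bytes.hex(" "))
    print("Cp1252 bytes:", cp1252_bytes.hex(" "))
    print("UTF-8 payload read as UTF-8: ", repr(read_as_utf8(utf8_bytes)))
    print("Cp1252 payload read as UTF-8:", repr(read_as_utf8(cp1252_bytes)))
```

A strict parser rejects the stray byte outright, while a lenient reader substitutes U+FFFD, which many fonts draw as a box or question mark. That may be what appeared as the square earlier in the thread, though this is a guess rather than something the thread confirms.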

Soap UI lesson learned...

Thanks all