IIB Transform Node UTF-8 issue |
Author |
Message
|
xaviercoble |
Posted: Tue May 01, 2018 6:48 am Post subject: IIB Transform Node UTF-8 issue |
|
|
Newbie
Joined: 01 May 2018 Posts: 3
|
Hi All,
I ran into an issue where UTF-8 characters like "–" are being transformed into " ". In another test I verified that the corruption seems to happen before the message reaches the Transform Node (or while the input is being provided to the node). I did this by writing XPaths to match on the UTF-8 character in question, and they never fired. I wrote a simple flow to isolate the problem:
IIB 10.0.0.3 on Linux 2.6.32-696.23.1.el6.x86_64
(1)Http Input -> (2)Trace -> (3)XSL Transform -> (4)Reply Node
1) Message Domain is set as XMLNSC
2) {Root} logged to a file
3) Identity transform (with UTF-8 output encoding), Message Domain set as XMLNSC, Character Set set as 1208
4) reply node
What's also interesting is that the Trace node actually throws an error when printing out the Root. Below is the printout up to the point where it errors out (maybe this is also a hint about the overall problem :/):
( ['WSRoot' : 0x7fb6b80a3a80]
(0x01000000:Name ):Properties = ( ['WSPROPERTYPARSER' : 0x7fb6b808b810]
(0x03000000:NameValue):MessageSet = '' (CHARACTER)
(0x03000000:NameValue):MessageType = '' (CHARACTER)
(0x03000000:NameValue):MessageFormat = '' (CHARACTER)
(0x03000000:NameValue):Encoding = 546 (INTEGER)
(0x03000000:NameValue):CodedCharSetId = 1208 (INTEGER)
(0x03000000:NameValue):Transactional = FALSE (BOOLEAN)
(0x03000000:NameValue):Persistence = FALSE (BOOLEAN)
(0x03000000:NameValue):CreationTime = GMTTIMESTAMP '2018-05-01 14:15:59.727124' (GMTTIMESTAMP)
(0x03000000:NameValue):ExpirationTime = -1 (INTEGER)
(0x03000000:NameValue):Priority = 0 (INTEGER)
(0x03000000:NameValue):ReplyIdentifier = X'000000000000000000000000000000000000000000000000' (BLOB)
(0x03000000:NameValue):ReplyProtocol = 'SOAP-HTTP' (CHARACTER)
(0x03000000:NameValue):Topic = NULL
(0x03000000:NameValue):ContentType = 'application/xml; charset=UTF-8' (CHARACTER)
(0x03000000:NameValue):IdentitySourceType = '' (CHARACTER)
(0x03000000:NameValue):IdentitySourceToken = '' (CHARACTER)
(0x03000000:NameValue):IdentitySourcePassword = '' (CHARACTER)
(0x03000000:NameValue):IdentitySourceIssuedBy = '' (CHARACTER)
(0x03000000:NameValue):IdentityMappedType = '' (CHARACTER)
(0x03000000:NameValue):IdentityMappedToken = '' (CHARACTER)
(0x03000000:NameValue):IdentityMappedPassword = '' (CHARACTER)
(0x03000000:NameValue):IdentityMappedIssuedBy = '' (CHARACTER)
)
(0x01000000:Name ):HTTPInputHeader = **excluded**
(0x01000000:Folder):XMLNSC =
Sample Input:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<test>
<foo a="Filing – Response to Information Request" />
</test>
Response:
<SOAP-ENV:Body>
<SOAP-ENV:Fault>
<faultcode>SOAP-ENV:Server</faultcode>
<faultstring>BIP3113E: Exception detected in message flow pies_test_xsl_1208 (integration node panorama)</faultstring>
<faultactor>removed</faultactor>
<detail>
<text>Exception. BIP2230E: Error detected whilst processing a message in node 'pies_test_xsl_1208.Trace'. : /build/S1000_slot1/S1000_P/src/DataFlowEngine/SQLNodeLibrary/ImbTraceNode.cpp: 355: ImbTraceNode::evaluate: ComIbmTraceNode: pies_test_xsl_1208#FCMComposite_1_4
BIP5009E: XML Parsing Errors have occurred. : /build/S1000_slot1/S1000_P/src/MTI/MTIforBroker/GenXmlParser4/ImbXMLNSCParser.cpp: 1037: ImbXMLNSCParser::parseFirstChild: :
BIP5004E: An XML parsing error ''An invalid XML character (Unicode: 0xffffffff) was found in the value of attribute "a".'' occurred on line 3 column 16 when parsing element ''/Root/XMLNSC/test''. Internal error codes are '1521' and '2'. : /build/S1000_slot1/S1000_P/src/MTI/MTIforBroker/GenXmlParser4/ImbXMLNSCDocHandler.cpp: 768: ImbXMLNSCDocHandler::handleParseErrors: ComIbmWSInputNode: pies_test_xsl_1208#FCMComposite_1_2</text>
</detail>
</SOAP-ENV:Fault>
</SOAP-ENV:Body>
Thanks for your help! |
|
timber |
Posted: Tue May 01, 2018 2:17 pm Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
There is a high probability that your input data is not UTF-8. It is claiming to be UTF-8, but that doesn't prove anything.
You should change the domain to BLOB and log the raw bytes into a file with extension ".xml". Then point your favourite browser at the file and see what happens. |
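Editor's note: as a minimal sketch of the byte-level check suggested above, the Python snippet below reports the offset of the first byte that is not valid UTF-8 in a captured payload. It is not from the thread. The helper name `first_invalid_utf8_offset` and the shortened sample string are illustrative assumptions, and the dash is assumed to be U+2013 (en dash).

```python
from typing import Optional


def first_invalid_utf8_offset(raw: bytes) -> Optional[int]:
    """Return the offset of the first byte that breaks UTF-8 decoding, or None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first undecodable byte
        return exc.start
    return None


# Hypothetical payloads: the same text encoded two different ways
text = '<foo a="Filing \u2013 Response" />'
utf8_payload = text.encode("utf-8")
cp1252_payload = text.encode("cp1252")

print(utf8_payload.hex(" "))
print(first_invalid_utf8_offset(utf8_payload))
print(cp1252_payload.hex(" "))
print(first_invalid_utf8_offset(cp1252_payload))
```

In practice the bytes would come from the file written by the BLOB-domain flow, for example via `open(path, "rb").read()`, rather than from a string literal.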
|
Vitor |
Posted: Tue May 01, 2018 2:18 pm Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
What code page is the input message in?
I know the encoding says "UTF-8" - every XML document I've ever seen says that, including the ones in EBCDIC and ISO Latin 7.
Also, why exactly are you using an XSLT stylesheet? There are cheaper ways to run one of those than IIB.
I'd also be interested to know how a transformation using IIB native features (ESQL or a Mapping node) reacts to the same XML. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
rekarm01 |
Posted: Tue May 01, 2018 6:26 pm Post subject: Re: IIB Transform Node UTF-8 issue |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
xaviercoble wrote: |
I ran into an issue where UTF-8 characters like "–" are being transformed into " ". |
Or transformed into something, and then displayed as " ".
xaviercoble wrote: |
In another test I verified that the corruption seems to happen before it gets to the Transform Node ... |
The error "invalid XML character (Unicode: 0xffffffff)" suggests that the corruption occurs before the HTTP Input node. The input data might have started out as UTF-8, (or not), but somewhere along the way, something also seems to have replaced whatever bytes it had with X'FF' bytes.
xaviercoble wrote: |
(1)Http Input -> (2)Trace -> (3)XSL Transform -> (4)Reply Node
1) Message Domain is set as XMLNSC ... |
If the Message Domain is set as BLOB instead, then the Trace node could display the actual input bytes that the message flow receives.
xaviercoble wrote: |
What's also interesting is that the Trace Node actually throws an error when printing out the Root ... |
Yes, because the Trace node relies on the underlying (XMLNSC) parser, and the parser throws an error, due to an invalid Unicode character. The BLOB parser would be much less likely to throw an error. |
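Editor's note: to illustrate why a strict XML parser rejects input that claims to be UTF-8 but is not, here is a rough Python sketch. It uses the standard library's `xml.etree.ElementTree` purely as a stand-in for the XMLNSC parser; the two are unrelated, so the error text will differ from BIP5004E. The helper name `try_parse` is hypothetical, and the dash is assumed to be U+2013.

```python
import xml.etree.ElementTree as ET

# The sample input from the first post, with the dash written as an escape
DOC = (
    '<?xml version="1.0" encoding="UTF-8" standalone="no"?>\n'
    "<test>\n"
    '<foo a="Filing \u2013 Response to Information Request" />\n'
    "</test>"
)


def try_parse(raw: bytes) -> str:
    """Parse raw bytes as XML; return attribute 'a' of <foo> or the error text."""
    try:
        root = ET.fromstring(raw)
    except ET.ParseError as exc:
        return f"parse error: {exc}"
    return root.find("foo").get("a")


# Bytes that really are UTF-8 parse cleanly
print(try_parse(DOC.encode("utf-8")))
# Bytes encoded as Cp1252 but still declared as UTF-8 are rejected
print(try_parse(DOC.encode("cp1252")))
```

A byte-oriented reader (the BLOB domain in the flow) does no such validation, which is why it can still show what actually arrived.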
|
xaviercoble |
Posted: Tue May 01, 2018 6:58 pm Post subject: |
|
|
Newbie
Joined: 01 May 2018 Posts: 3
|
Thanks for both of your responses!
I'm not sure how to get the code page for the input message - if I take out the dash, there is no error.
I'm currently using XSLTs to translate any XML to another XML model. I can probably switch it over to ESQL - I'm just a lot more familiar with XSLT and the performance impact is minimal.
I went ahead and wrote a BLOB to XMLNSC flow and got the same result (the dash is converted into a square). I'm not sure what that means...I copied the UTF-8 dash from: https://www.fileformat.info/info/unicode/char/2014/browsertest.htm
here is the code for the echo:
Code: |
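-- Mark the output as UTF-8 (CCSID 1208) with encoding 546 (little-endian)
SET OutputRoot.Properties.CodedCharSetId = 1208;
SET OutputRoot.Properties.Encoding = 546;
-- Re-parse the raw BLOB bytes as XMLNSC using the same encoding and CCSID
CREATE LASTCHILD OF OutputRoot DOMAIN('XMLNSC') PARSE(InputRoot.BLOB.BLOB, 546, 1208);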
|
Also... I tested these flows on my Windows 7 64-bit machine running IIB 10.0.0.6 and they work fine (BLOB echo, XSLT flows, Trace node). Is there any configuration on Linux that could cause this issue? That's the only thing I can think of now. |
|
xaviercoble |
Posted: Tue May 01, 2018 7:21 pm Post subject: |
|
|
Newbie
Joined: 01 May 2018 Posts: 3
|
Thank you all for your help; you all pointed me in the right direction.
Quote: |
The error "invalid XML character (Unicode: 0xffffffff)" suggests that the corruption occurs before the HTTP Input node. The input data might have started out as UTF-8, (or not), but somewhere along the way, something also seems to have replaced whatever bytes it had with X'FF' bytes. |
Yep! For the testing on Linux, I was using Soap UI 4.5.1, and the Encoding field in the Request properties was blank. It looks like it was falling back to this default setting: file.encoding=Cp1252. Once I set it to UTF-8, all the flows worked.
Soap UI lesson learned...
Thanks all  |
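Editor's note: the mismatch described in the last post can be reproduced outside IIB. The sketch below assumes, as the poster concluded, that the client encoded the request body as Cp1252 while the receiver read it as UTF-8. It is plain Python, not Soap UI or IIB code, and the helper names `wire_bytes` and `read_as_utf8` are hypothetical. Both the en dash and the em dash are included, since the thread does not make clear which one was pasted.

```python
DASHES = {"en dash": "\u2013", "em dash": "\u2014"}


def wire_bytes(text: str, client_encoding: str) -> bytes:
    """Bytes a client would send when it encodes text with client_encoding."""
    return text.encode(client_encoding)


def read_as_utf8(raw: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for any undecodable byte."""
    return raw.decode("utf-8", errors="replace")


for name, ch in DASHES.items():
    sent_cp1252 = wire_bytes(ch, "cp1252")  # one byte in Cp1252
    sent_utf8 = wire_bytes(ch, "utf-8")     # three bytes in UTF-8
    print(
        name,
        sent_cp1252.hex(),
        sent_utf8.hex(" "),
        repr(read_as_utf8(sent_cp1252)),
        repr(read_as_utf8(sent_utf8)),
    )
```

A lenient reader turns the stray Cp1252 byte into the replacement character U+FFFD, which many fonts draw as a box or question mark, while a strict reader refuses the input outright.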
|