XMLNSC parser removed x'0D' from character element

angka

Posted: Wed Dec 07, 2011 1:30 am Post subject: XMLNSC parser removed x'0D' from character element

Chevalier

Joined: 20 Sep 2005
Posts: 406

Hi,

This is the input data in blob:
3C4D6573 73616765 3E0D0A3C 4F757470
75744461 74613E0D 0A3C4F75 74707574
5061796C 6F61643E 0D0A0D0A 3C2F4F75
74707574 5061796C 6F61643E 0D0A3C2F
4F757470 75744461 74613E0D 0A3C2F4D
65737361 67653E0D 0A

This is the input data in char:
<Message>
<OutputData>
<OutputPayload>....</OutputPayload>
</OutputData>
</Message>

Where the "...." is x'0D0A0D0A'

This is my code:
CREATE LASTCHILD OF Environment.Variables DOMAIN 'XMLNSC' PARSE(InputRoot.BLOB.BLOB CCSID InputRoot.Properties.CodedCharSetId ENCODING InputRoot.Properties.Encoding);

SET OutputRoot.BLOB.BLOB = CAST(Environment.Variables.XMLNSC.Message.OutputData.[1] AS BLOB CCSID InputRoot.Properties.CodedCharSetId ENCODING InputRoot.Properties.Encoding);

And my output becomes only X'0A0A'. Why is this so?

Anyway, if I code it this way:
CREATE LASTCHILD OF Environment.Variables DOMAIN 'XMLNSC' PARSE(InputRoot.BLOB.BLOB CCSID InputRoot.Properties.CodedCharSetId ENCODING InputRoot.Properties.Encoding);
SET OutputRoot.BLOB.BLOB = ASBITSTREAM(Environment.Variables.XMLNSC CCSID InputRoot.Properties.CodedCharSetId ENCODING InputRoot.Properties.Encoding);

I get the X'0D0A' in it. However, this is not what I want.

I just want the output to be the PCDATA in the OutputPayload tag. Is there a workaround?

Thanks

Esa

Posted: Wed Dec 07, 2011 2:14 am Post subject:

Grand Master

Joined: 22 May 2008
Posts: 1387
Location: Finland

Your post lacks some vital information, e.g. Broker version. My workaround assumes you are working with version 6.1 or higher.

Instead of creating an XMLNSC tree in Environment, let the <protocol>Input node parse the message with XMLNSC, but declare OutputPayload as an opaque element in the Parser Options tab of the <protocol>Input node. That should leave the field unparsed and thus preserve extra whitespace.

I didn't have time to test this, so you will have to test for yourself what happens when you have some real data in OutputPayload.

kimbert

Posted: Wed Dec 07, 2011 3:11 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

I'm very glad to hear it. Please read this for the reason: http://www.w3.org/TR/2006/REC-xml-20060816/#sec-line-ends
If you want to protect the consecutive linefeed characters then you will need to hide them in a hexBinary or base64-encoded BLOB.

I can believe that XMLNSC does not remove consecutive line feeds on output - if so, then technically that is a defect in XMLNSC.
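The line-end rule in that section of the specification can be observed outside the broker. The following is a minimal sketch in Python rather than ESQL, assuming the standard library's expat-based `xml.etree.ElementTree` behaves like any other conforming XML parser. It feeds the exact bytes from the original post through the parser and inspects what arrives in OutputPayload.

```python
import xml.etree.ElementTree as ET

# The hex dump from the original post, copied group by group.
raw = bytes.fromhex(
    "3C4D6573 73616765 3E0D0A3C 4F757470"
    "75744461 74613E0D 0A3C4F75 74707574"
    "5061796C 6F61643E 0D0A0D0A 3C2F4F75"
    "74707574 5061796C 6F61643E 0D0A3C2F"
    "4F757470 75744461 74613E0D 0A3C2F4D"
    "65737361 67653E0D 0A"
)

# Parse the document and read the character data of OutputPayload.
root = ET.fromstring(raw)
payload = root.find("OutputData/OutputPayload").text

# Each CR LF pair reached the application as a single LF.
print(payload.encode("ascii").hex())  # prints 0a0a
```

The two X'0D' bytes are gone before the application sees the value, which matches the X'0A0A' reported in the original post. The number of line feeds is unchanged; only the carriage returns are dropped.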

Esa

Posted: Wed Dec 07, 2011 3:45 am Post subject:

Grand Master

Joined: 22 May 2008
Posts: 1387
Location: Finland

I had a feeling that even opaque parsing would not help you. What kimbert says confirms it, I think.

Even opaque elements are parsed, although the InfoCenter says the opposite here http://publib.boulder.ibm.com/infocenter/wmbhelp/v7r0m0/index.jsp?topic=%2Fcom.ibm.etools.mft.doc%2Fac20990_.htm&resultof=%22%6f%70%61%71%75%65%22%20%22%6f%70%61%71%75%22%20 (the second-to-last paragraph). If you have invalid XML in your opaque element, you will get an exception.

With opaque parsing, the parser does not build an element tree for the opaque element but stores it in one element as a string value. However, the element contents are still parsed, and I strongly suspect that it will also modify the line breaks.

So, it seems the only thing you can do is to cast the input blob as character, locate the positions of '<OutputPayload>' and '</OutputPayload>', use SUBSTRING to extract the string between and cast it back to blob. Perhaps you could do it even without the casts (depends on the codepages).

If you already tried the opaque element approach, please tell us what it did with the linebreaks.

angka

Posted: Wed Dec 07, 2011 3:45 am Post subject:

Chevalier

Joined: 20 Sep 2005
Posts: 406

Hi,

Okay, noted. But I tried with CDATA and the XMLNSC parser also removes the X'0D'. Why?
Anyway, the reason I need the X'0D0A' is that the XML is actually an output card from a WTX map. I need to preserve all the bytes for this element. Is there any workaround to instruct XMLNSC not to remove it?

Btw, if I set it as an opaque element, I will not be able to access it.

Thank you

kimbert

Posted: Wed Dec 07, 2011 3:55 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

It's true that opaque elements do not allow badly-formed XML - but consecutive line feeds do not make the XML badly-formed. I honestly don't know, off the cuff, whether an opaque element will collapse the line feeds but it's worth a try.

Quote:

I tried with CDATA and the XMLNSC parser also removes the X'0D'. Why?

Because that's what the XML specification requires XML parsers to do. If XMLNSC did not do this (even within a CDATA section) it would be non-compliant.

btw, you seem to be suffering from a common misunderstanding about CDATA. A CDATA section still needs to obey certain XML rules (e.g. no illegal characters), and line feed collapsing does apply to its content. You may think that's unhelpful, and it may well be, but it's what the XML specification says.
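The point about CDATA is easy to check outside the broker. This is a small Python sketch (not ESQL), assuming that the standard library's expat-based parser is representative of a conforming XML parser. The one-element document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the payload bytes X'0D0A0D0A' wrapped in a CDATA section.
doc = b"<OutputPayload><![CDATA[\r\n\r\n]]></OutputPayload>"

# Line ends are normalized on input, so CDATA gives no protection.
cdata_text = ET.fromstring(doc).text

print(cdata_text.encode("ascii").hex())  # prints 0a0a
```

CDATA only switches off markup recognition for `<` and `&`; it does not exempt the content from line-end normalization.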

mqjeff

Posted: Wed Dec 07, 2011 3:57 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

It's not clear that you understand what you have been told.

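XML does not preserve the original line-break bytes within element content. A conforming parser normalizes every CR LF pair, and every lone CR, to a single LF before the application sees the data.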

You need to base64 encode or otherwise transform the contents of this element into something that does not allow the XMLNSC Parser to recognize the line breaks.
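To illustrate the encoding idea, here is a minimal Python sketch (not ESQL, and not tied to any broker or WTX function). It assumes the producer can base64-encode the payload before writing the XML and that the consumer decodes it after parsing. The one-element document is made up for illustration.

```python
import base64
import xml.etree.ElementTree as ET

# The bytes that must survive the round trip.
original = b"\r\n\r\n"

# Producer side: hide the bytes behind base64 text, which contains no line ends here.
encoded = base64.b64encode(original).decode("ascii")  # "DQoNCg=="
doc = ("<OutputPayload>" + encoded + "</OutputPayload>").encode("ascii")

# Consumer side: parse normally, then decode the element value back to bytes.
recovered = base64.b64decode(ET.fromstring(doc).text)

print(recovered == original)  # prints True
```

The parser only ever sees the characters `DQoNCg==`, so there is nothing for line-end normalization to touch. hexBinary works the same way with `bytes.hex()` and `bytes.fromhex()`.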

Esa

Posted: Wed Dec 07, 2011 4:03 am Post subject:

Grand Master

Joined: 22 May 2008
Posts: 1387
Location: Finland

angka wrote:

Okay, noted. But I tried with CDATA and the XMLNSC parser also removes the X'0D'. Why?

If you produced the CDATA element in WTX, it might have worked. I think WTX does not follow the XML spec as tightly as XMLNSC. But I am not telling you to try it.

angka wrote:

Btw, if I set it as an opaque element, I will not be able to access it.
Thank you

Yes, you will. Remember it is a character value.

kimbert

Posted: Wed Dec 07, 2011 4:53 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Quote:

the XML is actually an output card from a WTX map. I need to preserve all the bytes for this element.

Please explain why it is important to preserve the line feeds? The next well-behaved XML parser in the processing pipeline will collapse those line feeds anyway so I'm struggling to see why it matters.

smdavies99

Posted: Wed Dec 07, 2011 5:15 am Post subject:

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

kimbert wrote:

Perhaps there is a PHB that does not understand that fact?
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995

Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.

rekarm01

Posted: Thu Dec 08, 2011 2:54 am Post subject: Re: XMLNSC parser removed x'0D' from character element

Grand Master

Joined: 25 Jun 2008
Posts: 1415

kimbert wrote:

I can believe that XMLNSC does not remove consecutive line feeds on output - if so, then technically that is a defect in XMLNSC.

Why would that be a defect? The consecutive line feeds occur as character data, (not as mixed data). The XMLNSC parser should (and does) remove the carriage returns from the input message, but why should it collapse the remaining linefeeds?

In any event, the given code seems to be using the BLOB parser, rather than the XMLNSC parser, to generate the output message.

Esa wrote:

I had a feeling that even opaque parsing would not help you.

The XML specification requires that the parser normalizes the end-of-lines on input, before parsing, so opaque parsing probably won't help.

angka wrote:

... but I tried with CDATA ...

... encoded as HexBinary or Base64?

Esa wrote:

If you produced the CDATA element in WTX, it might have worked. I think WTX does not follow the XML spec as tightly as XMLNSC.

The XML spec does not prohibit an XML processor from adding carriage returns to an output message.

angka wrote:

I need to preserve all the bytes for this element. Is there any workaround to instruct XMLNSC not to remove it?

If that's really the case, then, as already suggested, either use the BLOB parser, or use hexBinary- or Base64-encoded CDATA. Another option is for WTX to use character references to represent carriage returns.
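The character-reference option can be sketched as follows. This is Python rather than ESQL, and it assumes the producer (WTX in this thread) can be made to emit `&#13;` in place of each raw carriage return; whether WTX offers such a setting is not established here. The document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: each CR is written as the character reference &#13;,
# while the LF stays a literal byte.
doc = b"<OutputPayload>&#13;\n&#13;\n</OutputPayload>"

# Character references are expanded after line-end normalization,
# so the carriage returns reach the application intact.
ref_text = ET.fromstring(doc).text

print(ref_text.encode("ascii").hex())  # prints 0d0a0d0a
```

Unlike base64, this keeps the payload human-readable, but it depends on every producer of the message escaping carriage returns consistently.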

Esa

Posted: Thu Dec 08, 2011 6:21 am Post subject:

Grand Master

Joined: 22 May 2008
Posts: 1387
Location: Finland

If the XML parser does not work the way you like, don't use it:

Esa wrote:

So, it seems the only thing you can do is to cast the input blob as character, locate the positions of '<OutputPayload>' and '</OutputPayload>', use SUBSTRING to extract the string between and cast it back to blob. Perhaps you could do it even without the casts (depends on the codepages).

If you want to convert a blob to xml to be able to select one part of it and convert it back to blob exactly as it was, it is simpler and gives better performance if you just extract the sequence from the blob without parsing it at all.

This is the humble solution to the OP's little problem. Unfortunately I had hidden it in the middle of more interesting stuff on opaque parsing. Nobody reads longer posts? Or maybe it is my (well deserved) reputation as a poster of nonsense
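The extract-without-parsing approach described above can be sketched like this. It is Python rather than ESQL, and `extract_between` is a hypothetical helper name. It assumes an ASCII-compatible single-byte code page, tags written exactly as shown (no attributes or extra whitespace), and only one OutputPayload element in the message.

```python
def extract_between(blob: bytes, open_tag: bytes, close_tag: bytes) -> bytes:
    """Return the bytes between the first open_tag and the following close_tag.

    Raises ValueError if either tag is missing.
    """
    start = blob.index(open_tag) + len(open_tag)
    end = blob.index(close_tag, start)
    return blob[start:end]


# The message from the original post, as raw bytes.
message = (
    b"<Message>\r\n<OutputData>\r\n<OutputPayload>\r\n\r\n</OutputPayload>\r\n"
    b"</OutputData>\r\n</Message>\r\n"
)

extracted = extract_between(message, b"<OutputPayload>", b"</OutputPayload>")

print(extracted.hex())  # prints 0d0a0d0a
```

Because no XML parser is involved, nothing normalizes the line ends. The trade-off is that entity references and CDATA markers inside the payload are returned as-is rather than resolved.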

mqjeff

Posted: Thu Dec 08, 2011 6:29 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

Esa wrote:

Or maybe it is my (well deserved) reputation as a poster of nonsense

Don't be ridiculous.

Esa

Posted: Thu Dec 08, 2011 7:03 am Post subject:

Grand Master

Joined: 22 May 2008
Posts: 1387
Location: Finland

mqjeff wrote:

Don't be ridiculous.

I'm sorry. Being ridiculous is our national sport.
