MQSeries.net :: View topic - Removing whitespace in XML element content

MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » Removing whitespace in XML element content


Removing whitespace in XML element content


goffinf

Posted: Wed Mar 27, 2013 9:16 am Post subject: Removing whitespace in XML element content

Chevalier

Joined: 05 Nov 2005
Posts: 401

version: 6.1.0.10

Thought I'd seen something similar to this recently, but search as I might, I can't find it, so ...

An XML message is received into a flow via an HTTPInput. It is configured to use the XMLNSC domain with On-Demand parsing.

Sometimes we get a message where the content of an element is blank, but the start and end tags are on different lines (as you would see if you looked at the message in an editor). For example:

Code:

The logical message tree in Broker will show that the content of <baz> is 0a09 (LF + TAB).
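The original example markup did not survive, so here is a small Python reproduction of the same observation using the standard `xml.etree.ElementTree` parser rather than Broker. The document is hypothetical (the `foo` and `bar` element names are made up); it just puts `<baz>`'s start and end tags on separate lines so that its text content is LF followed by TAB.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: <baz>'s start and end tags sit on separate lines,
# so its text content is a line feed (0x0A) followed by a tab (0x09).
SAMPLE = "<foo>\n<bar>1</bar>\n<baz>\n\t</baz>\n</foo>"

def baz_content_hex(xml_text: str) -> str:
    """Return the hex of <baz>'s text content, encoded as UTF-8."""
    baz = ET.fromstring(xml_text).find("baz")
    text = baz.text or ""  # an element with no content has text None
    return text.encode("utf-8").hex()

print(baz_content_hex(SAMPLE))  # prints 0a09
```

Any conforming XML parser has to report that content as-is, which is why it survives all the way to the output.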

When we send this message out of the flow (after mapping to an MRM definition - TDS Fixed Length) it causes that part of the 'record' to appear on a separate line and the software which reads the resulting message barfs on it.

Now obviously I used a simplified example above. The real one has hundreds of fields any of which could have this problem.

Is there a simple and/or efficient way of getting rid of unwanted whitespace in element content, either when we parse the XML input or when we produce the MRM output?

I did see a post that suggested that switching to XMLNS rather than XMLNSC would do that, but it didn't.

Regards

Fraser.

mqjeff

Posted: Wed Mar 27, 2013 9:22 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

there's a "remove mixed-content" or something like that switch for XMLNSC (maybe it's 'retain mixed content'?).

goffinf

Posted: Wed Mar 27, 2013 9:27 am Post subject:

Chevalier

Joined: 05 Nov 2005
Posts: 401

mqjeff wrote:

there's a "remove mixed-content" or something like that switch for XMLNSC (maybe it's 'retain mixed content'?).

There is, although it's 'Retain mixed content', and it's definitely 'unchecked' (i.e. off).

Technically this isn't mixed content, as it's inside an element, not between elements.

Fraser.

mqjeff

Posted: Wed Mar 27, 2013 9:36 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

goffinf wrote:

mqjeff wrote:

there's a "remove mixed-content" or something like that switch for XMLNSC (maybe it's 'retain mixed content'?).

There is, although it's 'Retain mixed content', and it's definitely 'unchecked' (i.e. off).

Technically this isn't mixed content, as it's inside an element, not between elements.

Fraser.

Oh, you meant the "baz" element content, not the formatting between the elements.

I think you have to deal with this on a field-by-field basis.

kimbert

Posted: Wed Mar 27, 2013 9:42 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

The white space in the <baz> tag is not mixed content. It is the actual text value of this tag.

You could
- add a whiteSpace facet to the XML Schema simple type that describes <baz>
- switch on validation in the message flow, and ensure that 'Build tree using XML Schema' is enabled on the input node
That should ensure that the whitespace in the XML is replaced by the empty string. Any whitespace within a string value will be collapsed to a single space character.
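For anyone unfamiliar with the facet, `<xs:whiteSpace value="collapse"/>` is defined in XML Schema Part 2: first every TAB, LF and CR is replaced by a space (that step on its own is `value="replace"`), then runs of spaces are squeezed to a single space and leading and trailing spaces are removed. The Python below only models those rules to show why a value of LF + TAB ends up as the empty string; it is not how Broker implements validation, and the function names are made up.

```python
import re

def xsd_replace(value: str) -> str:
    """whiteSpace="replace": map each TAB, LF and CR to a single space."""
    return value.translate({0x09: " ", 0x0A: " ", 0x0D: " "})

def xsd_collapse(value: str) -> str:
    """whiteSpace="collapse": replace, squeeze runs of spaces, then trim."""
    return re.sub(r" {2,}", " ", xsd_replace(value)).strip(" ")

print(repr(xsd_collapse("\n\t")))          # prints ''
print(repr(xsd_collapse("  a \n\tb  ")))   # prints 'a b'
```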

mqjeff

Posted: Wed Mar 27, 2013 9:49 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

kimbert wrote:

You could
- add a whiteSpace facet to the XML Schema simple type that describes <baz> and every other field that needs to be trimmed
- switch on validation in the message flow, and ensure that 'Build tree using XML Schema' is enabled on the input node
That should ensure that the whitespace in the XML is replaced by the empty string. Any whitespace within a string value will be collapsed to a single space character.

adjusted that for you.

It's probably more correct to adjust the MRM model to ensure that it translates the LF and TABs into spaces, although again this needs to be done on each and every field.
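To make the per-field idea concrete outside the MRM tooling, here is a hypothetical Python sketch of output handling for a fixed-length record: each value has its TAB, LF and CR characters mapped to a fill character, then is truncated or padded to its field width. The `LAYOUT` table and `to_fixed_record` function are invented for illustration and say nothing about how the MRM model itself is configured.

```python
from __future__ import annotations

# Hypothetical record layout: (field name, fixed width in characters).
LAYOUT = [("foo", 5), ("baz", 4), ("qux", 3)]

def to_fixed_record(fields: dict[str, str], fill: str = " ") -> str:
    """Build one fixed-length record, replacing TAB/LF/CR with `fill`."""
    # Characters that would split a line-oriented record if left in place.
    controls = {ord("\t"): fill, ord("\n"): fill, ord("\r"): fill}
    parts = []
    for name, width in LAYOUT:
        value = fields.get(name, "").translate(controls)
        # Truncate long values and pad short ones to exactly `width`.
        parts.append(value[:width].ljust(width, fill))
    return "".join(parts)

record = to_fixed_record({"foo": "AB", "baz": "\n\t", "qux": "XYZW"})
print(len(record), "\n" in record)  # prints 12 False
```

Passing `fill="\x00"` would produce NULs instead of spaces; whether the consuming application accepts NULs is a separate question.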

But it's better in the long run to fix the sending application so that it doesn't pretty-print the XML.

McueMart

Posted: Wed Mar 27, 2013 9:50 am Post subject:

Chevalier

Joined: 29 Nov 2011
Posts: 490
Location: UK...somewhere

Hacky way to do it: read the message in as a BLOB, do a global replace on the 0a bytes, then reparse as XMLNSC. Obviously this method isn't completely safe if you are using a multi-byte character set! It also assumes you don't want linefeeds anywhere!
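A Python sketch (not ESQL; both function names are hypothetical) of why the byte-level replace is risky, and of the safer alternative of decoding first. In UTF-16 a 0x0A byte can be half of an ordinary character: U+010A is encoded as 01 0A, so a blind byte replace turns it into U+0120. In UTF-8 the hack happens to be safe for LF, because every byte of a multi-byte UTF-8 sequence is 0x80 or above.

```python
def blob_replace_lf(raw: bytes) -> bytes:
    """The hack: replace every 0x0A byte with 0x20, ignoring the encoding."""
    return raw.replace(b"\x0a", b"\x20")

def text_replace_lf(raw: bytes, encoding: str) -> bytes:
    """Safer: decode with the real encoding, replace LF characters, re-encode."""
    return raw.decode(encoding).replace("\n", " ").encode(encoding)

raw = "\u010a\n".encode("utf-16-be")  # bytes 01 0A 00 0A
print(ascii(blob_replace_lf(raw).decode("utf-16-be")))             # prints '\u0120 '
print(ascii(text_replace_lf(raw, "utf-16-be").decode("utf-16-be")))  # prints '\u010a '
```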

goffinf

Posted: Wed Mar 27, 2013 10:48 am Post subject:

Chevalier

Joined: 05 Nov 2005
Posts: 401

mqjeff wrote:

It's probably more correct to adjust the MRM model to ensure that it translates the LF and TABs into spaces, although again this needs to be done on each and every field.

Ah, I was hoping for something on the MRM side rather than the input XML.

Not having had much to do with MRM, how do I get the LFs and TABs converted? (Actually, it would be preferable if they didn't turn into spaces but perhaps into NULs?)

mqjeff wrote:

But it's better in the long run to fix the sending application not to prettyprint the XML.

I hear that.

goffinf

Posted: Wed Mar 27, 2013 10:50 am Post subject:

Chevalier

Joined: 05 Nov 2005
Posts: 401

McueMart wrote:

Yes indeed. I had considered that but ... as you said, it does feel a bit of a hack, and I have had my fingers burnt with multi-byte character encoding before and am not keen to repeat the experience. Thanks for the suggestion though.

Fraser.

kimbert

Posted: Thu Mar 28, 2013 1:09 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Quote:

It's probably more correct to adjust the MRM model to ensure that it translates the LF and TABs into spaces, although again this needs to be done on each and every field.

Why is that more correct than using a whiteSpace facet? I'm not disagreeing (not yet, anyway), just curious to know what your reasons are.

goffinf

Posted: Thu Mar 28, 2013 2:38 am Post subject:

Chevalier

Joined: 05 Nov 2005
Posts: 401

kimbert wrote:

Quote:

It's probably more correct to adjust the MRM model to ensure that it translates the LF and TABs into spaces, although again this needs to be done on each and every field.

Why is that more correct than using a whiteSpace facet? I'm not disagreeing (not yet, anyway), just curious to know what your reasons are.

I know this question was directed at mqjeff, but here's some of my rationale.

I agree both approaches are valid. In fact, when the Dev who's looking at this asked me about it, my first response was that we should be able to affect the way in which element content is normalized by the XML parser to get what we want; I just didn't know at the time how to do that in Broker. So thanks for illuminating that approach.

As to the broader question of whether to apply normalization to the input or when producing the output thru MRM ...

For XML input I prefer to follow what others might recognize as the 'Postel's Law' approach (even though that attribution isn't entirely correct). The mantra goes something like this: 'be liberal in what you accept and conservative in what you emit' (other variations can be found).

In practice this means that, whilst I am a supporter of a constraint model that provides benefit for the aspects of the interface that I (or rather 'the business') care about, it is very easy to create a brittle interface that rejects messages because of limitations in the expressiveness of XSD as a constraint language, rather than because they aren't 'business processable'.

That means that I can tolerate some unexpected data turning up if I don't care about it in my particular context (aka the 'must ignore unknown' pattern), as well as some data which is 'missing' or perhaps even 'invalid', again, if it doesn't impact the validity of the parts that I do care about.

I'm not saying this approach is a model for everyone, but techniques like selective validation (complementary to, and/or a complete replacement for, XSD) and a greater ability to evolve the interface in ways that are less likely to introduce breaking changes can be of significant benefit, especially when your callers are external trading partners (i.e. resources that you don't control and usually prefer not to spend any more than necessary on). Again, I'm not preaching, and there *are* challenges with doing this; I'm just making a comment about reality, at least in the integration environment that I work in.

I also suspect that our Service Design team doesn't have an XSD for the case I am currently looking at, and I'm not at all confident they could come up with one any time soon (but that's a separate story altogether).

So, whilst I am attracted to leveraging the XML parser behaviour, I would like to know more about how to remove these extraneous and unwelcome characters in the MRM model definition, another area where my personal knowledge is somewhat lacking.

Regards

Fraser.

mqjeff

Posted: Thu Mar 28, 2013 3:38 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

I agree with everything that Fraser said.

In addition, it's a question of enforcing the rules of the *correct* contract. The MRM contract is entirely separate from, and entirely different to, the XML contract. You shouldn't necessarily enforce the rules of one contract by changing the other.

That is, it may be perfectly reasonable and "okay" for the XML message to contain whitespace characters of various kinds in this field.

But it's not okay for the MRM message to contain anything other than specific characters from a much more restricted character range.

kimbert

Posted: Thu Mar 28, 2013 4:39 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

@goffinf: I get it. It's not the whitespace facet itself that is the problem - it is the fact that you have to switch on schema validation in order to use it.

@mqjeff: I agree - this is primarily a transformation problem. The task is to make the input data safe for the output format. The ESQL solution is fiddly, but is probably the best solution.

goffinf

Posted: Thu Mar 28, 2013 4:56 am Post subject:

Chevalier

Joined: 05 Nov 2005
Posts: 401

kimbert wrote:

Does that mean there isn't anything that can be done in the MRM model itself, as mqjeff proposed?

mqjeff wrote:

It's probably more correct to adjust the MRM model to ensure that it translates the LF and TABs into spaces, although again this needs to be done on each and every field.

How would you suggest removing the whitespace chars in ESQL?

mqjeff

Posted: Thu Mar 28, 2013 4:59 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

goffinf wrote:

How would you suggest removing the whitespace chars in ESQL?

That's the approach of accepting the message with the BLOB parser and then doing a REPLACE.

Or you can just do a replace as part of any SET statement that assigns a field to an output field.
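ESQL's REPLACE function substitutes every occurrence of a search string within a source string. As a language-neutral illustration only, the Python below shows the same idea in two forms: a `clean` helper applied to a single value, the way you would wrap the right-hand side of each SET, and a `clean_tree` walk that applies it to every element so none of the hundreds of fields is missed. Both names are hypothetical, and removing the characters (rather than mapping them to spaces) is just one choice.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Control characters to strip from field values.
UNWANTED = ("\n", "\r", "\t")

def clean(value: Optional[str]) -> Optional[str]:
    """Remove LF, CR and TAB from one field value (None passes through)."""
    if value is None:
        return None
    for ch in UNWANTED:
        value = value.replace(ch, "")
    return value

def clean_tree(root: ET.Element) -> ET.Element:
    """Apply clean() to the text content of every element in the tree."""
    for elem in root.iter():
        elem.text = clean(elem.text)
    return root

doc = ET.fromstring("<foo><bar>1</bar><baz>\n\t</baz></foo>")
clean_tree(doc)
print(repr(doc.find("baz").text))  # prints ''
```

The tree walk trades precision for coverage: it also strips TABs and line breaks from any field where they might be legitimate.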

Page 1 of 2
