MQSeries.net :: View topic - Parser Architecture

Parser Architecture


shwetabh WMB

Posted: Tue Aug 09, 2016 12:05 pm Post subject: Parser Architecture

Novice

Joined: 15 Jul 2016
Posts: 23

Hi,
I would like to understand the internal workings of the parsers used in IIB: how parsing and serialisation work. Is there any document that describes the internal workings of the parsers?

Vitor

Posted: Tue Aug 09, 2016 12:36 pm Post subject: Re: Parser Architecture

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

shwetabh WMB wrote:

I would like to understand the internal workings of the parsers used in IIB

Why? What possible use could this information be put to? What about your development requires this?

shwetabh WMB wrote:

Is there any document that describes the internal workings of the parsers?

The workings of the different parse timing options (On Demand / Complete / Immediate) are documented, as are the effects of different data modeling strategies on parser performance.

I would imagine anything else would be IBM Intellectual Property. Raise a PMR and see how many Non-Disclosure Agreements you need to sign before they tell you.

But I still doubt the information would be of any practical value to you.
_________________
Honesty is the best policy.
Insanity is the best defence.

shwetabh WMB

Posted: Wed Aug 10, 2016 3:03 am Post subject:

Novice

Joined: 15 Jul 2016
Posts: 23

Hi,
I wanted to learn the internal workings of the parser because of an issue we faced in our PROD environment: the XMLNSC parser intermittently divides a CData section into multiple parts. I raised the same issue in this forum:

http://www.mqseries.net/phpBB2/viewtopic.php?t=72642&highlight=cdata

We raised a PMR for it and got the answer that the XLXP parsing engine used by the XMLNSC parser divides the data based on certain characters or on the memory buffer.

Frankly, anyone will just use the XMLNSC parser without knowing the impact it can have on a message. I did not know about the XLXP scanning engine before this issue came up.
A deep understanding would surely help in design and development.
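
For readers wondering why a parser might hand back one text value in several pieces, here is a minimal sketch using Python's standard `xml.parsers.expat` binding. It is not XLXP and says nothing about IIB internals. It only illustrates that an event-based XML parser may report character data in chunks, depending on how the input is buffered. The sample document and the split point are made up for illustration.

```python
import xml.parsers.expat

# Hypothetical sample message with a single CDATA section.
DOCUMENT = "<msg><![CDATA[first part, second part]]></msg>"


def collect_chunks(pieces):
    """Feed the document to expat piece by piece and record each text callback."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = False  # do not merge adjacent text callbacks
    parser.CharacterDataHandler = chunks.append
    for index, piece in enumerate(pieces):
        is_final = index == len(pieces) - 1
        parser.Parse(piece, is_final)
    return chunks


# Split the input in the middle of the CDATA section, as a buffer boundary might.
split_at = DOCUMENT.index("second")
chunks = collect_chunks([DOCUMENT[:split_at], DOCUMENT[split_at:]])

print(chunks)           # how many chunks appear is up to the parser
print("".join(chunks))  # joined together, the chunks form the full value
```

The number of chunks is an implementation detail of the parser. What a consumer can rely on is that concatenating them gives the complete value.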

smdavies99

Posted: Wed Aug 10, 2016 3:58 am Post subject:

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

shwetabh WMB wrote:

Hi,

A deep understanding would surely help in design and development.

Well... having written a couple of parsers in my 40+ years of software development, it was a great relief to get the XMLNS/XMLNSC parsers in this product.
For me (and I suspect the majority of experts here) we don't need to know the internal workings of a parser. Most developers wouldn't understand them anyway. Just like the internal workings of compilers (wrote one of those as well 30+ years ago).
I don't feel the need and have never felt the need to dig deep inside the XMLNS/XMLNSC parser.
If you don't like how they work, you are more than free to develop your own parser. Good luck with that though. It is not a simple task.
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995

Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.

mqjeff

Posted: Wed Aug 10, 2016 4:01 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

If you have a problem with a CDATA section, then it's almost certainly an issue with the *contents* of the CDATA section.

Not with how the parser is handling the XML message. The XMLNSC parser is a fully compliant XML parser. This means it will handle CDATA sections according to the XML specifications. Regardless of how it is implemented internally.

You need a deep knowledge of the XML specification, particularly around CDATA sections, and deep knowledge of the internals of the XMLNSC parser is almost useless.

If you have found - and you did - that your use of CDATA sections causes the flow to be fragile - it's because CDATA sections can be fragile.

There's almost never a good reason to use them.
_________________
chmod -R ugo-wx /

Vitor

Posted: Wed Aug 10, 2016 5:05 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

shwetabh WMB wrote:

A deep understanding would surely help in design and development.

No, it won't. As indicated on your other thread, the problem is you don't have a deep understanding of XML CDATA sections and the problems they cause. You certainly don't understand that CDATA sections work the way the XML specification says they work (and which all compliant parsers follow) and not the way you wish they'd work.

Certainly any "deep understanding" of the parser in this instance will most likely bring you back to the same point you are at now.

shwetabh WMB

Posted: Wed Aug 10, 2016 5:14 am Post subject:

Novice

Joined: 15 Jul 2016
Posts: 23

That's true. We need not know it if we are not having an issue :)

If you are saying the XML message is incorrect or CData is not used properly, it would be a great help if you could tell me the reason. I will share details of my XML message and message flow. The same CData section value worked fine in a small XML message but not in a complex XML message.

Multiple CData sections are just a representation. It does not mean the CData use is wrong. We can fetch the value using FIELDVALUE or by storing it in a variable, and it will give the correct value.
But if someone stores the input message (InputRoot) in a database, it will contain multiple CData values.
If the code does SET OutputRoot = InputRoot, the message will propagate with multiple CData sections. It is up to the downstream system to extract the proper value, and it may throw an error if it is not aware of the situation.

I shared the same with IBM. They were able to reproduce it and replied that it is done by the scanning engine to optimise processing. They would have come back saying the XML with the CData section was wrong if the XML had an issue. Instead, they replied with the reason for creating multiple CData sections.

My only concern in this post was to find out whether any documentation is available online, as our PROD transactions failed.
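
To make the downstream concern concrete, here is a minimal sketch of a consumer that tolerates a value arriving as several adjacent CDATA sections. It uses Python's standard `xml.dom.minidom`, not IIB, and the message and element names are hypothetical. The helper simply concatenates every text and CDATA child of the element, which is the element's string value however the producer chose to split it.

```python
from xml.dom import Node, minidom

# Hypothetical message as it might arrive downstream: one logical value
# carried in two adjacent CDATA sections.
MESSAGE = "<doc><body><![CDATA[Hello, ]]><![CDATA[world]]></body></doc>"


def element_text(element):
    """Concatenate all text and CDATA children of an element."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    )


body = minidom.parseString(MESSAGE).getElementsByTagName("body")[0]
print(element_text(body))
```

A consumer written this way does not care whether the value was serialised as one CDATA section, several, or plain escaped text.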

Vitor

Posted: Wed Aug 10, 2016 5:38 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

shwetabh WMB wrote:

If you are saying the XML message is incorrect or CData is not used properly, it would be a great help if you could tell me the reason.

It's the one alluded to in this thread and the previous one.

shwetabh WMB wrote:

I shared the same with IBM. They were able to reproduce it and replied that it is done by the scanning engine to optimise processing. They would have come back saying the XML with the CData section was wrong if the XML had an issue. Instead, they replied with the reason for creating multiple CData sections.

The problem remains that you don't understand CData. You think it's an embedded string of unlimited length within an XML document. It's not. You think you can pass it as a string and manipulate it as a string, when that's not a good way to handle embedded XML data under optimum conditions.

shwetabh WMB wrote:

My only concern in this post was to find out whether any documentation is available online, as our PROD transactions failed.

And again, any doc will lead you back to exactly where you are post-PMR. With a flawed design based on flawed understanding.

mqjeff

Posted: Wed Aug 10, 2016 5:47 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

shwetabh WMB wrote:

They would have come back saying the XML with the CData section was wrong if the XML had an issue. Instead, they replied with the reason for creating multiple CData sections.

Yes? That means your XML document is bad. Did you see if you had two CData sections under the same element?

Did you see if the contents of your CDATA section had a CDATA section of its own?

Did you compare the message that didn't work to a message that did work?
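
For background on the nested-CDATA question: under the XML specification a CDATA section ends at the first `]]>`, so sections cannot nest. The usual workaround when the payload itself contains `]]>` is to split it across two adjacent CDATA sections. The sketch below uses Python's standard `xml.etree.ElementTree`, not IIB, and the helper name `to_cdata` and the sample payload are made up for illustration.

```python
import xml.etree.ElementTree as ET


def to_cdata(text):
    """Wrap text in CDATA, splitting wherever the terminator ']]>' occurs."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Hypothetical payload that itself contains a CDATA section.
inner = "<note><![CDATA[a < b]]></note>"

# Naive wrapping: the outer section ends at the inner ']]>' and the rest is
# parsed as markup, so the document is not well-formed.
naive = "<outer><![CDATA[" + inner + "]]></outer>"
try:
    ET.fromstring(naive)
    naive_ok = True
except ET.ParseError:
    naive_ok = False

# Safe wrapping: two adjacent CDATA sections under the same element.
safe = "<outer>" + to_cdata(inner) + "</outer>"
round_trip = ET.fromstring(safe).text

print(naive_ok)
print(safe)
print(round_trip == inner)
```

So two CDATA sections under one element can be perfectly legitimate, while an unescaped `]]>` inside the payload is a genuine defect in the document.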

timber

Posted: Wed Aug 10, 2016 2:28 pm Post subject:

Grand Master

Joined: 25 Aug 2015
Posts: 1292

The facts about the CDATA section are:
- There is exactly one CDATA section in the OP's input XML.
- The CDATA section is valid (edited) according to the XML specification; otherwise IBM support would have called it out.
- The CDATA section is almost certainly quite large.

The facts about the IIB XMLNSC parser are:
- it uses an XML parsing engine called 'XLXP' internally
- that parsing engine *chooses* not to create a single, huge node in the message tree when presented with a very large CDATA section. Instead, it chooses to create a sequence of smaller CDATA sections.

The facts that shwetabh WMB needs to accept are:
- An XML parser is free to make choices like this when building its DOM tree.
- The DOM (message) tree created by XMLNSC is therefore valid.
- If this message tree has broken a production system then that's unfortunate but it is not IBM's fault.
- IIB is not open-source software. It is unreasonable to expect IBM to reveal source code, or the internal architecture of any part of the product.

I agree with comments in the other thread from mqjeff and Vitor; the database table should not contain the CDATA tags. It should contain a single string value without the CDATA tags. The message flow that writes to that table should use FIELDVALUE to get that string value.
@shwetabh WMB: Please explain why this proposed solution is not acceptable.
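
The proposed solution can be sketched outside IIB as follows. This is only an analogy in Python, using the standard `xml.etree.ElementTree` and `sqlite3` modules. Reading the element's string value stands in for what FIELDVALUE does in ESQL, and the message, element name and table name are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical message: one logical value serialised as two CDATA sections.
MESSAGE = (
    "<order><details>"
    "<![CDATA[line 1 & ]]><![CDATA[line 2]]>"
    "</details></order>"
)

# Take the element's string value, which carries no CDATA markup.
details = ET.fromstring(MESSAGE).findtext("details")

# Store the plain string, not the serialised XML fragment.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE order_details (body TEXT)")
connection.execute("INSERT INTO order_details (body) VALUES (?)", (details,))
stored = connection.execute("SELECT body FROM order_details").fetchone()[0]

print(stored)
```

Because only the string value is stored, the table content is the same whether the parser presented the value as one CDATA section or as several.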

Craig B

Posted: Tue Sep 20, 2016 4:47 am Post subject:

Partisan

Joined: 18 Jun 2003
Posts: 316
Location: UK

You can force the XMLNSC parser to always present CData as a single value (rather than splitting it up into multiple value elements) by validating against a schema for your message. The parser has to have the complete value in one field to validate it as a single value.
_________________
Regards
Craig

timber

Posted: Tue Sep 20, 2016 7:31 am Post subject:

Grand Master

Joined: 25 Aug 2015
Posts: 1292

Quote:

You can force the XMLNSC parser to always present CData as a single value

That's news to me, but good news for the OP and anybody else with the same problem. Well worth adding a note to the Knowledge Center about this, I think.

Quote:

The parser has to have the complete value in one field to validate it as a single value.

Well, the parser *could* internally check the value against the facets and still report the CData section as a series of smaller chunks of text. But apparently it doesn't, which at least provides a get-out clause for this user.

Craig B

Posted: Tue Sep 20, 2016 8:07 am Post subject:

Partisan

Joined: 18 Jun 2003
Posts: 316
Location: UK

@Timber has just reminded me that you also have to have the "Build tree using XML schema data types" option ticked. This forces IIB to use a Doc handler that wants to set the data types in the message tree that is being constructed. Because a datatype needs to be set for a single field, IIB has to accumulate the data into a single value. Thanks for the memory jog.

_________________
Regards
Craig
