Author |
Message
|
shwetabh WMB |
Posted: Tue Aug 09, 2016 12:05 pm Post subject: Parser Architecture |
|
|
Novice
Joined: 15 Jul 2016 Posts: 23
|
Hi,
I would like to understand the internal workings of the parsers used in IIB: how parsing and serialisation work. Is there any document that describes the internal workings of the parsers? |
|
|
 |
Vitor |
Posted: Tue Aug 09, 2016 12:36 pm Post subject: Re: Parser Architecture |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
shwetabh WMB wrote: |
I wished to know the internal working of parsers used in IIB. |
Why? What possible use could this information be put to? What about your development requires this?
shwetabh WMB wrote: |
Is there any document which gives the internal working of parsers? |
The workings of the different parse timing options (On Demand / Complete / Immediate) are documented, as are the effects of different data modeling strategies on parser performance.
I would imagine anything else would be IBM Intellectual Property. Raise a PMR and see how many Non-Disclosure Agreements you need to sign before they tell you.
But I still doubt the information would be of any practical value to you. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
|
 |
shwetabh WMB |
Posted: Wed Aug 10, 2016 3:03 am Post subject: |
|
|
Novice
Joined: 15 Jul 2016 Posts: 23
|
Hi,
I wanted to learn the internal workings of the parser because of an issue we faced in our PROD environment: the XMLNSC parser intermittently divides a CDATA section into several pieces. I raised the issue in this forum:
http://www.mqseries.net/phpBB2/viewtopic.php?t=72642&highlight=cdata
We raised a PMR for this and got the answer that the XLXP parsing engine used by the XMLNSC parser divides data based on some character or based on the memory buffer.
Frankly, anyone will just use the XMLNSC parser without knowing the impact it can have on a message. I did not know about the XLXP scanning engine before this issue came up.
It will surely help in design and development if we have the deep understanding. |
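For readers who have not seen this behaviour before, streaming XML parsers in general are allowed to hand over one run of character data in several pieces. The sketch below is only an analogy: it uses Python's standard-library expat bindings, not XLXP or IIB, and the helper name `parse_in_pieces` and the sample payload are made up for illustration. It feeds a document in two pieces and shows that the safe pattern is to accumulate every chunk and join them, rather than assume one callback per CDATA section.

```python
import xml.parsers.expat


def parse_in_pieces(pieces):
    """Feed a document to expat piece by piece and collect every
    character-data callback the parser makes."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    # expat may call this handler several times for one run of text.
    parser.CharacterDataHandler = chunks.append
    for index, piece in enumerate(pieces):
        is_final = index == len(pieces) - 1
        parser.Parse(piece, is_final)
    return chunks


# Hypothetical sample message: one CDATA section with markup-like content.
payload = "x" * 50 + " < & > " + "y" * 50
document = ("<msg><![CDATA[" + payload + "]]></msg>").encode("utf-8")

# Split the input in the middle, as a buffer boundary might.
middle = len(document) // 2
chunks = parse_in_pieces([document[:middle], document[middle:]])

# The number of chunks is an implementation detail; the joined value is not.
value = "".join(chunks)
print(len(chunks), value == payload)
```

How many chunks come back depends on the parser and on where the input happens to be cut, so nothing downstream should rely on that number.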
|
|
 |
smdavies99 |
Posted: Wed Aug 10, 2016 3:58 am Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
shwetabh WMB wrote: |
Hi,
It will surely help in design and development if we have the deep understanding. |
Well... having written a couple of parsers in my 40+ years of software development, it was a great relief to get the XMLNS/XMLNSC parsers in this product.
For me (and I suspect the majority of experts here) we don't need to know the internal workings of a parser. Most developers wouldn't understand them anyway. Just like the internal workings of compilers (wrote one of those as well 30+ years ago).
I don't feel the need and have never felt the need to dig deep inside the XMLNS/XMLNSC parser.
If you don't like how they work, you are more than free to develop your own parser. Good luck with that though. It is not a simple task. _________________ WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions. |
|
|
 |
mqjeff |
Posted: Wed Aug 10, 2016 4:01 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
If you have a problem with a CDATA section, then it's almost certainly an issue with the *contents* of the CDATA section.
Not with how the parser is handling the XML message. The XMLNSC parser is a fully compliant XML parser. This means it will handle CDATA sections according to the XML specifications. Regardless of how it is implemented internally.
You need a deep knowledge of the XML specification, particularly around CDATA sections; deep knowledge of the internals of the XMLNSC parser is almost useless.
If you have found - and you did - that your use of CDATA sections causes the flow to be fragile - it's because CDATA sections can be fragile.
There's almost never a good reason to use them. _________________ chmod -R ugo-wx / |
|
|
 |
Vitor |
Posted: Wed Aug 10, 2016 5:05 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
shwetabh WMB wrote: |
It will surely help in design and development if we have the deep understanding. |
No, it won't. As indicated on your other thread, the problem is you don't have a deep understanding of XML CDATA sections and the problems they cause. You certainly don't understand that CDATA sections work the way the XML specification says they work (and which all compliant parsers follow) and not the way you wish they'd work.
Certainly any "deep understanding" of the parser in this instance will most likely bring you back to the same point you are now. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
|
 |
shwetabh WMB |
Posted: Wed Aug 10, 2016 5:14 am Post subject: |
|
|
Novice
Joined: 15 Jul 2016 Posts: 23
|
That's true. We need not know it if we are not having an issue :)
If the claim is that the XML message is incorrect or that CDATA is not used properly, it would be a great help if you could tell me the reason. I will share details of my XML message and message flow. The same CDATA section value worked fine in a small XML message but not in a complex XML message.
Multiple CDATA sections are just a representation. It does not mean the CDATA use is wrong. We can fetch the value using FIELDVALUE or by storing it in a variable, and it will give the correct value.
But if someone stores the input message (InputRoot) in a database, it will contain multiple CDATA values.
If the code does SET OutputRoot=InputRoot, the message will propagate with multiple CDATA sections. It is then up to the downstream system to extract the proper value, and it may throw an error if it is not aware of the situation.
I shared the same with IBM. They were able to reproduce it and replied that it is done by the scanning engine to optimise processing. They would have said the XML or its CDATA section was wrong if the XML had an issue. Instead they replied with the reason for creating multiple CDATA sections.
My only concern in this post was to know whether any documentation is available online, as our PROD transactions failed.
 |
|
|
 |
Vitor |
Posted: Wed Aug 10, 2016 5:38 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
shwetabh WMB wrote: |
If the claim is that the XML message is incorrect or that CDATA is not used properly, it would be a great help if you could tell me the reason |
It's the one alluded to in this thread and the previous one.
shwetabh WMB wrote: |
I shared the same with IBM. They were able to reproduce it and replied that it is done by the scanning engine to optimise processing. They would have said the XML or its CDATA section was wrong if the XML had an issue. Instead they replied with the reason for creating multiple CDATA sections. |
The problem remains that you don't understand CData. You think it's an embedded string of unlimited length within an XML document. It's not. You think you can pass it as a string and manipulate it as a string, when that's not a good way to handle embedded XML data under optimum conditions.
shwetabh WMB wrote: |
My only concern in this post was to know whether any documentation is available online, as our PROD transactions failed. |
And again, any doc will lead you back to exactly where you are post-PMR. With a flawed design based on flawed understanding. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
|
 |
mqjeff |
Posted: Wed Aug 10, 2016 5:47 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
shwetabh WMB wrote: |
They would have said the XML or its CDATA section was wrong if the XML had an issue. Instead they replied with the reason for creating multiple CDATA sections. |
Yes? That means your XML document is bad. Did you see if you had two CData sections under the same element?
Did you see if the contents of your CDATA section had a CDATA section of its own?
Did you compare the message that didn't work to a message that did work? _________________ chmod -R ugo-wx / |
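The nested-CDATA question matters because the XML specification does not allow the sequence `]]>` inside a CDATA section, so a payload that carries its own CDATA section cannot simply be wrapped in another one. As a rough sketch (plain Python with the standard-library ElementTree, nothing to do with IIB itself; the helper `to_cdata` and the sample payload are hypothetical), this shows the naive wrapping failing and the usual workaround of splitting the text into adjacent CDATA sections:

```python
import xml.etree.ElementTree as ET


def to_cdata(text):
    """Wrap text in CDATA, splitting wherever the text contains ']]>'."""
    # "]]>" always ends a CDATA section, so the sequence is divided
    # between two adjacent sections: "]]" closes one, ">" opens the next.
    safe = text.replace("]]>", "]]]]><![CDATA[>")
    return "<![CDATA[" + safe + "]]>"


# Hypothetical payload that already contains a CDATA section of its own.
inner = "<inner><![CDATA[a < b]]></inner>"

# Naive wrapping: the inner "]]>" closes the outer section too early.
try:
    ET.fromstring("<outer><![CDATA[" + inner + "]]></outer>")
    naive_is_well_formed = True
except ET.ParseError:
    naive_is_well_formed = False

# Split wrapping: well-formed, and the parser returns the original string.
outer = "<outer>" + to_cdata(inner) + "</outer>"
recovered = ET.fromstring(outer).text
print(naive_is_well_formed, recovered == inner)
```

Note that the workaround itself produces more than one CDATA section for a single logical value, which is one more reason for a consumer to read the element's string value rather than count sections.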
|
|
 |
timber |
Posted: Wed Aug 10, 2016 2:28 pm Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
The facts about the CDATA section are:
- There is exactly one CDATA section in the OP's input XML.
- The CDATA section is valid (edited) according to the XML specification; otherwise IBM support would have called it out.
- The CDATA section is almost certainly quite large.
The facts about the IIB XMLNSC parser are:
- it uses an XML parsing engine called 'XLXP' internally
- that parsing engine *chooses* not to create a single, huge node in the message tree when presented with a very large CDATA section. Instead, it chooses to create a sequence of smaller CDATA sections.
The facts that shwetabh WMB needs to accept are:
- An XML parser is free to make choices like this when building its DOM tree. The DOM (message) tree created by XMLNSC is therefore valid.
- If this message tree has broken a production system then that's unfortunate but it is not IBM's fault.
- IIB is not open-source software. It is unreasonable to expect IBM to reveal source code, or the internal architecture of any part of the product.
I agree with comments in the other thread from mqjeff and Vitor; the database table should not contain the CDATA tags. It should contain a single string value without the CDATA tags. The message flow that writes to that table should use FIELDVALUE to get that string value.
@shwetabh WMB: Please explain why this proposed solution is not acceptable. |
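The point about storing the string value rather than the CDATA markup can be illustrated outside IIB. The following is a minimal sketch in Python with the standard-library minidom, offered as an analogy and not as the XMLNSC implementation; the element names and the helper `string_value` are invented for the example. It shows that one CDATA section and a sequence of smaller ones yield the same string once the text children are concatenated, which is roughly the role FIELDVALUE plays in the proposed solution.

```python
from xml.dom import minidom


def string_value(element):
    """Concatenate the text and CDATA children of an element into one
    plain string, with no CDATA markup in the result."""
    text_types = (element.TEXT_NODE, element.CDATA_SECTION_NODE)
    return "".join(
        child.data for child in element.childNodes
        if child.nodeType in text_types
    )


# Hypothetical messages: same logical content, different CDATA layout.
single = minidom.parseString(
    "<msg><body><![CDATA[first half, second half]]></body></msg>"
)
split = minidom.parseString(
    "<msg><body><![CDATA[first half, ]]><![CDATA[second half]]></body></msg>"
)

value_single = string_value(single.getElementsByTagName("body")[0])
value_split = string_value(split.getElementsByTagName("body")[0])

# Both layouts carry the same value; this string is what belongs in a table.
print(value_single == value_split)
```

Writing the joined string to the database keeps the stored data independent of how any particular parser chose to lay out the sections.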
|
|
 |
Craig B |
Posted: Tue Sep 20, 2016 4:47 am Post subject: |
|
|
Partisan
Joined: 18 Jun 2003 Posts: 316 Location: UK
|
You can force the XMLNSC parser to always present CData as a single value (rather than splitting it up into multiple value elements) by validating against a schema for your message. The parser has to have the complete value in one field to validate it as a single value. _________________ Regards
Craig |
|
|
 |
timber |
Posted: Tue Sep 20, 2016 7:31 am Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
Quote: |
You can force the XMLNSC parser to always present CData as a single value |
That's news to me, but good news for the OP and anybody else with the same problem. Well worth adding a note to the Knowledge Center about this, I think.
Quote: |
The parser has to have the complete value in one field to validate it as a single value. |
Well, the parser *could* internally check the value against the facets and still report the CData section as a series of smaller chunks of text. But apparently it doesn't, which at least provides a get-out clause for this user. |
|
|
 |
Craig B |
Posted: Tue Sep 20, 2016 8:07 am Post subject: |
|
|
Partisan
Joined: 18 Jun 2003 Posts: 316 Location: UK
|
@Timber has just reminded me that you also have to have the "Build tree using XML schema data types" option ticked. This forces IIB to use a Doc handler that wants to set the data types in the message tree that is being constructed. Because a datatype needs to be set for a single field, IIB has to accumulate the data into a single value. Thanks for the memory jog.
 _________________ Regards
Craig |
|
|
 |
|