Author |
Message
|
Paul D |
Posted: Mon Sep 19, 2005 11:45 am Post subject: Need best performing ESQL or method to determine if XML Msg |
|
|
 Master
Joined: 16 May 2001 Posts: 200 Location: Green Bay Packer Country
|
We have a flow that is processing a large message (1.88 MB) that could be XML or BLOB. Our initial approach is to read in the message under the BLOB domain and do the following to determine if it's XML:
Code: |
IF SUBSTRING(Root.BLOB.BLOB FROM 1 FOR 1) <> X'3c' THEN
  -- first byte is not '<' (X'3c'), so the message is not XML
  <More code here...> |
We do an RCD (ResetContentDescriptor) to XML later in the flow if that's the case. We have found that for large messages this ESQL takes about 5.5 seconds. It appears to load the whole message into memory to do this. We've also tried:
Code: |
DECLARE x3c INT POSITION(X'3c' IN Root.BLOB.BLOB FROM 1);
-- note: x3c > 0 is true if X'3c' occurs anywhere in the body, not only at byte 1
IF x3c > 0 THEN
<More code here...> |
but that appears to perform about the same. Anyone have any good ideas as to how we can determine if our message is XML with a check that will perform better? _________________ Thanks!!!
Paul D |
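For reference, the first-byte test described above can be written as a complete Filter node module. This is only a minimal sketch, not the poster's actual code: the module name `IsXmlByFirstByte` is hypothetical, and it assumes the payload is in an ASCII-compatible code page (in EBCDIC, '<' is X'4c' rather than X'3c') with no byte order mark or whitespace before the first '<'.

```esql
CREATE FILTER MODULE IsXmlByFirstByte  -- hypothetical module name
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    -- Compare BLOB with BLOB: X'3c' is a one-byte BLOB literal for '<'.
    -- TRUE  -> message starts with '<' and is probably XML
    -- FALSE -> message starts with some other byte
    RETURN SUBSTRING(Root.BLOB.BLOB FROM 1 FOR 1) = X'3c';
  END;
END MODULE;
```

If the message body is empty, the comparison evaluates to NULL and the message leaves through the Filter node's unknown terminal, so that terminal should be wired or deliberately left unconnected.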
 |
Paul D |
Posted: Thu Sep 29, 2005 8:40 am Post subject: |
|
|
 Master
Joined: 16 May 2001 Posts: 200 Location: Green Bay Packer Country
|
Here's the method that we used to solve this issue.
We had been making the mistake of using the BLOB domain and string ESQL commands to determine if we were XML or non-XML. We needed to take the reverse approach and use the XML domain to determine what we had.
Design:
We set the flow's MQInput node to the XML domain, then wired its out terminal to a Filter node with the following code:
Code: |
IF (Root.XML.(XML.XmlDecl).(XML.Version)) IS NULL THEN
RETURN TRUE; -- reset to BLOB
ELSE
RETURN FALSE; -- Do not reset to BLOB. Proceed.
END IF;
|
We then wired the true terminal to an RCD node and the false terminal to the remainder of the flow. This check performed much better than the string functions under the BLOB domain: testing with a 2 MB message, overall elapsed time in the flow dropped from 5-6 seconds to 0.5 seconds or less. When a non-XML message comes in, the MQInput node creates an empty XML tree but does not fail. The filter check lets us determine what we really have, and doing the RCD to BLOB gives us access to the message body again. As a side note, we are trying to avoid MRMs in our case, which is why we don't consider them here. This is a generic flow that processes many different messages, so that would not be a good design for us. _________________ Thanks!!!
Paul D |
 |
jefflowrey |
Posted: Thu Sep 29, 2005 8:46 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
Using the BLOB domain, I guess it would have to "parse" the entire bitstream, in some respects.
You might even get better performance if you replace the Filter node with a Try/Catch block, and assume you have XML going down the Try flow. If you get thrown out at Catch, then you can assume (for at least some part) that you've been given a BLOB. _________________ I am *not* the model of the modern major general. |
 |
Paul D |
Posted: Thu Sep 29, 2005 8:55 am Post subject: |
|
|
 Master
Joined: 16 May 2001 Posts: 200 Location: Green Bay Packer Country
|
We don't get a failure at MQInput set to XML domain for non-XML msgs, just the empty XML tree. I'm not understanding how you could eliminate the filter with a try catch. _________________ Thanks!!!
Paul D |
 |
jefflowrey |
Posted: Thu Sep 29, 2005 8:59 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
You should get an error thrown if the message is BLOB and you try to access InputRoot.XML.<bla>.
At least, in cases where it is not allowed to use a NULL value...
Maybe I'm thinking wrong. Hrm...  _________________ I am *not* the model of the modern major general. |
 |
Paul D |
Posted: Thu Sep 29, 2005 9:04 am Post subject: |
|
|
 Master
Joined: 16 May 2001 Posts: 200 Location: Green Bay Packer Country
|
We thought that initially also. Run a quick test and you will see what we are describing. We'd want to avoid the error and stick with true/false. Errors and the Exception List are costly to process. This seems to work really well. _________________ Thanks!!!
Paul D |
 |
brenner |
Posted: Fri Sep 30, 2005 1:31 am Post subject: |
|
|
Newbie
Joined: 22 Oct 2004 Posts: 7 Location: IBM Hursley
|
Best to try:
DECLARE cursor REFERENCE TO Root.XML;
IF LASTMOVE(cursor) THEN -- it is xml !!..
...
ELSE -- it is blob
...
END IF; |
 |
mgk |
Posted: Fri Sep 30, 2005 4:27 am Post subject: |
|
|
 Padawan
Joined: 31 Jul 2003 Posts: 1642
|
Hi Paul.
Your second approach seems wrong to me. I would expect that if the first few bytes of the BLOB were not really a valid XML decl then the XML parser (which you created in the MQInput node) would throw an exception when it looked at the first few bytes of your blob. Therefore, I would like to find out a little more about what you are doing.
Could you post the first few (maybe 100) bytes from one of your XML blobs and one of your non-XML blobs? Also, can you mention which version / CSD of the broker you are using?
I would expect that the first method you tried (looking at a BLOB in ESQL) should perform much better than your figures suggest, and I was wondering what else happens in your flow after your test for XML, to see if that was soaking up some of your time. I would expect that using ESQL to test the BLOB for XML, then using CREATE with PARSE and DOMAIN clauses to make your XML tree, should perform well, rather than using an RCD node. Also, what options did you have selected on your RCD node? Are you forcing a complete parse of the XML somehow, so you are not getting the benefits of deferred parsing? Furthermore, have you considered using an RFH2 header or an MQMD format to describe your message as a BLOB or as XML?
Regards, _________________ MGK
The postings I make on this site are my own and don't necessarily represent IBM's positions, strategies or opinions. |
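The suggestion above (test the BLOB in ESQL, then build the XML tree with CREATE ... DOMAIN ... PARSE instead of an RCD node) might look roughly like the following. Treat it as a sketch under assumptions rather than mgk's own code: the module name `DetectAndParse_Compute` is hypothetical, the MQInput node is assumed to use the BLOB domain, the payload is assumed to be in an ASCII-compatible code page, and the positional form of the PARSE clause (bitstream, encoding, CCSID) is used.

```esql
CREATE COMPUTE MODULE DetectAndParse_Compute  -- hypothetical module name
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    -- Carry the headers across unchanged
    SET OutputRoot.Properties = InputRoot.Properties;
    SET OutputRoot.MQMD = InputRoot.MQMD;

    IF SUBSTRING(InputRoot.BLOB.BLOB FROM 1 FOR 1) = X'3c' THEN
      -- First byte is '<': attach an XML parser to the same bitstream
      CREATE LASTCHILD OF OutputRoot DOMAIN('XML')
        PARSE(InputRoot.BLOB.BLOB,
              InputRoot.Properties.Encoding,
              InputRoot.Properties.CodedCharSetId);
    ELSE
      -- Anything else stays an unparsed BLOB
      SET OutputRoot.BLOB.BLOB = InputRoot.BLOB.BLOB;
    END IF;

    RETURN TRUE;
  END;
END MODULE;
```

A malformed message that happens to start with '<' would still raise a parser exception once the XML tree is navigated, so the error path needs to be wired either way.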
 |
Paul D |
Posted: Fri Sep 30, 2005 5:52 am Post subject: |
|
|
 Master
Joined: 16 May 2001 Posts: 200 Location: Green Bay Packer Country
|
We are running Broker v5 CSD#4. We run on AIX. The first 100 Bytes of the XML message are:
Code: |
<?xml version="1.0"?><RoleList nml-version="001"><MsgProtocol/><MsgProtocolVersionNum/><MsgTypeCde/>
|
The first 100 bytes of the non-XML message are:
Code: |
SCIP001NQTester QDemoServer PeopleWhoMakeMoreMoney
|
We first used acctg and stats to find the pain point node, and then user tracing to narrow down to the statement. It was essentially the first line of ESQL that was the issue. That was the BLOB string check we were doing. The rest of the flow time was minor compared to this. It's mainly a routing flow. High performance is critical. _________________ Thanks!!!
Paul D |
 |
mgk |
Posted: Fri Sep 30, 2005 7:21 am Post subject: |
|
|
 Padawan
Joined: 31 Jul 2003 Posts: 1642
|
Hi Paul.
I have to warn you that this behaviour does not work in V6 (I have just tried it). I am at a loss to explain why it works for you on V5 CSD4, and I can only assume (for the moment) that you have found a defect that has subsequently been fixed. I will investigate this on V5 next week, but I still think that if you proceed with this, you will end up relying on a behaviour that is subject to change when you move between versions (and possibly between CSDs), as I do not yet know when this behaviour was introduced or when it was changed between your current level and V6. I have to reiterate that it is incorrect behaviour for the parser to work in the manner you describe.
Regards, _________________ MGK
The postings I make on this site are my own and don't necessarily represent IBM's positions, strategies or opinions. |
 |
jefflowrey |
Posted: Fri Sep 30, 2005 7:31 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
mgk -
I think Paul is using neither of the two tests he posted in his first message. He's using the IS NULL test in his second message.
I also hope you are not saying that this IS NULL test will not work in v6 - as that would be very very strange to me.
I think you're saying that the poor performance of the SUBSTRING is the bug, and should be fixed in v6? _________________ I am *not* the model of the modern major general. |
 |
mgk |
Posted: Fri Sep 30, 2005 7:45 am Post subject: |
|
|
 Padawan
Joined: 31 Jul 2003 Posts: 1642
|
Hi Jeff
First, IS NULL is indeed still there and working fine in V6.
I ran the IS NULL test from Paul's second message, changed slightly as follows.
The flow is just MQInput node (XML domain) -> Compute node -> MQOutput node.
The ESQL in the Compute node is:
Code: |
CREATE FUNCTION Main() RETURNS BOOLEAN
BEGIN
SET OutputRoot.Properties = InputRoot.Properties;
SET OutputRoot.MQMD = InputRoot.MQMD;
IF (InputRoot.XML.(XML.XmlDecl).(XML.Version)) IS NULL THEN
SET OutputRoot.XML.Test.Result = 'IT IS NOT XML';
ELSE
SET OutputRoot.XML.Test.Result = 'IT IS XML';
END IF;
RETURN true;
END; |
With the Following Input Message:
Code: |
<?xml version="1.0" encoding="utf-8"?><A1><B1>1</B1><B1>2</B1><B1>3</B1></A1> |
I get the OutputMessage:
Code: |
<Test><Result>IT IS XML</Result></Test> |
With the InputMessage:
Code: |
<A1><B1>1</B1><B1>2</B1><B1>3</B1></A1> |
I Get the OutputMessage:
Code: |
<Test><Result>IT IS NOT XML</Result></Test> |
With a non-XML input message, I get several exceptions in the event log, the key ones being:
Code: |
( BROKER1.default ) ('blahBlah.blobBlob.Main', '1.287') Error detected whilst executing the SQL statement ''IF InputRoot.XML.(XML.XmlDecl)*:*.(XML.Version)*:* IS NULL THEN... ELSE... END IF;''.
The message broker detected an error whilst executing the given statement. An exception has been thrown to cut short the SQL program.
See the following messages for details of the error. |
and
Code: |
( BROKER1.default ) XML parsing error ('Invalid document structure ') encountered on line 1 column 1 while parsing element 'XML'.
The above error was reported by the generic XML parser.
This message is usually caused by a badly-formed XML message. Check that the XML message being passed in is a well-formed XML message that adheres to the XML specification. Note that the line number and column number quoted above give the position where the parser discovered the problem. The actual error may be earlier in the message. Internal error codes : (186), (''). |
I see these exceptions as being the expected behaviour of an XML parser that is presented with a non-XML message as a bitstream to parse. I am unsure why Paul does not get similar exceptions in his tests.
I am not sure what the problem with the performance of substring is.
Regards, _________________ MGK
The postings I make on this site are my own and don't necessarily represent IBM's positions, strategies or opinions. |
 |
jefflowrey |
Posted: Fri Sep 30, 2005 7:56 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
mgk wrote: |
I see these exceptions as being the expected behaviour of an XML parser that is presented with a non-XML message as a bitstream to parse. I am unsure why Paul does not get similar exceptions in his tests. |
Okay.. That wasn't so clear to me from your post, what the expected or unexpected behavior was...
mgk wrote: |
I am not sure what the problem with the performance of substring is. |
Is it unexpected that it takes a while, in your opinion? Does SUBSTRING have to make a copy of its input first? One would hope that SUBSTRING(Root.BLOB.BLOB from 1 for 1) would be very fast...
Does the BLOB parser have to do something to the input bitstream? Like, do a byte copy from the input bitstream into a new MbElement? _________________ I am *not* the model of the modern major general. |
 |
Paul D |
Posted: Mon Oct 03, 2005 12:13 pm Post subject: |
|
|
 Master
Joined: 16 May 2001 Posts: 200 Location: Green Bay Packer Country
|
I've got another update on this. We've done some more testing. I've got to apologize a bit because our implementation is slightly different from what I described previously. The good part is that it's almost the same as what I described before, and the performance is still far superior to what we were experiencing before:
New Approach:
We set the flow's MQInput node to the XML domain, then wired its out terminal to a TryCatch node. The try terminal is wired to a Filter node with the following code:
Code: |
IF (Root.XML.(XML.XmlDecl).(XML.Version)) IS NOT NULL THEN
RETURN TRUE; -- XML message. Do not reset to BLOB.
END IF;
|
The catch terminal is wired to an RCD node set to convert back to BLOB. The output from the RCD is wired to the next node in the flow (Compute1). The true terminal from the Filter node is also wired to the next node in the flow (Compute1). This is the point where things merge together again and the flow continues. For errors after this, we ignore this "outer" error on non-XML requests in the exception list, as we obviously know what it's from.
You can see by this that the ESQL code really will throw an exception when the non-XML request is processed. The try catch intercepts and does the RCD to get back to BLOB and we can process forward. We can check the domain in Compute1 and going forward to determine which domain the request belongs to.
Our first tests on Windows were reacting a bit differently, but we were using the visual debugger and maybe things were not acting the same as on AIX. We didn't go back to retest all the scenarios at this point. We do feel that this is the correct way to proceed though, and it works the same on all platforms.
It appears to us that we have all the bases covered on this now. Feel free to comment if you can see anywhere that we are missing something. _________________ Thanks!!!
Paul D |
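The domain check in Compute1 mentioned above could be done along these lines. This is a sketch only, not the poster's code: the module name and the `Environment.Variables.IsXml` flag are hypothetical, and it relies on the `InputBody` correlation name referring to the last child of `InputRoot`, whose field name is the owning parser's domain ('XML' on the try path, 'BLOB' after the RCD on the catch path).

```esql
CREATE COMPUTE MODULE Compute1_CheckDomain  -- hypothetical module name
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    SET OutputRoot = InputRoot;

    -- FIELDNAME(InputBody) yields the name of the body element, e.g. 'XML' or 'BLOB'
    DECLARE bodyDomain CHARACTER FIELDNAME(InputBody);

    IF bodyDomain = 'XML' THEN
      SET Environment.Variables.IsXml = TRUE;   -- hypothetical flag for later nodes
    ELSE
      SET Environment.Variables.IsXml = FALSE;
    END IF;

    RETURN TRUE;
  END;
END MODULE;
```

Recording the result once in the Environment tree saves later nodes from repeating the test.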
 |
Paul D |
Posted: Wed Dec 28, 2005 10:26 am Post subject: |
|
|
 Master
Joined: 16 May 2001 Posts: 200 Location: Green Bay Packer Country
|
One final update. We were way off base with our testing. We ended up determining that the 5 second delay and full parse only happened when a user trace was running. With no user trace on, performance was fine with the original coding. The key thing we learned here is that running a user trace with large messages will substantially slow down processing. Even though only the very first part of the BLOB body shows up in the trace, behind the scenes it appears to parse out the entire string. I think that while tracing can be helpful in getting low level performance numbers, you need to watch out for this kind of stuff. I'd use acctg/stats first, then drill down into user tracing keeping things like this in mind. _________________ Thanks!!!
Paul D |