
MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » Need best performing ESQL or method to determine if XML Msg

Need best performing ESQL or method to determine if XML Msg
Paul D
Posted: Mon Sep 19, 2005 11:45 am    Post subject: Need best performing ESQL or method to determine if XML Msg

Master

Joined: 16 May 2001
Posts: 200
Location: Green Bay Packer Country

We have a flow that is processing a large message (1.88 MB) that could be XML or BLOB. Our initial approach is to read in the message under the BLOB domain and do the following to determine if it's XML:

Code:
-- Compare the first byte with the BLOB literal X'3c' ('<'), not the character string '3c'
IF SUBSTRING(Root.BLOB.BLOB FROM 1 FOR 1) <> X'3c'
THEN
<More code here...>


If the message is XML, we do an RCD (ResetContentDescriptor) to XML later in the flow. We have found that for large messages, this ESQL takes about 5.5 sec. It appears to load the whole message into memory to do this. We've also tried:

Code:
-- Note: this is true if X'3c' occurs anywhere in the body, not only in the first byte
DECLARE x3c INT POSITION(X'3c' IN Root.BLOB.BLOB FROM 1);
IF x3c > 0 THEN
<More code here...>


but that appears to perform about the same. Anyone have any good ideas as to how we can determine if our message is XML with a check that will perform better?
_________________
Thanks!!!

Paul D
Paul D
Posted: Thu Sep 29, 2005 8:40 am

Master

Joined: 16 May 2001
Posts: 200
Location: Green Bay Packer Country

Here's the method that we used to solve this issue.

We had been making the mistake of using the BLOB domain and string ESQL functions to determine whether the message was XML or non-XML. We needed to take the reverse approach and use the XML domain to determine what we had.

Design:
We set the flow's MQInput node to the XML domain, then wired its out terminal to a Filter node with the following code:
Code:

IF (Root.XML.(XML.XmlDecl).(XML.Version)) IS NULL THEN
      RETURN TRUE; -- reset to BLOB
ELSE
      RETURN FALSE; -- Do not reset to BLOB. Proceed.
END IF;

We then wired the true terminal to an RCD node and the false terminal to the remainder of the flow. This code performed much better than the string functions under BLOB. Testing with a 2 MB message, we reduced the overall elapsed time in the flow from 5-6 secs down to 0.5 secs or less. When a non-XML message comes in, the MQInput node creates an empty XML tree but does not fail. The filter check allows us to determine what we really have. Doing the RCD to BLOB gets us access to the message body again. As a side note, we are trying to avoid MRMs in our case, which is why we don't consider using them. This is a generic flow that processes many different messages, so that would not be a good design for us.
_________________
Thanks!!!

Paul D
jefflowrey
Posted: Thu Sep 29, 2005 8:46 am

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

Using the BLOB domain, I guess it would have to "parse" the entire bitstream, in some respects.

You might even get better performance if you replace the Filter node with a Try/Catch block, and assume you have XML going down the Try flow. If you get thrown out at Catch, then you can assume (for at least some part) that you've been given a BLOB.
_________________
I am *not* the model of the modern major general.
Paul D
Posted: Thu Sep 29, 2005 8:55 am

Master

Joined: 16 May 2001
Posts: 200
Location: Green Bay Packer Country

We don't get a failure at MQInput set to XML domain for non-XML msgs, just the empty XML tree. I'm not understanding how you could eliminate the filter with a try catch.
_________________
Thanks!!!

Paul D
jefflowrey
Posted: Thu Sep 29, 2005 8:59 am

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

You should get an error thrown if the message is BLOB and you try to access InputRoot.XML.<bla>.

At least, in cases where it is not allowed to use a NULL value...

Maybe I'm thinking wrong. Hrm...
_________________
I am *not* the model of the modern major general.
Paul D
Posted: Thu Sep 29, 2005 9:04 am

Master

Joined: 16 May 2001
Posts: 200
Location: Green Bay Packer Country

We thought that initially also. Run a quick test and you will see what we are describing. We'd want to avoid the error and stick with true/false. Errors and the Exception List are costly to process. This seems to work really well.
_________________
Thanks!!!

Paul D
brenner
Posted: Fri Sep 30, 2005 1:31 am

Newbie

Joined: 22 Oct 2004
Posts: 7
Location: IBM Hursley

Best to try:

DECLARE cursor REFERENCE TO Root.XML;

IF LASTMOVE(cursor) THEN -- it is xml !!..
...
ELSE -- it is blob
...
END IF;
mgk
Posted: Fri Sep 30, 2005 4:27 am

Padawan

Joined: 31 Jul 2003
Posts: 1642

Hi Paul.

Your second approach seems wrong to me. I would expect that if the first few bytes of the BLOB were not really a valid XML decl then the XML parser (which you created in the MQInput node) would throw an exception when it looked at the first few bytes of your blob. Therefore, I would like to find out a little more about what you are doing.

Could you post the first few (maybe 100) bytes from one of your XML blobs and one of your non-XML blobs? Also, can you mention which version / CSD of the broker you are using?

I would expect that the first method you tried (looking at a BLOB in ESQL) should perform much better than your figures suggest, and I was wondering what else happens in your flow after your test for XML, to see if that was soaking up some of your time. I would expect that using ESQL to test the BLOB for XML, then using CREATE with PARSE and DOMAIN clauses to make your XML tree, should perform well, rather than using an RCD node. Also, what options did you have selected on your RCD node? Are you forcing a complete parse of the XML somehow, so you are not getting the benefits of deferred parsing? Furthermore, have you considered using an RFH2 header or an MQMD format to describe your message as a BLOB or as XML?
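
For readers following along, the "test the BLOB, then CREATE with PARSE and DOMAIN" idea might look roughly like the sketch below. It is only an illustration, not code from anyone in this thread: the module name DetectXml_Compute is made up, the MQInput node is assumed to be set to the BLOB domain, and the X'3c' test assumes an ASCII-compatible code page (on EBCDIC, '<' is a different byte).

```esql
CREATE COMPUTE MODULE DetectXml_Compute  -- hypothetical module name
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    -- Copy the headers unchanged.
    SET OutputRoot.Properties = InputRoot.Properties;
    SET OutputRoot.MQMD = InputRoot.MQMD;

    -- Look only at the first byte of the body; X'3c' is '<' in ASCII/UTF-8.
    IF SUBSTRING(InputRoot.BLOB.BLOB FROM 1 FOR 1) = X'3c' THEN
      -- Probably XML: build the XML tree from the bitstream in this node,
      -- instead of routing through an RCD node.
      CREATE LASTCHILD OF OutputRoot DOMAIN('XML')
        PARSE(InputRoot.BLOB.BLOB,
              InputRoot.Properties.Encoding,
              InputRoot.Properties.CodedCharSetId);
    ELSE
      -- Not XML: pass the body through as a BLOB.
      SET OutputRoot.BLOB = InputRoot.BLOB;
    END IF;

    RETURN TRUE;
  END;
END MODULE;
```

A first-byte test is only a heuristic: a body that merely starts with '<' would still take the XML branch and could fail later when the tree is actually parsed.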


Regards,
_________________
MGK
The postings I make on this site are my own and don't necessarily represent IBM's positions, strategies or opinions.
Paul D
Posted: Fri Sep 30, 2005 5:52 am

Master

Joined: 16 May 2001
Posts: 200
Location: Green Bay Packer Country

We are running Broker v5 CSD#4 on AIX. The first 100 bytes of the XML message are:
Code:

<?xml version="1.0"?><RoleList nml-version="001"><MsgProtocol/><MsgProtocolVersionNum/><MsgTypeCde/>

The first 100 bytes of the non-XML message are:
Code:

SCIP001NQTester             QDemoServer         PeopleWhoMakeMoreMoney                               

We first used accounting and statistics to find the pain-point node, and then user tracing to narrow it down to the statement. It was essentially the first line of ESQL that was the issue. That was the BLOB string check we were doing. The rest of the flow time was minor compared to this. It's mainly a routing flow. High performance is critical.
_________________
Thanks!!!

Paul D
mgk
Posted: Fri Sep 30, 2005 7:21 am

Padawan

Joined: 31 Jul 2003
Posts: 1642

Hi Paul.

I have to warn you that this behaviour does not work in V6 (I have just tried it). I am at a loss to explain why it works for you on V5 CSD4, and I can only assume (for the moment) that you have found a defect that has subsequently been fixed. I will investigate this on V5 next week, but I still think that if you proceed with this, you will end up relying on a behaviour that will be subject to change when you move between versions (and possibly between CSDs), as I do not know yet when this behaviour was introduced or when it was changed between your current level and V6. I have to reiterate that it is incorrect behaviour for the parser to work in the manner you describe.

Regards,
_________________
MGK
The postings I make on this site are my own and don't necessarily represent IBM's positions, strategies or opinions.
jefflowrey
Posted: Fri Sep 30, 2005 7:31 am

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

mgk -
I think Paul is using neither of the two tests he posted in his first message. He's using the IS NULL test in his second message.

I also hope you are not saying that this IS NULL test will not work in v6 - as that would be very very strange to me.

I think you're saying that the poor performance of the SUBSTRING is the bug, and should be fixed in v6?
_________________
I am *not* the model of the modern major general.
mgk
Posted: Fri Sep 30, 2005 7:45 am

Padawan

Joined: 31 Jul 2003
Posts: 1642

Hi Jeff

First, IS NULL is indeed still there and working fine in V6.

I ran the IS NULL test from Paul's second message, changed slightly as follows:

The flow is just MQInput node (XML domain) -> Compute node -> MQOutput node.

The ESQL in the compute node is:

Code:
CREATE FUNCTION Main() RETURNS BOOLEAN
BEGIN
  SET OutputRoot.Properties = InputRoot.Properties;
  SET OutputRoot.MQMD = InputRoot.MQMD;

  -- Report whether the input carries an XML declaration with a version
  IF (InputRoot.XML.(XML.XmlDecl).(XML.Version)) IS NULL THEN
    SET OutputRoot.XML.Test.Result = 'IT IS NOT XML';
  ELSE
    SET OutputRoot.XML.Test.Result = 'IT IS XML';
  END IF;

  RETURN TRUE;
END;



With the Following Input Message:

Code:
<?xml version="1.0" encoding="utf-8"?><A1><B1>1</B1><B1>2</B1><B1>3</B1></A1>


I get the OutputMessage:
Code:
<Test><Result>IT IS XML</Result></Test>



With the InputMessage:
Code:
<A1><B1>1</B1><B1>2</B1><B1>3</B1></A1>


I Get the OutputMessage:
Code:
<Test><Result>IT IS NOT XML</Result></Test>


With the Input Message:
Code:
SCIP001NQTester


I get several exceptions in the event log, the key ones being:

Code:
( BROKER1.default )  ('blahBlah.blobBlob.Main', '1.287') Error detected whilst executing the SQL statement ''IF InputRoot.XML.(XML.XmlDecl)*:*.(XML.Version)*:* IS NULL THEN... ELSE... END IF;''.   

The message broker detected an error whilst executing the given statement. An exception has been thrown to cut short the SQL program.   

See the following messages for details of the error.


and

Code:
( BROKER1.default ) XML parsing error ('Invalid document structure ') encountered on line 1 column 1 while parsing element 'XML'.   

The above error was reported by the generic XML parser.   

This message is usually caused by a badly-formed XML message. Check that the XML message being passed in is a well-formed XML message that adheres to the XML specification. Note that the line number and column number quoted above give the position where the parser discovered the problem. The actual error may be earlier in the message. Internal error codes : (186), (''). 


I see these exceptions as being the expected behaviour of an XML parser that is presented with a non-XML message as a bitstream to parse. I am unsure why Paul does not get similar exceptions in his tests.

I am not sure what the problem with the performance of substring is.

Regards,
_________________
MGK
The postings I make on this site are my own and don't necessarily represent IBM's positions, strategies or opinions.
jefflowrey
Posted: Fri Sep 30, 2005 7:56 am

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

mgk wrote:
I see these exceptions as being the expected behaviour of an XML parser that is presented with a non-XML message as a bitstream to parse. I am unsure why Paul does not get similar exceptions in his tests.

Okay.. That wasn't so clear to me from your post, what the expected or unexpected behavior was...

mgk wrote:
I am not sure what the problem with the performance of substring is.

Is it unexpected that it takes a while, in your opinion? Does SUBSTRING have to make a copy of its input first? One would hope that SUBSTRING(Root.BLOB.BLOB from 1 for 1) would be very fast...

Does the BLOB parser have to do something to the input bitstream? Like, do a byte copy from the input bitstream into a new MbElement?
_________________
I am *not* the model of the modern major general.
Paul D
Posted: Mon Oct 03, 2005 12:13 pm

Master

Joined: 16 May 2001
Posts: 200
Location: Green Bay Packer Country

I've got another update on this. We've done some more testing. I've got to apologize a bit because our implementation is slightly different from what I described previously. The good part is that it's almost the same as what I described before, and the performance is still far superior to what we were experiencing before:

New Approach:
We set the flow's MQInput node to the XML domain, then wired its out terminal to a TryCatch node. The try terminal is wired to a Filter node with the following code:
Code:

IF (Root.XML.(XML.XmlDecl).(XML.Version)) IS NOT NULL THEN
      RETURN TRUE; -- XML message. Do not reset to BLOB.
END IF;   

The catch terminal is wired to an RCD node set to convert back to BLOB. The output from the RCD is wired to the next node in the flow (Compute1). The true terminal from the filter node is also wired to the next node in the flow (Compute1). This is the point where things merge together again and the flow continues. For errors after this, we ignore this "outer" error on non-XML requests in the exception list, as we obviously know what it's from.

You can see by this that the ESQL code really will throw an exception when the non-XML request is processed. The try catch intercepts and does the RCD to get back to BLOB and we can process forward. We can check the domain in Compute1 and going forward to determine which domain the request belongs to.
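
As an illustration of the domain check mentioned for Compute1, a minimal sketch could look like the following. The thread does not show the actual Compute1 code, so the module name and the Environment.Variables.IsXml flag are invented for this example. It relies on FIELDNAME(InputBody) returning the name of the last child of InputRoot, which is the name of the parser that owns the body.

```esql
CREATE COMPUTE MODULE Compute1_Compute  -- hypothetical module name
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    -- InputBody is the last child of InputRoot, i.e. the message body;
    -- its field name is the owning domain, for example 'XML' or 'BLOB'.
    DECLARE bodyDomain CHARACTER FIELDNAME(InputBody);

    SET OutputRoot = InputRoot;

    -- Record the result for later nodes (IsXml is an assumed name).
    IF bodyDomain = 'XML' THEN
      SET Environment.Variables.IsXml = TRUE;
    ELSE
      SET Environment.Variables.IsXml = FALSE;
    END IF;

    RETURN TRUE;
  END;
END MODULE;
```

Later nodes could then branch on Environment.Variables.IsXml rather than re-inspecting the body.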

Our first tests on Windows reacted a bit differently, but we were using the visual debugger, and maybe things were not behaving the same as on AIX. We didn't go back to retest all the scenarios at this point. We do feel that this is the correct way to proceed, though, and it works the same on all platforms.

It appears to us that we have all the bases covered on this now. Feel free to comment if you can see anywhere that we are missing something.
_________________
Thanks!!!

Paul D
Paul D
Posted: Wed Dec 28, 2005 10:26 am

Master

Joined: 16 May 2001
Posts: 200
Location: Green Bay Packer Country

One final update. We were way off base with our testing. We ended up determining that the 5-second delay and full parse occurred only when a user trace was running. With no user trace on, performance was fine with the original coding. The key thing we learned here is that doing a user trace with large messages will substantially slow down processing. Even though only the very first part of the BLOB body shows up in the trace, behind the scenes it appears to parse out the entire string. I think that while tracing can be helpful in getting low-level performance numbers, you need to watch out for this kind of stuff. I'd use accounting and statistics first, then drill down into user tracing, keeping things like this in mind.
_________________
Thanks!!!

Paul D