Author |
Message
|
murdeep |
Posted: Wed Nov 14, 2007 10:38 am Post subject: Flow performs slow - is there a better way to handle this? |
|
|
Master
Joined: 03 Nov 2004 Posts: 211
|
Hello, I need some help from the experts out there.
Here is my situation.
We have to add an LF character to each record packaged into a message. The message size is 4000000 bytes and it comprises multiple records; the record length is 200 bytes. After each record I need to insert a record delimiter, which is specified in the BAR at deploy time as a UDP.
Here is my esql:
Code: |
-- -------------------------------------------------------------
-- Add CRLF characters after each record. Record length is set
-- in bar. InputMessage must be a multiple of record length.
-- -------------------------------------------------------------
SET CRLF = crlfCharacters;
SET intLengthBLOB = LENGTH(InputRoot.BLOB.BLOB);
IF MOD(intLengthBLOB,recordLength) <> 0 THEN
   THROW USER EXCEPTION MESSAGE 2951
      VALUES('Input message length: ', intLengthBLOB, 'is not a multiple of: ', recordLength);
ELSE
   SET totalRecords = intLengthBLOB/recordLength;
   SET currentRecord = 0;
   WHILE (currentRecord < totalRecords) DO
      IF (currentRecord = 0) THEN
         SET OutputRoot.BLOB.BLOB = SUBSTRING(InputRoot.BLOB.BLOB FROM (currentRecord*recordLength)+1 FOR recordLength);
      ELSE
         SET OutputRoot.BLOB.BLOB = OutputRoot.BLOB.BLOB ||
            SUBSTRING(InputRoot.BLOB.BLOB FROM (currentRecord*recordLength)+1 FOR recordLength);
      END IF;
      SET OutputRoot.BLOB.BLOB = OutputRoot.BLOB.BLOB || CRLF;
      SET currentRecord = currentRecord + 1;
   END WHILE;
END IF; |
This works but is slooooooooooooooow. Each message takes approximately 1 minute of elapsed time to process. If I add multiple instances then I just increase my throughput accordingly, i.e. 2 instances = 2 messages/minute. For each instance I see one CPU almost pegged. My prod box where this will run is a 4-way, so the best this will do is 4 msgs/minute unless I can improve the ESQL.
Is there a better way to do this parsing?
Murdeep |
|
jefflowrey |
Posted: Wed Nov 14, 2007 10:41 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
Model it as a very simple CWF structure, with two messages. One that's the input message as a single, repeating, fixed length string. The second as the output message, which is the same with the addition of the CRLF. _________________ I am *not* the model of the modern major general. |
|
murdeep |
Posted: Wed Nov 14, 2007 10:46 am Post subject: |
|
|
Master
Joined: 03 Nov 2004 Posts: 211
|
Jeff, I guess you are implying that the MRM parser will handle this better than substringing. Is this known to be true, or are you suggesting it because it's worth a try, without being sure it is indeed better? |
|
jefflowrey |
Posted: Wed Nov 14, 2007 10:53 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
It should be quick enough to try, so you can find out for yourself whether it's going to be faster or not.
There isn't a lot of room for improvement in your code - maybe putting the ||CRLF inline with the same SET will help a bit.
You may be running into some performance limits with v6 Broker, though. There may not be a lot you can do to help this. You could try building your output blob into Environment instead of OutputRoot - but it's not likely to make a lot of difference.
V6.1 should offer some very good improvements for this, though - due to internal changes. _________________ I am *not* the model of the modern major general. |
|
murdeep |
Posted: Wed Nov 14, 2007 2:19 pm Post subject: |
|
|
Master
Joined: 03 Nov 2004 Posts: 211
|
Ok, made the change as per Jeff's suggestion. Result: performance a little bit better. The substring code against a blob ran for about 95 seconds per message; the MRM parser ran for about 80 seconds per message. Roughly 16% better. |
|
murdeep |
Posted: Wed Nov 14, 2007 3:33 pm Post subject: |
|
|
Master
Joined: 03 Nov 2004 Posts: 211
|
Just ran another test. In this test I had the originator of the message that has blocked records in it reduce the number of records so that the message length was only 100K (100000). The broker processed those messages incredibly fast. The entire group of messages, 11800 in total, was processed by the broker in about 7 minutes, whereas the 295 4M (4000000) messages would have taken close to 450 minutes.
So now I need to determine the BLOB message size threshold where performance drops. This doesn't scale linearly, that's for sure! |
|
fjb_saper |
Posted: Wed Nov 14, 2007 3:42 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Does the time needed to retrieve the message scale linearly with its size?
I would think not, as other factors are involved, like log space, log size, etc...
With the same MQ environment (log space, etc...) my experience has been that you will deal better with more little messages than with fewer huge ones... _________________ MQ & Broker admin |
|
murdeep |
Posted: Wed Nov 14, 2007 4:29 pm Post subject: |
|
|
Master
Joined: 03 Nov 2004 Posts: 211
|
fjb, MQ is not the problem here, it is clearly WMB. BTW, messages were non persistent.
Perhaps some WMB internals expert can shed some light. I plan on increasing the message size, measuring, and repeating until I find the breaking point. |
|
Back to top |
|
 |
kimbert |
Posted: Thu Nov 15, 2007 3:47 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
The problem may be caused by the BLOB's buffer resizing algorithm not being greedy enough. Maybe each time it runs out of allocated space it allocates a little more storage, and then has to copy everything into contiguous storage before continuing. This is only a guess, but if true you might get some gains by building multiple smaller BLOBs and then concatenating them. |
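To put rough numbers on that guess, here is a minimal Java sketch. It assumes that every `||` allocates a fresh buffer and copies the whole accumulated BLOB, which is an assumption about broker internals and not something confirmed in this thread. `ConcatCost` and `bytesCopied` are made-up names, and the 2-byte delimiter assumes CRLF.

```java
// Sketch only: estimates the bytes copied if each append of "record + delimiter"
// rebuilds the whole output BLOB (assumed behaviour, not documented broker behaviour).
final class ConcatCost {
    // After k appends the output holds k * (recordLength + delimiterLength) bytes,
    // so the total copied is that unit size times 1 + 2 + ... + records.
    static long bytesCopied(long messageLength, long recordLength, long delimiterLength) {
        long records = messageLength / recordLength;
        long unit = recordLength + delimiterLength;
        return unit * records * (records + 1) / 2;
    }

    public static void main(String[] args) {
        // One 4000000-byte message versus the same data as forty 100000-byte messages.
        long big = bytesCopied(4_000_000L, 200L, 2L);
        long small = 40L * bytesCopied(100_000L, 200L, 2L);
        System.out.println(big + " vs " + small + " bytes, ratio " + (double) big / small);
    }
}
```

Under that assumption one 4000000-byte message costs about 40 times the copying of the same data sent as forty 100000-byte messages, so a quadratic copy cost would be in line with the non-linear behaviour reported earlier in the thread.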
|
kimbert |
Posted: Thu Nov 15, 2007 6:45 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Another suggestion ( from colleague at Hursley ):
For your scenario, a JavaCompute node may well give better performance than ESQL. |
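A minimal sketch of the core logic such a node could call, in plain Java with no broker API involved. The output size is known up front, so the array is allocated once and each record is copied exactly once. `RecordDelimiter` and `insertDelimiters` are hypothetical names; reading the BLOB from the message tree and writing the result back are left out.

```java
// Sketch only: inserts a delimiter after every fixed-length record in one pass.
final class RecordDelimiter {
    static byte[] insertDelimiters(byte[] input, int recordLength, byte[] delimiter) {
        // Same validation as the ESQL: the input must be a whole number of records.
        if (recordLength <= 0 || input.length % recordLength != 0) {
            throw new IllegalArgumentException("Input message length: " + input.length
                    + " is not a multiple of: " + recordLength);
        }
        int records = input.length / recordLength;
        // Allocate the final size once, so nothing is ever re-copied.
        byte[] output = new byte[input.length + records * delimiter.length];
        int out = 0;
        for (int in = 0; in < input.length; in += recordLength) {
            System.arraycopy(input, in, output, out, recordLength);
            out += recordLength;
            System.arraycopy(delimiter, 0, output, out, delimiter.length);
            out += delimiter.length;
        }
        return output;
    }

    public static void main(String[] args) {
        byte[] in = "AAABBB".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        byte[] out = insertDelimiters(in, 3, new byte[] {0x0D, 0x0A});
        System.out.println(out.length + " bytes written");
    }
}
```

The work here grows linearly with the message size, whatever the broker does internally with ESQL concatenation.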
|
fschofer |
Posted: Thu Nov 15, 2007 6:49 am Post subject: |
|
|
 Knight
Joined: 02 Jul 2001 Posts: 524 Location: Mainz, Germany
|
Hi,
I think the line below is the most expensive one because you are executing it 4000000 / 200 = 20000 times,
and OutputRoot.BLOB.BLOB gets bigger each time.
Code: |
SET OutputRoot.BLOB.BLOB = OutputRoot.BLOB.BLOB ||
SUBSTRING(InputRoot.BLOB.BLOB FROM (currentRecord*recordLength)+1 FOR recordLength); |
I would try to insert, let's say, 100 delimiters and then store the result in the Environment, and repeat this 200 times.
As a last step I would then put the 200 BLOBs from the Environment together into one.
Greetings
Frank |
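The two-level idea above, sketched in Java purely to show the shape of the algorithm: build small chunks of delimited records, then join the chunks once at the end. Whether the same trick helps in ESQL depends on how the broker handles `||`, which this thread does not establish. `ChunkedBuilder`, `build` and `recordsPerChunk` are hypothetical names.

```java
// Sketch only: delimit records in small chunks, then concatenate the chunks once.
final class ChunkedBuilder {
    static byte[] build(byte[] input, int recordLength, byte[] delimiter, int recordsPerChunk) {
        if (recordLength <= 0 || recordsPerChunk <= 0 || input.length % recordLength != 0) {
            throw new IllegalArgumentException("Input length must be a multiple of the record length");
        }
        java.util.List<byte[]> chunks = new java.util.ArrayList<byte[]>();
        int chunkBytes = recordLength * recordsPerChunk;
        int total = 0;
        // Level 1: each chunk holds at most recordsPerChunk delimited records.
        for (int start = 0; start < input.length; start += chunkBytes) {
            int end = Math.min(start + chunkBytes, input.length);
            java.io.ByteArrayOutputStream chunk = new java.io.ByteArrayOutputStream();
            for (int pos = start; pos < end; pos += recordLength) {
                chunk.write(input, pos, recordLength);
                chunk.write(delimiter, 0, delimiter.length);
            }
            byte[] done = chunk.toByteArray();
            chunks.add(done);
            total += done.length;
        }
        // Level 2: join the finished chunks into one output buffer.
        byte[] output = new byte[total];
        int out = 0;
        for (byte[] c : chunks) {
            System.arraycopy(c, 0, output, out, c.length);
            out += c.length;
        }
        return output;
    }
}
```

With 100 records per chunk, a 4000000-byte message of 200-byte records gives the 200 chunks mentioned above.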
|
kalyanMD |
Posted: Thu Nov 15, 2007 6:55 am Post subject: |
|
|
Novice
Joined: 03 Feb 2005 Posts: 14 Location: London UK
|
We wrote a C parser for our case, where we had to break part of a message into 100 different lines. The performance improvement was around 70%. |
|
ADV |
Posted: Thu Nov 15, 2007 8:37 am Post subject: will this work? |
|
|
Apprentice
Joined: 24 Apr 2007 Posts: 44 Location: Boston, MA
|
1) Create a dummy queue (for each record).
2) Parse the incoming message, split it into separate messages (each record as a message) and put them into the dummy queue.
3) Read the dummy queue from top to bottom to add CRLF at the end of each message.
4) Read the queue from top to bottom again to concatenate the messages, and put the result into your destination queue.
5) Clear the dummy queue. |
|
murdeep |
Posted: Thu Nov 15, 2007 8:37 am Post subject: |
|
|
Master
Joined: 03 Nov 2004 Posts: 211
|
Thanks everyone for the replies. My rudimentary testing shows a noticeable drop-off at around a 500K message size.
I think I'll look into the JCN. I'm no Java expert so please help me here.
I suspect the BLOB will be a Java byte data type? And I guess that the parameters recordLength and CRLF characters will need to be passed in via the Environment? Finally, since the CRLF characters are passed to me in hex, is there an easy way in Java to go from a hex representation to a binary one? |
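On the last question, a minimal sketch of one way to turn a hex string such as `0D0A` into bytes using only core Java. `HexDecoder` and `decode` are made-up names, and the sketch assumes the property arrives as a plain string of hex digits with no `X'...'` wrapper.

```java
// Sketch only: converts a string of hex digits (e.g. "0D0A") into the bytes it represents.
final class HexDecoder {
    static byte[] decode(String hex) {
        String s = hex.trim();
        if (s.length() % 2 != 0) {
            throw new IllegalArgumentException("Odd number of hex digits: " + s);
        }
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            // Character.digit returns -1 for anything that is not a hex digit.
            int hi = Character.digit(s.charAt(2 * i), 16);
            int lo = Character.digit(s.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Not a hex string: " + s);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] crlf = decode("0D0A");
        System.out.println(crlf.length + " delimiter bytes");
    }
}
```

The resulting `byte[]` can be used directly as the delimiter when building the output.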
|
ADV |
Posted: Thu Nov 15, 2007 8:46 am Post subject: |
|
|
Apprentice
Joined: 24 Apr 2007 Posts: 44 Location: Boston, MA
|
|