jgoldberg (Newbie; Joined: 14 Mar 2011; Posts: 9)
Posted: Fri Oct 14, 2011 2:41 pm    Post subject: aggregate flow performance
I'm working on aggregation performance. My flow consists of a fan-out/fan-in wrapped by HTTP. I have 5 flows:
httpInput -> compute -> mqOutput(q1)
mqInput(q1) -> compute -> aggregateControl -> compute(propagate) -> mqOutput(q2) -> aggregateRequest
mqInput(q2) -> compute -> mqReply(q3) (the compute fills in dummy data for performance testing; in the real world it will make HTTP requests)
mqInput(q3) -> aggregateReply -> compute -> mqReply(q4)
mqInput(q4) -> compute -> httpReply
For a 3-message aggregate request my flow does about 260 req/sec.
For a 13-message aggregate request my flow does about 110 req/sec.
CPU and memory consumption by the aggregate reply flow are extraordinary. It performs relatively well when doing 3-5 aggregate requests, but my customer wants to do more (5-20) with < 0.5s response times. What can I do to improve the performance of the aggregate reply flow? All flows are in separate execution groups, and the fan-in uses 6x the CPU of any other flow.
What I've done so far: broker JVM heap limit raised to 512MB, non-persistent messages, set up a configurable service for the aggregation, set my commondata element to opaque so it doesn't get parsed, using XMLNSC, mqInput uses its own thread pool, and the compute nodes only pass the message through. Any other tips to improve CPU/memory usage?
Code:
CREATE COMPUTE MODULE aggregateFanIn_Compute
    CREATE FUNCTION Main() RETURNS BOOLEAN
    BEGIN
        DECLARE repliesIn REFERENCE TO InputRoot.ComIbmAggregateReplyBody.*[1];
        CREATE LASTCHILD OF OutputRoot DOMAIN 'MQMD';
        SET OutputRoot.MQMD.CodedCharSetId = 1208;
        CREATE LASTCHILD OF OutputRoot DOMAIN 'XMLNSC';
        SET OutputRoot.XMLNSC.(XMLNSC.XmlDeclaration)*.(XMLNSC.Attribute)Encoding = 'UTF-8';
        CREATE LASTCHILD OF OutputRoot.XMLNSC NAME 'Data';
        SET OutputRoot.XMLNSC.Data.HTTPID = '';
        -- walk each aggregated reply, copy its payload into the output, then discard it
        CHECKMQMD: WHILE LASTMOVE(repliesIn) DO
            IF FIELDVALUE(repliesIn.XMLNSC.Data.HTTPID) IS NULL THEN
                CREATE LASTCHILD OF OutputRoot.XMLNSC.Data FROM repliesIn.XMLNSC.Data.commondata;
            ELSE
                SET OutputRoot.MQMD.CorrelId = CAST(repliesIn.XMLNSC.Data.MQMD.CorrelId AS BLOB);
                SET OutputRoot.MQMD.ReplyToQ = repliesIn.XMLNSC.Data.MQMD.ReplyToQ;
                SET OutputRoot.MQMD.ReplyToQMgr = repliesIn.XMLNSC.Data.MQMD.ReplyToQmgr;
                SET OutputRoot.XMLNSC.Data.HTTPID = repliesIn.XMLNSC.Data.HTTPID;
            END IF;
            MOVE repliesIn NEXTSIBLING;
            DELETE PREVIOUSSIBLING OF repliesIn;
        END WHILE;
        RETURN TRUE;
    END;
END MODULE;
fjb_saper (Grand High Poobah; Joined: 18 Nov 2003; Posts: 20756; Location: LI, NY)
Posted: Fri Oct 14, 2011 3:29 pm
Do some measurements with the fan-out and fan-in in the same execution group, and let us know the results.
_________________
MQ & Broker admin
jgoldberg
Posted: Fri Oct 14, 2011 4:32 pm
I reran with a single EG and again with multiple EGs. FYI, I'm testing with `ab -c 100 -n 5000`.
Same EG:
3 items: 353/sec
13 items: 66/sec
Two EGs for fan-in and two for fan-out:
3 items: 420/sec
13 items: 88/sec
fjb_saper
Posted: Sat Oct 15, 2011 1:54 pm
jgoldberg wrote:
I reran with a single EG and again with multiple EGs. FYI, I'm testing with `ab -c 100 -n 5000`.
Same EG:
3 items: 353/sec
13 items: 66/sec
Two EGs for fan-in and two for fan-out:
3 items: 420/sec
13 items: 88/sec
Obviously, you do have a bottleneck somewhere. I suspect it may be related to the data content; this shows in the fact that you still only get 66 req/sec for 13 items when the fan-in and fan-out are in the same EG.
How many instances of the flows (fan-in & fan-out) are you running? If there is no message affinity, try adding additional instances. However, your best bet for real performance improvement would be to chase the bottleneck down and fix it.
Have fun
_________________
MQ & Broker admin
jgoldberg
Posted: Tue Oct 18, 2011 1:06 pm
The XMLNSC payload is about 200 bytes. I have 512 (256*2) instances of each flow running. According to flow statistics, the bottleneck is the AggregateReply node.
I came here looking for advice on fixing it. If you're out of ideas, just say so. No need to state the obvious, that I have a problem and need to fix it! I already know that much!
lancelotlinc (Jedi Knight; Joined: 22 Mar 2010; Posts: 4941; Location: Bloomington, IL USA)
Posted: Tue Oct 18, 2011 1:44 pm
The WMB product performs as well as any on the market. It is a tool. A craftsman can use the tool to design and build a high-technology solution. If the craftsman designs an implementation that is less than adequate for certain service levels, maybe a different approach is needed.
There are many variations on the fan-out/fan-in theme. I tend to think more outside the box than in, and several senior members of this forum usually do not agree with my approaches.
If what you have implemented is not performing adequately, try another variation. Think outside the box.
_________________
http://leanpub.com/IIB_Tips_and_Tricks
Save $20: Coupon Code: MQSERIES_READER
Last edited by lancelotlinc on Tue Oct 18, 2011 1:52 pm; edited 1 time in total
fjb_saper
Posted: Tue Oct 18, 2011 1:45 pm
jgoldberg wrote:
The XMLNSC payload is about 200 bytes. I have 512 (256*2) instances of each flow running. According to flow statistics, the bottleneck is the AggregateReply node.
I came here looking for advice on fixing it. If you're out of ideas, just say so. No need to state the obvious, that I have a problem and need to fix it! I already know that much!
Shame on you for believing the statistics blindly. A closer look should have shown you that most of that time is spent waiting for the messages in the aggregation to arrive...
Well, I believe the problem is not the AggregateReply node but the flows you are calling in between.
We call an aggregation at a rate of over 600 times per minute and see no degradation; however, the flows feeding the aggregation have an average response time of 125 ms or less.
The AggregateReply node is only as fast as the slowest link among the different aggregation requests. So again: find your bottleneck! Check the average, min AND max times of each of the flows feeding your AggregateReply.
Have fun
_________________
MQ & Broker admin
jgoldberg
Posted: Tue Oct 18, 2011 4:15 pm
The intermediary flow isn't much more than a no-op; it has mqInput -> compute -> mqReply:
Code:
BEGIN
    CREATE NEXTSIBLING OF OutputRoot.Properties DOMAIN 'MQMD';
    SET OutputRoot.MQMD = InputRoot.MQMD;
    CREATE FIRSTCHILD OF OutputRoot.XMLNSC NAME 'Data';
    SET OutputRoot.XMLNSC.Data.Aggregate = Environment.Variables.aggregate;
    -- dummy payload for performance testing; the real flow would make HTTP requests here
    SET OutputRoot.XMLNSC.Data.commondata.commondata.Test.field1 = 'field1';
    SET OutputRoot.XMLNSC.Data.commondata.commondata.Test.field2 = 'field2';
    SET OutputRoot.XMLNSC.Data.commondata.commondata.Test.field3 = 'field3';
    SET OutputRoot.XMLNSC.Data.commondata.commondata.Test.field4 = 'field4';
    SET OutputRoot.XMLNSC.Data.commondata.commondata.Test.field5 = 'field5';
    SET OutputRoot.XMLNSC.Data.commondata.commondata.Test.field6 = 'field6';
    SET OutputRoot.XMLNSC.Data.commondata.commondata.Test.field7 = 'field7';
    SET OutputRoot.XMLNSC.Data.commondata.commondata.Test.field8 = 'field8';
    SET OutputRoot.XMLNSC.Data.commondata.commondata.Test.field9 = 'field9';
    SET OutputRoot.XMLNSC.Data.commondata.commondata.Test.field10 = 'field10';
    RETURN TRUE;
END;
It's running with 1536 instances.
Timings from trace nodes indicate a processing time of 500µs.
In this test I'm doing about 6600 aggregations per minute, with about 4000 messages per second going through MQ. Is more than that simply too much to ask of MQ on my platform?
fjb_saper
Posted: Tue Oct 18, 2011 8:28 pm
Why do you need 1536 instances?
We do 1200 agg/min (peak) with only 30 instances in a cluster of 3 boxes (10 instances per box, 1 EG each). Anything above that and you start feeling the constraints of the (old AIX P5) hardware.
With that number of instances you'd want to check the broker box: you may be IO- or CPU-bound well before you are MQ-bound.
You would also want to make sure there is no message affinity between the different requests, and perhaps use a cluster (MQ + broker) to bear the load.
_________________
MQ & Broker admin
rekarm01 (Grand Master; Joined: 25 Jun 2008; Posts: 1415)
Posted: Sun Oct 23, 2011 3:54 pm    Post subject: Re: aggregate flow performance
jgoldberg wrote:
Broker JVM heap limit raised to 512MB ...

Increasing the JVM heap limit is only useful for message flows that use Java. It also takes memory away from the main memory heap.
jgoldberg wrote:
my customer wants to do more (5-20) with < 0.5s response times

If each transaction generates (5-20) fan-out requests and fan-in replies, what is the response time for a single, isolated transaction? How much of that time is spent in each of the request, service, and reply flows?
jgoldberg wrote:
httpInput -> compute -> mqOutput(q1)
mqInput(q1) -> compute -> aggregateControl -> compute(propagate) -> mqOutput(q2) -> aggregateRequest
mqInput(q2) -> compute -> mqReply(q3) (the compute fills in dummy data for performance testing; in the real world it will make HTTP requests)
mqInput(q3) -> aggregateReply -> compute -> mqReply(q4)
mqInput(q4) -> compute -> httpReply
For a 3-message aggregate request my flow does about 260 req/sec.
For a 13-message aggregate request my flow does about 110 req/sec.
What is the desired peak throughput? Is it more or less than 110 req/sec? Performance testing at rates higher than the expected peak volume isn't always useful. Adding more threads may increase throughput (up to a point), but if there is not enough CPU for each available thread, it can also increase the response time of each individual transaction.
During high-volume testing, which queues fill up (including any SYSTEM.BROKER.AGGR queues)? How full do they get? If the queues start filling up, this can further degrade message flow performance, especially for the reply flow.
jgoldberg wrote:
CPU and memory consumption by the aggregate reply flow are extraordinary. ... All flows are in separate execution groups, and the fan-in uses 6x the CPU of any other flow.

How extraordinary? The whole point of multi-threading is to increase CPU utilization and reduce CPU idle time. If the reply flow is processing 5-20 times more messages than the request flow, it's not surprising that it uses substantially more CPU. Memory consumption is also directly proportional to the maximum number of active threads.
If the reply flow can't keep up with the request flow, then slow down the request flow. Start with zero additional instances in the request flow, and monitor performance, resource usage, and queue depths. Gradually increase the number of additional instances and repeat, to find an optimal setting.
If the overall CPU utilization is 100%, or the overall memory usage is too high, adding more hardware may be necessary.