Author |
Message
|
Vitor |
Posted: Wed Feb 04, 2009 4:33 am Post subject: Large volumes in Transmission Queue |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
Background
There is a series of flows which process persistent messages through the broker. At intervals a message is written outside syncpoint, for audit purposes, to a remote queue. Messages are typically small: around 20 KB, peaking at 1 MB.
Problem
On Monday additional flows for a new business function were introduced. These follow the pattern of existing flows, but use request/reply via MQGet nodes rather more than has been the case. These flows also produce more auditing than previously (7 messages from a typical flow rather than 4) due to business requirements.
Yesterday we noted that the xmitq to the "audit" queue manager was building up to a depth of thousands. The channel status was running but message flow across the channel was handfuls a minute. Monitoring showed that this matched the dequeue rate of the xmitq. A decision was taken to clear these non-critical messages off with qload. The channel was stopped and a qload attempted, but this too was only able to pull handfuls of messages a minute from the queue, much slower than is our experience here.
More from panic than reason (I was out of the office yesterday) the execution groups were shut down. The qload then went "ping" and dumped 14,000 messages into a file in 17 seconds.
Normal service was then resumed, until these new flows were restarted, when the xmitq started to build again. Turn them off, "ping", turn them on, sloooowwwww.......
There's a PMR open on this but I thought I'd ask the general community. Both WMQ & WMB run on Solaris and are patched to the latest versions.
No unexpected qmgr log messages, no unexpected broker log messages, no FDC files.
Suggestions welcomed. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
mqjeff |
Posted: Wed Feb 04, 2009 4:36 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
Edit: Oh, right. You answered that. Coffee hasn't kicked in.
Kernel tuning on the Solaris machine performed according to Quick Beginnings, particularly with qmgr requirements in mind? |
|
Vitor |
Posted: Wed Feb 04, 2009 4:39 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
mqjeff wrote: |
Kernel tuning on the Solaris machine performed according to Quick Beginnings, particularly with qmgr requirements in mind? |
Yep, and the machine is the existing production box that's been doing its broker thing for years now. Only new items are these recent flows.
Obviously flow analysis and re-design are options, but what needs to be re-designed out? Or in? _________________ Honesty is the best policy.
Insanity is the best defence. |
|
mqjeff |
Posted: Wed Feb 04, 2009 4:43 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
If it's an existing box that's been doing its broker thing for years, then you likely haven't retuned the kernel to take into account the new volume.
Also, maybe it's a good time to price out modern hardware..
Secondly, you should look at file i/o operations and log file sizing. It may be useful to rebuild the qmgr with larger log files that are on a separate physical volume (or at least have a separate i/o write path) from the q files. |
|
Vitor |
Posted: Wed Feb 04, 2009 4:47 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
mqjeff wrote: |
Also, maybe it's a good time to price out modern hardware..  |
Did you notice a gurgling noise recently? As the Western economic system melted and trickled down the drain? There's barely enough budget for essential expenditure, like the coffee machine supplies and my invoice.
mqjeff wrote: |
Secondly, you should look at file i/o operations and log file sizing. It may be useful to rebuild the qmgr with larger log files that are on a separate physical volume (or at least have a separate i/o write path) from the q files. |
AFAIK the logs are on a separate device, but I'll get them to check. I'll also enquire into sizing. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
Vitor |
Posted: Wed Feb 04, 2009 5:35 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
A small amount of additional information. We've proved that the enqueue rate for the queue is fairly constant, and is matched by the dequeue rate until an apparently random point in time when the dequeue drops to nothing. Enqueue remains constant (i.e. it's not a huge number of messages suddenly being dumped onto the queue).
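For anyone trying to picture that pattern, here is a minimal Python sketch of it: a constant enqueue rate, a matching dequeue rate, and a dequeue that falls to zero at some point. The function name and every number in it are hypothetical, chosen only for illustration; they are not measurements from this system, and the model knows nothing about MQ itself.

```python
def depth_over_time(enqueue_per_min, dequeue_per_min, stall_at_min, total_min):
    """Return the queue depth at the end of each minute.

    Assumption: the dequeue rate is constant until stall_at_min,
    then drops to zero for the rest of the run.
    """
    depth = 0
    depths = []
    for minute in range(total_min):
        depth += enqueue_per_min
        rate = dequeue_per_min if minute < stall_at_min else 0
        # Cannot take off more messages than are on the queue.
        depth -= min(depth, rate)
        depths.append(depth)
    return depths


# Hypothetical rates: 100 messages a minute in and out, stalling after 5 minutes.
depths = depth_over_time(enqueue_per_min=100, dequeue_per_min=100,
                         stall_at_min=5, total_min=10)
print(depths)
```

With a steady enqueue, depth stays flat while dequeue keeps up and then climbs in a straight line once it stops, which is consistent with the build-up not being caused by a sudden dump of messages.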
We've also seen other, unconnected, less heavily loaded queues drop to near-zero dequeue at about the same time. Is there a single queue manager process that could get into trouble and globally affect dequeue?
Yes, I know, but until we get a response on the PMR I'll ask here. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
fjb_saper |
Posted: Wed Feb 04, 2009 7:00 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
IIRC there was an article on performance a while back that specified that in a high volume environment you could get better throughput if all the processes touching the queue were syncpointed. Mixing syncpoint and no syncpoint will adversely affect your performance. It may be better to send your audit message to some local queue where a flow picks it up under syncpoint before sending it to its ultimate destination...
Although this was more intended for dequeueing processes, it would be interesting to test it on the enqueueing side as well.
Also, are you sure you are not hitting, for some reason, a pipe problem (batch size, bandwidth, noise on the line) that would dramatically slow your channel throughput rate? Any encryption on the channel?
Hope it helps.  _________________ MQ & Broker admin |
|
Vitor |
Posted: Wed Feb 04, 2009 7:11 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
The catch there is that, for better or worse, the audit is supposed to record what actually happened, so has to be written outside syncpoint in case there's an error.
I know that, logically, a rolled back transaction didn't happen, but the business insist from inside a trout-proof bunker built before I arrived. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
PeterPotkay |
Posted: Wed Feb 04, 2009 7:13 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
So you are saying that when the new flows are active, dequeue rates across multiple queues on the QM, even unrelated queues, drop? And when the flows are stopped, the rates return to normal? _________________ Peter Potkay
Keep Calm and MQ On |
|
fjb_saper |
Posted: Wed Feb 04, 2009 7:15 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Vitor wrote: |
The catch there is that, for better or worse, the audit is supposed to record what actually happened, so has to be written outside syncpoint in case there's an error.
I know that, logically, a rolled back transaction didn't happen, but the business insist from inside a trout-proof bunker built before I arrived. |
Sure, but don't write it to the remote qmgr. Write it to a local queue outside of syncpoint and have a flow pick it up and send it to the remote qmgr under syncpoint... no worries... Personally I am more inclined to look for a pipe problem (the most obvious being a full destination queue, with messages going to the DLQ with reason code 2053)... but then that's just me...  _________________ MQ & Broker admin |
|
Vitor |
Posted: Wed Feb 04, 2009 7:20 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
fjb_saper wrote: |
Vitor wrote: |
The catch there is that, for better or worse, the audit is supposed to record what actually happened, so has to be written outside syncpoint in case there's an error.
I know that, logically, a rolled back transaction didn't happen, but the business insist from inside a trout-proof bunker built before I arrived. |
Sure but don't write it to the remote qmgr. Write it to the local queue outside of syncpoint and have a flow pick it up to send it to the remote qmgr under syncpoint... no worries...  |
Ah, it's a scythe...... (Blackadder II)
Light dawns in my feeble and over-caffeinated brain.
No chance you can dig up a link to that article? This will mean a production change, and anything I can wrap round the trout to increase the weight will help....? _________________ Honesty is the best policy.
Insanity is the best defence. |
|
Vitor |
Posted: Wed Feb 04, 2009 7:22 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
PeterPotkay wrote: |
So you are saying that when the new flows are active, dequeue rates across multiple queues on the QM, even unrelated queues, drop? And when the flows are stopped, the rates return to normal? |
After an apparently random period of normality, yes. It's a subtle effect as we generate far more audit than we do useful messages, but that seems to be the case. Once dequeue rate falls on this queue (easily noticed by the rapidly increasing depth), dequeue rates drop across the queue manager. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
mqjeff |
Posted: Wed Feb 04, 2009 7:25 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
One thing that may be fun to try is adjusting the batch size on the channel.
smaller batch size == more frequent, smaller transactions on the logs
larger batch size == less frequent, larger transactions on the logs |
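To put rough numbers on that trade-off, here is a small Python sketch. It assumes the simplest possible model, in which the sending channel commits once per batch and every batch is full except possibly the last; it ignores batch intervals, batches that end early because the xmitq is empty, and everything else about a real channel. The function name and the sample batch sizes are hypothetical, and the 14,000 figure is simply the message count from the qload mentioned earlier in the thread.

```python
import math


def batches_needed(message_count, batch_size):
    """Number of batches (and so commits, under the assumption above)
    needed to move message_count messages."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(message_count / batch_size)


# Compare a few illustrative batch sizes for the same backlog.
for batch_size in (10, 50, 200):
    print(batch_size, batches_needed(14_000, batch_size))
```

Under this model, a batch size of 10 means 1,400 commits to clear the backlog, 50 means 280 and 200 means 70; whether fewer, larger units of work actually suit the logs better is something only a test on the real box will show.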
|
Vitor |
Posted: Wed Feb 04, 2009 7:28 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
fjb_saper wrote: |
Also, are you sure you are not hitting, for some reason, a pipe problem (batch size, bandwidth, noise on the line) that would dramatically slow your channel throughput rate? |
Not that we can determine. The machines in question are (obviously) both internal and on the same subnet. Almost certainly geographically close.
We can't find any correlation between network issues that ties as closely to problem starts/problem stops as the flows do.
fjb_saper wrote: |
Any encryption on the channel? |
Security's not a big thing here; heavy reliance on a big, thick firewall. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
exerk |
Posted: Wed Feb 04, 2009 7:36 am Post subject: |
|
|
 Jedi Council
Joined: 02 Nov 2006 Posts: 6339
|
Thicky question time...is it possible to resource throttle on Solaris? And if so, are you hitting the limit allowed by the Admins? It's not used here, so no admins I can ask. _________________ It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys. |
|