zpat |
Posted: Thu Jan 30, 2014 8:17 am Post subject: Best practice for receiver channel message retry options? |
|
|
 Jedi Council
Joined: 19 May 2001 Posts: 5866 Location: UK
|
We had a large number (50,000) of persistent messages being sent from QM A (z/OS) to QM B (AIX), where the destination queue was full and not being consumed from.
The receiver channel moved the messages to the DLQ on QM B, but very slowly, so other persistent messages (to other queues) coming over the same channel were seriously delayed.
Presumably this is caused by the channel's default message retry count (MRRTY, 10) and default message retry interval (MRTMR, 1000 milliseconds).
Question 1 - does this retry apply to each message, or to each batch of messages?
Question 2 - what are your recommendations for the receiver channel message retry values to avoid this problem?
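To put a rough number on the delay described above, here is a minimal Python sketch. It assumes the retry applies to every message individually and that each message exhausts all of its retries before going to the DLQ, so it is an upper bound rather than a measurement. The function name is made up for illustration.

```python
def worst_case_retry_seconds(messages, retry_count, retry_interval_ms):
    """Upper bound on the time a receiver channel spends waiting in retries.

    Assumes every message is retried retry_count times, with
    retry_interval_ms between attempts, before it is moved to the DLQ.
    """
    return messages * retry_count * retry_interval_ms / 1000.0


# 50,000 messages with the default retry count (10) and interval (1000 ms)
seconds = worst_case_retry_seconds(50_000, 10, 1000)
print(f"{seconds:.0f} s, about {seconds / 3600:.1f} hours")
```

Under those assumptions the channel could spend days on the backlog, which would be consistent with the serious delays seen by the other queues on the same channel.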
|
Michael Dag |
Posted: Thu Jan 30, 2014 8:27 am Post subject: |
|
|
 Jedi Knight
Joined: 13 Jun 2002 Posts: 2607 Location: The Netherlands (Amsterdam)
|
Feels like fixing a symptom of a problem and not the problem itself...
The retry time and count should allow for the queue to be emptied (maybe not completely, but at least partly) during that period...
Why was the original queue not being read?
Was this an exceptional burst of extra messages? I.e. why was the original queue not sized to hold these while it was not being read?
Just my 2 cents as usual
|
zpat |
Posted: Thu Jan 30, 2014 8:36 am Post subject: |
|
|
 Jedi Council
Joined: 19 May 2001 Posts: 5866 Location: UK
|
It's not a perfect world. This is also a non-prod system.
A developer stopped the message flow that consumed the queue, but didn't stop the delivery from upstream, or get the maxdepth increased.
I don't really want channels desperately fighting against nearly full queues - if a queue is full it's generally going to stay that way for some time, as we don't operate near the limit of maxdepth under normal circumstances (in fact we keep well away from reaching maxdepth normally by about 80%).
Our batch size is 50 if that's relevant. I can't see the point of the channel retrying more than once really.
|
bruce2359 |
Posted: Thu Jan 30, 2014 10:17 am Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
The intent of message retry is to deal with transitory issues, such as the destination queue being briefly full or briefly put-inhibited by an application.
Message retry deals with the current message from the batch that needs to be MQPUT to the destination queue or the DLQ. As such, if memory serves, the remainder of the batch, and subsequent batches, will be stalled until the current message is dealt with.
I would rather deal with application queue full condition by automating maxdepth so that when the queue reaches 80% of max, max is raised by some comfort-level to avoid message retry altogether. Most automation tools can do this.
I would approach the DLQ in a similar fashion - increasing ceiling-height as needed to avoid the worst-case channel fail condition.
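The "raise maxdepth at 80%" idea can be sketched as a small policy function. This is only an illustration of the decision logic, not any particular automation tool: the function name, the 25% growth step and the sample depths are assumptions. The 80% threshold comes from the post, and 999,999,999 is the largest MAXDEPTH a queue accepts. Applying the new value (for example with ALTER QLOCAL ... MAXDEPTH) is left to whatever tool runs the policy.

```python
MQ_MAXDEPTH_LIMIT = 999_999_999  # largest MAXDEPTH value MQ allows


def next_maxdepth(curdepth, maxdepth, threshold_pct=80, growth_pct=25):
    """Return the MAXDEPTH to apply for a queue.

    Below threshold_pct of maxdepth the value is left unchanged. At or
    above it, maxdepth is raised by growth_pct (an assumed comfort level),
    capped at the MQ limit. Integer arithmetic avoids rounding surprises.
    """
    if maxdepth <= 0:
        raise ValueError("maxdepth must be positive")
    if curdepth * 100 < maxdepth * threshold_pct:
        return maxdepth
    raised = maxdepth + max(1, maxdepth * growth_pct // 100)
    return min(raised, MQ_MAXDEPTH_LIMIT)


print(next_maxdepth(1000, 5000))  # well below 80%: unchanged
print(next_maxdepth(4000, 5000))  # at 80%: raised by 25%
```

The same function could be pointed at the DLQ, as suggested above, so that the channel does not hit a full DLQ while it is off-loading messages.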
|
exerk |
Posted: Thu Jan 30, 2014 10:51 am Post subject: |
|
|
 Jedi Council
Joined: 02 Nov 2006 Posts: 6339
|
zpat wrote: |
...I don't really want channels desperately fighting against nearly full queues - if a queue is full it's generally going to stay that way for some time, as we don't operate near the limit of maxdepth under normal circumstances (in fact we keep well away from reaching maxdepth normally by about 80%)... |
So what about using class-of-service channels?
|
PeterPotkay |
Posted: Thu Jan 30, 2014 1:23 pm Post subject: Re: Best practice for receiver channel message retry options |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
zpat wrote: |
Question 1 - does this retry apply to each message, or to each batch of messages?
Question 2 - what are your recommendations for the receiver channel message retry values to avoid this problem? |
Q1 - the retry applies to each message the channel cannot put.
Q2 - we go with a retry count of 1 and an interval of 1000 ms. Wait one second and try again. If that doesn't work, dump it to the DLQ.
So this throttles how fast the DLQ fills up, but at the expense of pausing the channel for one second for each message. This can be quite impactful.
If there are a lot of messages in the XMITQ addressed to this full queue, and this full queue is not being processed at all, yeah, it's going to be a drag. And if it's a shared channel used by multiple apps, the innocent messages get stuck in the traffic jam.
I guess you could set the retry count to 0 and have the channel off-load the messages to the DLQ ASAP. Nothing wrong with that, if that's what you want. The DLQ might fill up FAST, though. See my other post about that.
On B2B channels from other companies I jack these values up. If they start sending something that needs to go to a DLQ, I want that to be real super slow, to protect against any Denial of Service type attacks. Although I may revisit this and just tell these channels to not use a DLQ at all, now possible in the latest version of MQ.
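The trade-off between throttling the DLQ and stalling the channel can be made concrete with a small sketch. It estimates the shortest time in which rejected messages could use up the DLQ's free space for a given retry setting. The function name and the headroom figure of 5000 are hypothetical, and the time the put itself takes is ignored.

```python
def seconds_to_fill_dlq(dlq_headroom, retry_count, retry_interval_ms):
    """Shortest time for rejected messages to consume the DLQ's free space.

    Each rejected message holds the channel for retry_count * interval
    before it lands on the DLQ. With retry_count 0 nothing throttles the
    channel, so the result is 0.0 (as fast as the channel can put).
    """
    return dlq_headroom * retry_count * retry_interval_ms / 1000.0


headroom = 5000  # hypothetical number of free message slots on the DLQ
for label, count, interval_ms in [
    ("default (10, 1000)", 10, 1000),
    ("1 and 1000", 1, 1000),
    ("no retry (0)", 0, 1000),
]:
    print(label, seconds_to_fill_dlq(headroom, count, interval_ms), "s")
```

The larger the product of count and interval, the longer a flood takes to fill the DLQ, and the longer every other message on that channel waits behind it.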
|
PeterPotkay |
Posted: Thu Jan 30, 2014 1:25 pm Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
bruce2359 wrote: |
I would rather deal with application queue full condition by automating maxdepth so that when the queue reaches 80% of max, max is raised by some comfort-level to avoid message retry altogether. Most automation tools can do this.
|
Just make the queue as big as possible to begin with if all you are going to do is automatically make it bigger.
|
bruce2359 |
Posted: Thu Jan 30, 2014 1:40 pm Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
PeterPotkay wrote: |
bruce2359 wrote: |
I would rather deal with application queue full condition by automating maxdepth so that when the queue reaches 80% of max, max is raised by some comfort-level to avoid message retry altogether. Most automation tools can do this.
|
Just make the queue as big as possible to begin with if all you are going to do is automatically make it bigger. |
Well, yes, but the automation will allow maxdepth to be raised just-in-time, should it be needed from its 'big as possible' value, should the app or its usage change without the developers notifying us.
|
PeterPotkay |
Posted: Thu Jan 30, 2014 2:18 pm Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
bruce2359 wrote: |
Well, yes, but the automation will allow maxdepth to be raised just-in-time, should it be needed from its 'big as possible' value, should the app or its usage change without the developers notifying us. |
If you can automatically make it bigger, technically it wasn't "big as possible" to begin with.
|
zpat |
Posted: Fri Jan 31, 2014 12:35 am Post subject: |
|
|
 Jedi Council
Joined: 19 May 2001 Posts: 5866 Location: UK
|
Useful debate.
I think lowering the retry count to 1 makes sense. Maybe the interval down to 500 milliseconds. At least that is a 20-fold improvement.
But I take the point about 3rd party channels being different.
"Class of service channels" - not sure what they are - will look it up - but I assume that would mean using different xmit queues for some remote queues.
That is certainly an option where we have queues that we know send large numbers of messages (generally in a big batch).
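The 20-fold figure can be checked with a few lines of Python. The sketch compares the stall per undeliverable message under the defaults (10 retries, 1000 ms) with the proposed values (1 retry, 500 ms), then applies the proposed values to a backlog of 50,000 messages like the one in the original incident. It assumes every message uses all of its retries; the function name is illustrative only.

```python
def stall_per_message_s(retry_count, retry_interval_ms):
    """Seconds the channel waits on one undeliverable message."""
    return retry_count * retry_interval_ms / 1000.0


default_stall = stall_per_message_s(10, 1000)   # channel defaults
proposed_stall = stall_per_message_s(1, 500)    # values proposed above

improvement = default_stall / proposed_stall
print(f"improvement: {improvement:.0f}x")

backlog = 50_000  # messages addressed to the full queue
hours = backlog * proposed_stall / 3600
print(f"stall for the whole backlog: about {hours:.1f} hours")
```

Even with the lower values, a backlog of that size would still hold up a shared channel for hours, which is one argument for also separating the traffic.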
|
exerk |
Posted: Fri Jan 31, 2014 1:13 am Post subject: |
|
|
 Jedi Council
Joined: 02 Nov 2006 Posts: 6339
|
zpat wrote: |
"Class of service channels" - not sure what they are - will look it up - but I assume that would mean using different xmit queues for some remote queues.
That is certainly an option where we have queues that we know send large numbers of messages (generally in a big batch). |
Your assumption is correct. Where I am permitted to do so, I generally separate messages by 'class', e.g. large persistent, small persistent, so that I can tune the channel attributes to best suit each class.
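Why separate transmission queues help can be shown with a toy model of one channel working through its queue in order. The sketch treats the channel as strictly first-in, first-out, charges the full retry delay for every message bound for a full queue, and ignores normal put time. The queue names BULK.Q and ORDERS.Q and the function name are hypothetical.

```python
def wait_before_delivery(xmitq, full_queues, retry_count, retry_interval_ms):
    """Return (destination, seconds waited) for each message on one channel.

    xmitq lists destination queue names in arrival order. A message for a
    queue in full_queues stalls the channel for the full retry delay
    before it is moved to the DLQ.
    """
    per_reject = retry_count * retry_interval_ms / 1000.0
    clock = 0.0
    waits = []
    for dest in xmitq:
        waits.append((dest, clock))
        if dest in full_queues:
            clock += per_reject
    return waits


full = {"BULK.Q"}

# Shared channel: the ORDERS.Q message queues behind three BULK.Q messages.
shared = wait_before_delivery(["BULK.Q"] * 3 + ["ORDERS.Q"], full, 10, 1000)
print(shared[-1])

# Separate class-of-service channel: ORDERS.Q traffic has its own xmit queue.
separate = wait_before_delivery(["ORDERS.Q"], full, 10, 1000)
print(separate[-1])
```

In this model the innocent message waits only behind messages of its own class, so a full bulk queue no longer delays it.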
|
|