Author |
Message
|
svu |
Posted: Tue Jun 07, 2011 5:25 am Post subject: Cluster receiver priorities: glitch? |
|
|
Voyager
Joined: 30 Jan 2006 Posts: 99
|
Hi ppl
Another question about WMQ cluster wonders.
There are 2 QMs, with cluster receiver channel priorities 5 and 3, identical sets of clustered queues. The 2nd QM is considered as a hot spare.
As IBM declares, all messages should go to the higher priority QM (5). And we observed that for a while... but a couple of days ago we found 2 messages on the spare QM. How is that possible? We are 99.99% sure there were no downtime or network issues, and there are no errors in /var/mqm/errors/*. Is there any way to answer that question retrospectively? And - is there any way to monitor the cluster configuration - to catch the moment when someone in the cluster tries to use spare QM?
Overall, is the solution based on the priorities considered as reliable?
Thanks for any ideas |
|
 |
Vitor |
Posted: Tue Jun 07, 2011 5:49 am Post subject: Re: Cluster receiver priorities: glitch? |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
svu wrote: |
There are 2 QMs, with cluster receiver channel priorities 5 and 3 |
Why not 9 and 1?
svu wrote: |
As IBM declares, all messages should go to the higher priority QM (5) |
They should, but where does IBM guarantee that they will? Especially with the priorities so close?
svu wrote: |
Is there any way to answer that question retrospectively? |
I'd doubt it. Others may have more ingenuity.
svu wrote: |
And - is there any way to monitor the cluster configuration - to catch the moment when someone in the cluster tries to use spare QM? |
It's not someone in the cluster, it's the cluster itself (unless someone is deliberately addressing messages to the "spare" queue manager) and you might be able to manage something with channel stats. What would you do then?
svu wrote: |
Overall, is the solution based on the priorities considered as reliable? |
As I've said, the priorities are typically set more aggressively for what you're doing. And it's reliable in that you're getting what you pay for - if you want more reliability, buy HA software with hot standby capabilities.
There will be a posting on the virtues of active/active any time now, assuming it's not already happened.
_________________
Honesty is the best policy.
Insanity is the best defence. |
|
 |
skoobee |
Posted: Tue Jun 07, 2011 5:54 am Post subject: |
|
|
Acolyte
Joined: 26 Nov 2010 Posts: 52
|
The most likely explanation is that the higher priority channel stopped and restarted, and while it was doing so the relative channel states (see the workload balancing algorithm) of the two channels were temporarily switched until the higher priority channel was RUNNING again.
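To make that concrete, here is a toy sketch in Python. It assumes, as the published workload algorithm appears to describe, that candidate channels are first filtered by state tier and only then by CLWLPRTY. The tier grouping, the `Destination` class and the queue manager names are illustrative assumptions, not WMQ's actual implementation.

```python
from dataclasses import dataclass

# Assumed state tiers, most desirable first. Toy model only.
STATE_TIERS = [
    {"INACTIVE", "RUNNING"},
    {"BINDING", "INITIALIZING", "STARTING", "STOPPING"},
    {"RETRYING"},
    {"REQUESTING", "PAUSED", "STOPPED"},
]


@dataclass
class Destination:
    qmgr: str      # hypothetical queue manager name
    state: str     # state of the channel leading to it
    clwlprty: int  # cluster workload priority, 0-9


def choose(destinations):
    """Keep the best non-empty state tier, then the highest priority within it."""
    for tier in STATE_TIERS:
        candidates = [d for d in destinations if d.state in tier]
        if candidates:
            top = max(d.clwlprty for d in candidates)
            return [d.qmgr for d in candidates if d.clwlprty == top]
    return []


steady = [Destination("QM_MAIN", "RUNNING", 5),
          Destination("QM_SPARE", "INACTIVE", 3)]
restarting = [Destination("QM_MAIN", "STARTING", 5),
              Destination("QM_SPARE", "INACTIVE", 3)]

print(choose(steady))      # ['QM_MAIN']
print(choose(restarting))  # ['QM_SPARE']
```

In this model the priority gap (5 vs 3, or 9 vs 1) makes no difference during the restart window, because the higher priority channel has already been eliminated by the state filter before priority is compared.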
Last edited by skoobee on Tue Jun 07, 2011 5:58 am; edited 1 time in total |
|
 |
mqjeff |
Posted: Tue Jun 07, 2011 5:56 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
skoobee wrote: |
The most likely explanation is that the higher priority channel stopped and restarted, and while it was doing so the relative channel states (see the workload balancing algorithm) of the two channels were temporarily switched until the higher priority channel was RUNNING again. |
I'd consider the most likely cause to be an application that issued BIND_ON_OPEN and chose the wrong qmgr, temporarily, until it was restarted.
Or some other developer-centric error rather than an infrastructure-centric error. |
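A toy illustration of the bind-on-open effect, in plain Python rather than the MQI. The `ClusterQueueHandle` class, the `pick` callback and the queue manager names are all hypothetical. The only point is that a handle bound at open time keeps its original destination, while an unbound handle re-evaluates the choice on every put.

```python
class ClusterQueueHandle:
    """Toy model of an open handle on a clustered queue (not the MQI)."""

    def __init__(self, pick_destination, bind_on_open):
        self._pick = pick_destination
        # With bind-on-open the destination is chosen once, at open time.
        self._fixed = pick_destination() if bind_on_open else None

    def put(self, message):
        # Without a fixed binding the choice is re-evaluated per message.
        target = self._fixed if self._fixed is not None else self._pick()
        return (target, message)


# Hypothetical cluster view that changes while the application keeps running.
preferred = {"qmgr": "QM_SPARE"}   # main instance briefly not preferred


def pick():
    return preferred["qmgr"]


fixed = ClusterQueueHandle(pick, bind_on_open=True)
loose = ClusterQueueHandle(pick, bind_on_open=False)

preferred["qmgr"] = "QM_MAIN"      # main instance is preferred again

print(fixed.put("m1"))  # ('QM_SPARE', 'm1')
print(loose.put("m1"))  # ('QM_MAIN', 'm1')
```

The bound handle keeps sending to the spare until the application closes and reopens the queue, which matches the "until it was restarted" part of the scenario.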
|
 |
Vitor |
Posted: Tue Jun 07, 2011 6:04 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
mqjeff wrote: |
I'd consider the most likely cause to be an application that issued BIND_ON_OPEN and chose the wrong qmgr, temporarily, until it was restarted. |
Which for the record is what I was talking about when I talked about messages being addressed.
And this is one of the impacts on the "reliability" of this solution - it works fine provided all the applications are well behaved. To pre-empt your next question: no, you can't enforce the open options administratively. They're under the control of the application, so you can only enforce them with standards. |
|
 |
svu |
Posted: Tue Jun 07, 2011 6:21 am Post subject: Re: Cluster receiver priorities: glitch? |
|
|
Voyager
Joined: 30 Jan 2006 Posts: 99
|
Vitor wrote: |
Why not 9 and 1? |
Does that really matter? I do not see how (just double-checked the algorithm).
Vitor wrote: |
They should, but where does IBM guarantee that they will? Especially with the priorities so close? |
Well, if the cluster algorithm is guaranteed to work as in http://publib.boulder.ibm.com/infocenter/wmqv6/v6r0/index.jsp?topic=/com.ibm.mq.csqzah.doc/qc10940_.htm it would mean "yes"
Vitor wrote: |
It's not someone in the cluster, it's the cluster itself (unless someone is deliberately addressing messages to the "spare" queue manager) and you might be able to manage something with channel stats. What would you do then? |
The ultimate goal is to find the event that triggers the thing. The channel stats... well, I'll have a look.
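One way the channel stats idea could look, as a rough sketch only. It assumes something already polls the message counter of the channel leading to the spare QM at regular intervals (for example the MSGS value shown in the channel status) and stores it as (timestamp, count) pairs. The poller, the sample data and the function name are hypothetical.

```python
def spare_activity(samples):
    """Return the timestamps at which the spare channel's counter grew.

    `samples` is a list of (timestamp, msgs) pairs from a hypothetical poller.
    A counter that drops is treated as a channel restart, not as traffic.
    """
    hits = []
    previous = None
    for timestamp, msgs in samples:
        if previous is not None and msgs > previous:
            hits.append(timestamp)
        previous = msgs
    return hits


samples = [("05:00", 0), ("05:05", 0), ("05:10", 2), ("05:15", 2), ("05:20", 0)]
print(spare_activity(samples))  # ['05:10']
```

This only narrows down when the spare received traffic. It does not say which application or cluster member sent it, so it would have to be correlated with other logs for the same time window.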
Vitor wrote: |
As I've said, the priorities are typically set more aggressively for what you're doing. And it's reliable in that you're getting what you pay for - if you want more reliability, buy HA software with hot standby capabilities. |
Is WMQ cluster not considered as HA???
Last edited by svu on Tue Jun 07, 2011 6:27 am; edited 1 time in total |
|
 |
svu |
Posted: Tue Jun 07, 2011 6:23 am Post subject: |
|
|
Voyager
Joined: 30 Jan 2006 Posts: 99
|
skoobee wrote: |
The most likely explanation is that the higher priority channel stopped and restarted, and while it was doing so the relative channel states (see the workload balancing algorithm) of the two channels were temporarily switched until the higher priority channel was RUNNING again. |
There would hardly be anything STOPPED. It could get into INACTIVE state - but according to the algorithm, it is treated the same way as RUNNING, right? |
|
 |
svu |
Posted: Tue Jun 07, 2011 6:25 am Post subject: |
|
|
Voyager
Joined: 30 Jan 2006 Posts: 99
|
mqjeff wrote: |
Or some other developer-centric error rather than an infrastructure-centric error. |
Well, I cannot totally eliminate that possibility - but in our case it is highly unlikely... |
|
 |
WMBDEV1 |
Posted: Tue Jun 07, 2011 6:27 am Post subject: Re: Cluster receiver priorities: glitch? |
|
|
Sentinel
Joined: 05 Mar 2009 Posts: 888 Location: UK
|
svu wrote: |
Is WMQ cluster not considered as HA??? |
No. This has been discussed many times. |
|
 |
Vitor |
Posted: Tue Jun 07, 2011 6:32 am Post subject: Re: Cluster receiver priorities: glitch? |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
svu wrote: |
Is WMQ cluster not considered as HA??? |
No. It's a workload balancer.
Many people have posted in here about problems (issues) like this, which they ran into while trying to use it this way. |
|
 |
Vitor |
Posted: Tue Jun 07, 2011 6:34 am Post subject: Re: Cluster receiver priorities: glitch? |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
svu wrote: |
Vitor wrote: |
Why not 9 and 1? |
Does that really matter? I do not see how (just double-checked the algorithm). |
Check some of the posts on this subject. Cluster workload balancing is an art rather than a science, and tends not to be black and white. |
|
 |
svu |
Posted: Tue Jun 07, 2011 6:36 am Post subject: Re: Cluster receiver priorities: glitch? |
|
|
Voyager
Joined: 30 Jan 2006 Posts: 99
|
Vitor wrote: |
svu wrote: |
Is WMQ cluster not considered as HA??? |
No. It's a workload balancer. |
Err, according to an official IBM trainer, workload balancing is managed using the channel weight parameter, not priority. The trainer told me that the simple DR solution was to use the channel priorities - so that the messages do not go to the lower priority channel if the higher priority channel is available in the cluster. And these words are repeated in section #15 of the cluster workload mgmt algorithm. |
|
 |
svu |
Posted: Tue Jun 07, 2011 6:42 am Post subject: Re: Cluster receiver priorities: glitch? |
|
|
Voyager
Joined: 30 Jan 2006 Posts: 99
|
Vitor wrote: |
Check some of the posts on this subject. Cluster workload balancing is an art rather than a science, and tends not to be black and white. |
Ghm, sounds really depressing. What posts could you recommend? Anything official from IBM? I did search on "cluster receiver priorities" here - nothing interesting.
Actually, the best thing would be to find something from IBM where they say "sometimes that algorithm does not work, under the following circumstances:..." Am I asking too much? |
|
 |
WMBDEV1 |
Posted: Tue Jun 07, 2011 6:46 am Post subject: |
|
|
Sentinel
Joined: 05 Mar 2009 Posts: 888 Location: UK
|
So.... given a scenario where you have a number of critical messages on a QM (in your cluster) waiting to be processed.
You are presented with a technical failure and the box hosting the QM (in your cluster) becomes unavailable for a long period of time.
What happens to the messages that were on that QM (not new ones which will be routed to the other QM in the cluster)? When would they be able to be processed? Is that HA? |
|
 |
Vitor |
Posted: Tue Jun 07, 2011 6:52 am Post subject: Re: Cluster receiver priorities: glitch? |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
svu wrote: |
Err, according to an official IBM trainer, workload balancing is managed using the channel weight parameter, not priority. The trainer told me that the simple DR solution was to use the channel priorities - so that the messages do not go to the lower priority channel if the higher priority channel is available in the cluster. And these words are repeated in section #15 of the cluster workload mgmt algorithm. |
And both of those statements are the simplistic version. I offer into evidence the number of discussions in this forum surrounding the parameters that influence message distribution and their effect on each other. |
|
 |
|