Zappa
Posted: Tue Oct 19, 2010 7:30 am
Acolyte
Joined: 06 Oct 2005 · Posts: 55 · Location: UK

I’ve been avoiding the HA debate as much as possible and don’t want to spark off another one. We do use HACMP for DBs and filesystems, but the problem we would have in using something like IC91 is where to fail it over to, i.e. the not-so-cheap licensing of a server that does nothing 24/7… I hear them asking “How much? For doing what?”
When and if a server dies, a clustered delivery of data should route it to whatever is available, as in active/active clustering, not just half of it, surely!
Hot topic, I know…

bruce2359
Posted: Tue Oct 19, 2010 7:52 am
Poobah
Joined: 05 Jan 2008 · Posts: 9469 · Location: US: west coast, almost. Otherwise, enroute.

Not so much a hot topic as a general misunderstanding of what WMQ clusters offer, and what they do not.
Your OP identifies messages stranded on the SCTQ (SYSTEM.CLUSTER.TRANSMIT.QUEUE) due to the downstream qmgr (or channel) failing.
One of the other replies to your OP brought up the issue of messages that successfully arrive at a cluster destination queue, where the qmgr hosting the cluster queue fails shortly after the message arrives but before it can be consumed. In that instance the message is stranded too, but this time on the failed qmgr.
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.

Vitor
Posted: Tue Oct 19, 2010 7:54 am
Grand High Poobah
Joined: 11 Nov 2005 · Posts: 26093 · Location: Texas, USA

Zappa wrote:
we do use HACMP for DBs and filesystems, but the problem we would have in using something like IC91 is where to fail it over to, i.e. the not-so-cheap licensing of a server that does nothing 24/7…
HACMP supports Active/Active and "where" is answered by "where do you fail the DBs to?". I've used HACMP to support WMQ & WMB in exactly the configuration you're talking about, with the "failover" server doing work when there's no emergency.
Zappa wrote:
When and if a server dies, a clustered delivery of data should route it to whatever is available, as in active/active clustering, not just half of it, surely!
No, and don't call me Shirley.
There is a world of difference between an HACMP cluster and a WMQ cluster; it’s one of the great problems of IT that the word “cluster” is used for so many different architectures. At its simplest, a WMQ cluster is designed to believe a dropped communication link is a transient problem that should be dealt with by retrying a few times. An HACMP cluster believes a loss of communication is a reason to bring up an alternate instance. To tie back to your question, they’re two different interpretations of the word “available”, both valid in their context.
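You can watch WMQ doing exactly that retrying from runmqsc if you’re curious; a quick check might look like this (the channel name is hypothetical):

  * While the link is down you'd expect to see STATUS(RETRYING) here for a
  * while, until the retry counts are exhausted or the link comes back.
  DISPLAY CHSTATUS('TO.QMGR2') STATUS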
If I were in your position, with HACMP on site (i.e. already licensed) and existing DBs and so forth under HACMP (presumably running Active/Active so there’s no “wasted” DB server) that I could slot into, I’d put the queue managers under HACMP and solve all my problems.
But I’m not in your position. You are.
_________________
Honesty is the best policy.
Insanity is the best defence.

zonko
Posted: Tue Oct 19, 2010 9:25 am
Voyager
Joined: 04 Nov 2009 · Posts: 78

When a cluster channel stops RUNNING and goes to RETRYING, as in this case, any msgs remaining on the cluster xmitq for that channel are read from the queue and put back through the cluster workload balancing mechanism. If there are alternative destinations available in the cluster, for example on cluster qmgrs served by a RUNNING channel rather than a RETRYING one, the msgs are sent to that qmgr. The msgs will only remain on the cluster xmitq if there is no alternative destination, as will be the case for msgs destined for a queue which has only a single instance in the cluster, or if the qmgr was specified when the msg was originally put, or if the msg was put BIND_ON_OPEN.
Obviously, msgs which have already been sent such that the channel is in-doubt will not be reallocated. The chance of this happening can be minimised by setting the BATCHHB attribute.
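In MQSC that might look something like this (the channel name is hypothetical; BATCHHB is in milliseconds):

  * Hedged sketch: a batch heartbeat makes the sending MCA check the remote
  * qmgr is still active before committing a batch, shrinking the window in
  * which a batch can be left in-doubt. Setting it on the CLUSRCVR propagates
  * it to the auto-defined cluster-sender channels.
  ALTER CHANNEL('TO.QMGR2') CHLTYPE(CLUSRCVR) BATCHHB(5000)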

bruce2359
Posted: Tue Oct 19, 2010 9:34 am
Poobah
Joined: 05 Jan 2008 · Posts: 9469 · Location: US: west coast, almost. Otherwise, enroute.

Yes, but ...
From the WMQ v7 Clusters manual:
If a local queue within the cluster becomes unavailable while a message is in transit, the message is forwarded to another instance of the queue (but only if the queue was opened (MQOPEN) with the BIND_NOT_FIXED open option).
[edit]
BIND option can be specified by the application at MQOPEN and/or set at the queue.
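Setting that default at the queue is a one-liner in MQSC; the queue name below is hypothetical, and the queue is assumed to already be in the cluster:

  * Hedged sketch: make NOTFIXED the default binding for this cluster queue,
  * so the qmgr is free to re-select the destination instance at MQPUT time.
  ALTER QLOCAL('APP.REQUEST') DEFBIND(NOTFIXED)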
From the MQSC manual:
DEFBIND
Specifies the binding to be used when the application specifies MQOO_BIND_AS_Q_DEF on the MQOPEN call, and the queue is a cluster queue.
OPEN
The queue handle is bound to a specific instance of the cluster queue when the queue is opened.
NOTFIXED
The queue handle is not bound to any particular instance of the cluster queue. This allows the queue manager to select a specific queue instance when the message is put using MQPUT, and to change that selection subsequently should the need arise.
The MQPUT1 call always behaves as if NOTFIXED had been specified.
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.

Vitor
Posted: Tue Oct 19, 2010 10:30 am
Grand High Poobah
Joined: 11 Nov 2005 · Posts: 26093 · Location: Texas, USA

zonko wrote:
or if the qmgr was specified when the msg was originally put, or if the msg was put BIND_ON_OPEN.
This all hangs on the OP’s assertion that workload balancing happens correctly for all messages when all the queue managers are up.
_________________
Honesty is the best policy.
Insanity is the best defence.

PeterPotkay
Posted: Tue Oct 19, 2010 12:43 pm
Poobah
Joined: 15 May 2001 · Posts: 7722

Zappa wrote:
The values are this low because I don’t want too many MSGS stuck on the SCTQ as there are other QMGRS in the cluster sharing the same queue names and I want these to pick up the load when one isn’t available. If there are other ways of doing this then please let me know.
This is not needed to accomplish your goal, and it is hurting you in other ways.
The cluster will reroute the messages if they are not bound to that particular QM, even while the channel is merely retrying. There is no need to force it into a STOPPED status with very low retry values.
But because you have such low retry values, the channel will go into STOPPED rather quickly and will require manual intervention on your part when the potentially brief outage is over, instead of auto-recovering on its own.
_________________
Peter Potkay
Keep Calm and MQ On

bruce2359
Posted: Tue Oct 19, 2010 12:47 pm
Poobah
Joined: 05 Jan 2008 · Posts: 9469 · Location: US: west coast, almost. Otherwise, enroute.

Please display the attributes of the clustered queue definition. Use DISPLAY QL( ). Then post the results here.
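From runmqsc, something like this would do (the queue name is hypothetical; you can name just the attributes of interest):

  * Hedged sketch: show only the binding and cluster attributes of the queue.
  DISPLAY QL('APP.REQUEST') DEFBIND CLUSTER CLUSNL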
Does the application specify MQOO_BIND_ON_OPEN?
Or _NOT_FIXED?
Or _AS_Q_DEF?
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.

Zappa
Posted: Wed Oct 20, 2010 1:41 am
Acolyte
Joined: 06 Oct 2005 · Posts: 55 · Location: UK

The queues are DEFBIND(NOTFIXED) and the QMGR is rarely specified on puts.
Unfortunately my predicament is inherited; had this been a greenfield site I’d certainly want HACMP for WMQ/WMB, as you all rightly encourage. Our WMB servers do have HACMP, and these were once configured to fail over to each other, but quite simply one server does not have the capacity for both brokers, whereas one on its own can just about handle the total load. All HACMP is currently used for is the non-active/active components, which make up less than 2% of the total volume; everything else is load balanced with WMQ clustering.
The application DB is HACMP’d to an idle standby. I had suggested that the brokers could fail over to this as well, but my UNIX SA responsible for HACMP didn’t like the sound of it and discouraged the idea. I would also have a hard time explaining that we’d need to license many more CPUs’ worth of WMQ/WMB for an idle standby; the budget would be blown (plus no bonus for me in the foreseeable future).
At the risk of being ridiculed I’m irresponsibly thinking “with a small config change to our channels I can make this HA’ish (if that’s a word)” and this might be welcomed in these hard times.
I do currently need to make the round peg fit the square hole, so in my case stopping the channels quickly when one QMGR is downed prevents thousands of stranded messages.
Any advice to help with my challenges is very much welcome…

Vitor
Posted: Wed Oct 20, 2010 4:14 am
Grand High Poobah
Joined: 11 Nov 2005 · Posts: 26093 · Location: Texas, USA

Zappa wrote:
At the risk of being ridiculed I’m irresponsibly thinking “with a small config change to our channels I can make this HA’ish (if that’s a word)” and this might be welcomed in these hard times.
As if I'd ridicule anybody.
I think you've articulated the key problem - it's HA'ish. Not HA. You might be able to make this cheap HA solution work, but the results will not be predictable. Unless you really enjoy restarting channels manually, there's still going to be a window where messages can get stuck before the channel finally stops. What happens if one of those messages is business critical? A large, important or time-sensitive message? Those same people who welcomed your cost savings in these hard times will be baying for your blood.
Zappa wrote:
I do currently need to make the round peg fit the square hole, so in my case stopping the channels quickly when one QMGR is downed prevents thousands of stranded messages.
What you really need is to get management buy-in to the fact that you're pushing a round peg into a square hole, and that the solution has weaknesses. Specifically, you can prevent thousands of stranded messages, but you can't prevent (or guarantee to prevent) stranded messages altogether. Nor can you accurately predict how many, or which, messages will be stranded. Those who own the content must understand that.
They must also understand that if the queue manager is downed for a significant period (hardware failure rather than comms failure) those messages will sit there for the duration.
_________________
Honesty is the best policy.
Insanity is the best defence.

Vitor
Posted: Wed Oct 20, 2010 4:16 am
Grand High Poobah
Joined: 11 Nov 2005 · Posts: 26093 · Location: Texas, USA

Zappa wrote:
The queues are DEFBIND(NOTFIXED) and the QMGR is rarely specified on puts.
Before someone else says it:
DEFBIND(NOTFIXED) is only a default. An application can specify BIND_ON_OPEN if it chooses, in the same way it can specify a queue manager.
And if the queue manager is rarely specified, that means it's sometimes specified.
_________________
Honesty is the best policy.
Insanity is the best defence.

PeterPotkay
Posted: Wed Oct 20, 2010 4:35 am
Poobah
Joined: 15 May 2001 · Posts: 7722

Zappa wrote:
I do currently need to make the round peg fit the square hole, so in my case stopping the channels quickly when one QMGR is downed prevents thousands of stranded messages.
Any advice to help with my challenges is very much welcome…
See my previous post.
_________________
Peter Potkay
Keep Calm and MQ On

Zappa
Posted: Wed Oct 20, 2010 5:19 am
Acolyte
Joined: 06 Oct 2005 · Posts: 55 · Location: UK

Hope this doesn’t come across the wrong way, but you are preaching to the converted. I know what I’m proposing isn’t best practice, and I do need management buy-in to bolster resources for a proper HA solution, hence me trying to avoid the topic. Like I say, I’ve inherited this config!
I’m not sure why you are saying that I’d have to manually restart the stopped channels, though; wouldn’t they only be stopped if the values were too low? The values I chose were just a test, and none of this is in production yet. If the values were high enough to cope with network blips but short enough not to isolate too many msgs, then I can’t help but think it’s better than what I have now.

Vitor
Posted: Wed Oct 20, 2010 5:25 am
Grand High Poobah
Joined: 11 Nov 2005 · Posts: 26093 · Location: Texas, USA

Zappa wrote:
If the values were high enough to cope with network blips but short enough not to isolate too many msgs, then I can't help but think it's better than what I have now.
Well, if you can hit that happy medium then well done. I suspect that any values low enough to isolate "not too many msgs" (however you determine that) will need to be so low that the channels will stop more often than you'd like. And at inconvenient times.
I repeat (because I feel it's important) that you need to explain this to the great and the good. They need to understand it, and they will potentially have input to that "not too many" number.
_________________
Honesty is the best policy.
Insanity is the best defence.

PeterPotkay
Posted: Wed Oct 20, 2010 6:27 am
Poobah
Joined: 15 May 2001 · Posts: 7722

If you want to set the interval that tells the QM how long to wait between retries to a smaller value, that's fine.
But don't set your Long Retry Count to a small number. Once that is exhausted, the channel hard-stops and you then need to start it manually. There's no need for that. Let it retry for a long time, over and over, and let it recover on its own when the underlying problem goes away.
You probably do not want or need the channel to be hard-stopped.
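In MQSC that might look something like the following, set on the cluster-receiver definition so it propagates to the auto-defined senders; the channel name and values are only illustrative:

  * Hedged sketch: retry quickly for a while to ride out network blips, then
  * keep retrying at a longer interval (practically) forever, so the channel
  * never hard-stops and recovers by itself when the remote qmgr comes back.
  ALTER CHANNEL('TO.QMGR2') CHLTYPE(CLUSRCVR) +
        SHORTRTY(10) SHORTTMR(30) +
        LONGRTY(999999999) LONGTMR(120)
_________________
Peter Potkay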
Keep Calm and MQ On