MQSeries.net :: View topic - failed qmgr, cluster transmission queue, suspend qmgr

failed qmgr, cluster transmission queue, suspend qmgr

credito

Posted: Fri Nov 28, 2008 1:09 am Post subject: failed qmgr, cluster transmission queue, suspend qmgr

Newbie

Joined: 23 Oct 2008
Posts: 9

Hi everybody,

I have a few questions related to clusters and queue managers in a failure event.

I'm not talking about HA. I just want to know what happens, and what I can do, if this scenario occurs.

Let's assume I have a small cluster. There is a sending qmgr (SEND), one primary receiving qmgr (PRIM) and one standby receiving qmgr (STBY). The two receiving queue managers have the same local queues. In normal operation all messages from SEND should go to PRIM and not to STBY. What is the best way to do that? Suspend STBY, shut down STBY, or inhibit put on all queues on STBY?

Let's assume PRIM fails (hardware crash, ...) and cannot be restarted for a couple of hours. I know all messages on PRIM are stuck, which is okay. Now I would like to arrange for all messages to go to STBY asap. How can I do this? How long does it take until MQ realizes PRIM is down? Can I suspend PRIM from a full repository queue manager? Is it the same command as on PRIM locally? What happens to messages that are on the cluster transmission queue on SEND? Does MQ reroute them to STBY? Can this be done manually? Do you have some commands I can use in this scenario?

For the application it is important that messages are sent only once and are not delayed much; 5 minutes or so is fine.

Hope you can help me a bit.

Cheers,

Marcel

Vitor

Posted: Fri Nov 28, 2008 1:39 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

You don't say what version of WMQ you're using but if you're using v6 (and you should be for a variety of reasons aside from the cluster improvements) you can achieve the effect you're looking for by setting the priority of the channels through the NETPRTY parameter. If you set the priority of the channel to PRIM to 9 and to STBY to 0 then all the messages will go to PRIM except under the most exceptional of circumstances.

If PRIM goes down then you'll get some stuck messages (as you've correctly indicated) until the cluster tags the queue manager as unresponsive, at which point the traffic will switch to the other available channel.

There have been posts about this before which the search function will find for you - I remember fjb_saper did an explanation of NETPRTY that's much better than mine! There's also been a lot of discussion on stuck messages, how long it takes a cluster to notice a downed queue manager, that sort of thing.

The Search Function Is Your Friend. So Is The Clusters Manual.

Happy Reading!

_________________
Honesty is the best policy.
Insanity is the best defence.

credito

Posted: Fri Nov 28, 2008 2:17 am Post subject:

Newbie

Joined: 23 Oct 2008
Posts: 9

Hi Vitor,

Thank you for your reply. I'm using v6.

The NETPRTY hint is what I was looking for. Seems really good. Now I just have to figure out how to reroute the messages in the cluster transmission queue.

I'll try the search function. If anyone has an answer or a link to a post, I'd be grateful if you shared it.

Cheers,

Marcel

Vitor

Posted: Fri Nov 28, 2008 2:23 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

credito wrote:

Now I just have to figure out how to reroute the messages in the cluster transmission queue.

You can't - they're stuck (as previously discussed).

It's theoretically possible to reroute them by reading them off, altering the xmit header and re-adding them. It's also theoretically possible to make a car do 300 mph by strapping a rocket to the roof. Both techniques might work but are inherently dangerous and very, very messy when they go wrong. It will be faster and safer to restart the downed queue manager, at which point the messages sort themselves out.

Accept that anything in the transmission queue isn't going anywhere until the downed queue manager is back. If that takes an unacceptable period of time, buy some HA software.

Last edited by Vitor on Fri Nov 28, 2008 2:24 am; edited 1 time in total

exerk

Posted: Fri Nov 28, 2008 2:24 am Post subject:

Jedi Council

Joined: 02 Nov 2006
Posts: 6339

credito wrote:

...Now I just have to figure out how to reroute the messages in the cluster transmission queue...

Vitor wrote:

...If PRIM goes down then you'll get some stuck messages (as you've correctly indicated) until the cluster tags the queue manager as unresponsive when the traffic will switch to the other available channel...

_________________
It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys.

PeterPotkay

Posted: Fri Nov 28, 2008 6:28 am Post subject:

Poobah

Joined: 15 May 2001
Posts: 7723

Any messages in the S.C.T.Q. that were originally going to PRIM will go to STBY the next time the cluster workload algorithm has a look at them. They will only stay stuck in the S.C.T.Q. if PRIM has the only queue in the cluster, or if the sending application sent them with the BIND_ON_OPEN option or if the app specifically addressed them to go to the PRIM QM.
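To make the three exceptions above concrete, here is a minimal Python sketch of the rule as stated in this post. It is only a model, not MQ code: `QueuedMessage`, `can_be_rerouted` and the `other_instances` count are hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueuedMessage:
    # Hypothetical model of a message sitting on the cluster transmission queue.
    bind_on_open: bool          # the app opened the queue with BIND_ON_OPEN
    target_qmgr: Optional[str]  # set only if the app named a specific queue manager


def can_be_rerouted(msg: QueuedMessage, other_instances: int) -> bool:
    """Return True if, per the rules in this post, the message may move elsewhere.

    other_instances is the number of other queue managers in the cluster
    that host a queue with the same name.
    """
    if other_instances == 0:
        return False  # PRIM has the only instance of the queue
    if msg.bind_on_open:
        return False  # destination was fixed when the queue was opened
    if msg.target_qmgr is not None:
        return False  # the app addressed the message to one queue manager
    return True
```

With one other instance (STBY) available, `can_be_rerouted(QueuedMessage(False, None), 1)` is true, while a message put with BIND_ON_OPEN, or one addressed to PRIM by name, stays where it is.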

You should probably check out the CLWLPRTY attribute.
http://publib.boulder.ibm.com/infocenter/wmqv6/v6r0/topic/com.ibm.mq.csqzah.doc/qc13030_.htm
It's more appropriate for what you are attempting.

Looking at the cluster workload management algorithm
http://publib.boulder.ibm.com/infocenter/wmqv6/v6r0/topic/com.ibm.mq.csqzah.doc/qc10940_.htm
you'll notice both CLWLPRTY and NETPRTY are only looked at AFTER the QM looks at the status of the channel. That means if the channel between SEND and PRIM is anything other than INACTIVE or RUNNING, guess what, it starts sending to STBY. The SEND to PRIM channel can be retrying for just a few seconds due to a network blip. Do you want "failover" in that case? If the SEND to PRIM channel is inactive and starts up, it goes through several other channel statuses before it gets to running (binding, initializing, etc.). Any of those will be seen as a reason to start sending to STBY.

Unless you can handle some messages going to STBY outside of a real failover situation, I would not rely on channel attributes to control failover for you.
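The ordering problem described above (channel status first, priority second) can be shown with a small Python sketch. This is a deliberately simplified model and not the real workload algorithm: `Destination`, `choose` and the single `priority` field (standing in for CLWLPRTY or NETPRTY) are hypothetical, and the fallback when no channel is INACTIVE or RUNNING is an assumption added so the function always returns something.

```python
from dataclasses import dataclass

# Channel statuses that are preferred before priority is even considered.
PREFERRED_STATUSES = {"INACTIVE", "RUNNING"}


@dataclass
class Destination:
    qmgr: str
    channel_status: str  # status of the channel from SEND to this queue manager
    priority: int        # stands in for CLWLPRTY / NETPRTY (0 lowest, 9 highest)


def choose(destinations):
    """Pick a destination: filter on channel status first, then on priority."""
    usable = [d for d in destinations if d.channel_status in PREFERRED_STATUSES]
    if not usable:
        # Assumption for this sketch: with no preferred channel, consider all.
        usable = list(destinations)
    # Priority only decides among the destinations that survived the status check.
    return max(usable, key=lambda d: d.priority).qmgr
```

With PRIM at priority 9 on a RUNNING channel and STBY at priority 0 on an INACTIVE one, `choose` returns PRIM. Change PRIM's channel to RETRYING or BINDING and it returns STBY, even though the priorities have not changed.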

Your best bet would be to uncluster all the queues on STBY until you need them, then run a script to cluster them in the "failover" situation. Or you can Put Inhibit / Enable them. Suspending the QM only advises SEND not to send anything to STBY; if there is no other destination for a message (no matter how briefly), it will go to STBY even if it's suspended. None of these 3 methods can prevent SEND from sending to STBY if an app specifically addresses the messages to STBY. The only way to do that is to stop the CLUSRCVR on STBY, but then you'll likely get cluster administration messages for STBY stacking up in SEND's S.C.T.Q.
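The point that suspending is advisory rather than a hard block can be modelled in a few lines of Python. Again, this is only an illustration of the behaviour described in this post; `pick_destinations` and its arguments are hypothetical names, not anything MQ provides.

```python
def pick_destinations(candidates, suspended):
    """Return the queue managers SEND would consider for a message.

    candidates: names of queue managers hosting the target queue.
    suspended:  set of names that are currently suspended from the cluster.
    """
    # Queue managers that are not suspended are preferred.
    active = [q for q in candidates if q not in suspended]
    # If every candidate is suspended, the suspended ones are still used.
    return active if active else list(candidates)
```

While PRIM is available, `pick_destinations(["PRIM", "STBY"], {"STBY"})` gives `["PRIM"]`. As soon as STBY is the only candidate, `pick_destinations(["STBY"], {"STBY"})` gives `["STBY"]`, which is why a suspended STBY can still receive messages.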
_________________
Peter Potkay
Keep Calm and MQ On

Vitor

Posted: Fri Nov 28, 2008 6:54 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

PeterPotkay wrote:

Any messages in the S.C.T.Q. that were originally going to PRIM will go to STBY the next time the cluster workload algorithm has a look at them.

Really? What would cause/trigger them to be readdressed like this once they've already been put to the S.C.T.Q?

Aside from that, I take your points though I'd be interested in your reasoning on CLWLPRTY over NETPRTY. I certainly agree it's a poor man's failover and will hiccup in the situations you mention.

Depends on how desperate the failover is, message volumes, message frequency, etc, etc, etc.

exerk

Posted: Fri Nov 28, 2008 7:12 am Post subject:

Jedi Council

Joined: 02 Nov 2006
Posts: 6339

Vitor wrote:

Really? What would cause/trigger them to be readdressed like this once they've already been put to the S.C.T.Q?

I was always under the impression that a queue manager would 'readdress' messages for an unavailable queue manager, to an available queue manager unless the messages had been specifically addressed to a particular queue manager - which is what I thought you were alluding to, hence why I highlighted the section of your post. I certainly believe (maybe erroneously) that the internal algorithm of a queue manager allows for this.

PeterPotkay

Posted: Fri Nov 28, 2008 7:54 am Post subject:

Poobah

Joined: 15 May 2001
Posts: 7723

Vitor wrote:

PeterPotkay wrote:

Any messages in the S.C.T.Q. that were originally going to PRIM will go to STBY the next time the cluster workload algorithm has a look at them.

Really? What would cause/trigger them to be readdressed like this once they've already been put to the S.C.T.Q?

I'm pretty sure the next time the channel that would have taken them retries is when the messages become eligible for rerouting. While I'm certain this rerouting does happen, I'm not 100% sure on the details of how. It's in my notes at work. Hopefully a Hursleyite will happen by and enlighten us.

Vitor wrote:

Aside from that, I take your points though I'd be interested in your reasoning on CLWLPRTY over NETPRTY. I certainly agree it's a poor man's failover and will hiccup in the situations you mention.

Read the description for both. Our scenario is not 2 networks to the same QM, it's the same network to two different QMs.
