Cluster QM "failover"
awatson72
PostPosted: Wed Sep 22, 2004 10:07 am    Post subject: Cluster QM "failover"

Acolyte

Joined: 14 Apr 2004
Posts: 69
Location: Freeport, Maine

We are attempting to analyze the behaviour of clusters before adding one to our existing MQ infrastructure.

For our analysis, we have set up a cluster of 3 QMs, two of which have a local queue defined as a cluster queue, each with the same name. These two QMs are also full repository owners. We are putting messages onto this clustered queue from an application running on the same host as the third QM and testing different scenarios. Most results are as expected. However, when we bring one of the two "destination" QMs down and then put NEW messages to the cluster queue using Bind on Open, we find that the new messages back up in the transmit queue (trying to get to the QM that is down). We expected these messages to go to the other "destination" QM that is still running.

My question: is there a problem with our setup, or is this normal cluster behaviour? We realize that clusters don't provide HA, but we have also read that clustering is supposed to "facilitate continuous operations". There seems to be a good deal of confusion about what clustering offers in this regard; any insight appreciated!
siliconfish
PostPosted: Wed Sep 22, 2004 10:32 am

Master

Joined: 12 Aug 2002
Posts: 203
Location: USA

What has probably happened in your case is that the cluster workload balancing algorithm selected the queue instance on the now-failed queue manager. Because you used BIND on OPEN, every message from that open is routed to that instance, and since that queue manager is down, the messages back up.

You must not use BIND on OPEN; just leave it as NOTFIXED, and the messages will automatically fail over to the queue instance on the available queue manager once the workload balancing algorithm realises the other queue manager is not available.
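
In code, the only difference is the open option. Here is a minimal MQI sketch, assuming a local queue manager named QM3 and a cluster queue named CLUSTER.QUEUE (both names are illustrative, not from the original post):

Code:
/* Minimal sketch: put to a cluster queue opened with MQOO_BIND_NOT_FIXED,
   so the target instance is re-evaluated for each MQPUT and a down queue
   manager's instance is skipped. Names are illustrative. */
#include <stdio.h>
#include <string.h>
#include <cmqc.h>                    /* MQI definitions */

int main(void)
{
    MQHCONN hConn;
    MQHOBJ  hObj;
    MQOD    od  = {MQOD_DEFAULT};    /* object descriptor */
    MQMD    md  = {MQMD_DEFAULT};    /* message descriptor */
    MQPMO   pmo = {MQPMO_DEFAULT};   /* put-message options */
    MQLONG  cc, rc;
    MQCHAR  qmName[MQ_Q_MGR_NAME_LENGTH + 1] = "QM3";
    char    msg[] = "test message";

    MQCONN(qmName, &hConn, &cc, &rc);
    if (cc == MQCC_FAILED) { printf("MQCONN failed, reason %ld\n", (long)rc); return 1; }

    strncpy(od.ObjectName, "CLUSTER.QUEUE", MQ_Q_NAME_LENGTH);
    MQOPEN(hConn, &od, MQOO_OUTPUT | MQOO_BIND_NOT_FIXED, &hObj, &cc, &rc);

    MQPUT(hConn, hObj, &md, &pmo, (MQLONG)sizeof(msg), msg, &cc, &rc);

    MQCLOSE(hConn, &hObj, MQCO_NONE, &cc, &rc);
    MQDISC(&hConn, &cc, &rc);
    return 0;
}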
offshore
PostPosted: Wed Sep 22, 2004 11:33 am

Master

Joined: 20 Jun 2002
Posts: 222

awatson,


You wrote:
Quote:
...QMs down, and then put NEW messages in the cluster queue using Bind on Open, we find that the new messages are backing...


I agree 100% with siliconfish that for the failover to occur, the bind has to be NOTFIXED. The quote is a little vague, though, and left me with some questions.

Is the app specifying MQOO_BIND_ON_OPEN, or is it using MQOO_BIND_AS_Q_DEF and picking up the cluster queue's DEFBIND attribute?

Perhaps the application needs multiple messages processed in a certain order?

Just a twist on the question, something to ponder.
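
For reference, the three bind choices an application can make at MQOPEN time, as a fragment reusing the declarations from the sketch above (MQOO_BIND_AS_Q_DEF defers the decision to the queue's DEFBIND attribute, so it can be controlled administratively):

Code:
/* Fragment: the three cluster bind options on MQOPEN. */
MQLONG pinned   = MQOO_OUTPUT | MQOO_BIND_ON_OPEN;   /* one instance for the life of the handle */
MQLONG floating = MQOO_OUTPUT | MQOO_BIND_NOT_FIXED; /* instance re-chosen on each MQPUT */
MQLONG deferred = MQOO_OUTPUT | MQOO_BIND_AS_Q_DEF;  /* whatever the queue's DEFBIND says */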
PeterPotkay
PostPosted: Wed Sep 22, 2004 3:37 pm

Poobah

Joined: 15 May 2001
Posts: 7716

No, something else is wrong. You can use BIND_ON_OPEN, even in this scenario.

Consider a cluster with 100 QMs, 99 of which have Queue1. An app connects to QM100 and puts messages with BIND_ON_OPEN. Let's assume the reasons for using BIND_ON_OPEN are valid. Obviously, if one of those 99 QMs comes down, the app on QM100 should not fear that the clustering algorithm will choose Queue1 on the QM that is down. The clustering algorithm considers that QMX is not available, and thus its Queue1 is not available, so it will not choose it.

awatson72, as you describe it, something is not working properly. Your assumptions on how it should work are correct.

When you stop QM1, are you restarting the app on QM3, so that it has a chance to open Queue1 and the algorithm has a chance to see that QM1 is down? Maybe you are opening the queue with bind on open while both QMs are up, the algorithm picks QM1 by chance and binds to it, and then you bring QM1 down. If that's your test, it's working as designed: you told it to bind to QM1, and all the messages will go there (eventually).

Remember, the BIND is established on the MQOPEN, not the MQPUT.
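
In MQI terms, the open-once, put-many pattern looks like this (a fragment reusing the declarations from the earlier sketch):

Code:
/* Fragment: with MQOO_BIND_ON_OPEN the instance of the cluster queue is
   chosen once, at MQOPEN. Every MQPUT on this handle then goes to that
   same instance, even if its queue manager goes down mid-loop. */
MQOPEN(hConn, &od, MQOO_OUTPUT | MQOO_BIND_ON_OPEN, &hObj, &cc, &rc);
for (int i = 0; i < 10; i++)
    MQPUT(hConn, hObj, &md, &pmo, (MQLONG)sizeof(msg), msg, &cc, &rc);
MQCLOSE(hConn, &hObj, MQCO_NONE, &cc, &rc);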
_________________
Peter Potkay
Keep Calm and MQ On
siliconfish
PostPosted: Wed Sep 22, 2004 3:52 pm

Master

Joined: 12 Aug 2002
Posts: 203
Location: USA

Peter, thanks for the correction.
offshore
PostPosted: Thu Sep 23, 2004 2:40 am

Master

Joined: 20 Jun 2002
Posts: 222

That is a good point, Peter.

When I first started working with clustering and using BIND_ON_OPEN, there was a period during failover when messages would back up on the XMITQ. Depending on the load, there would be 3-15 messages put while the sender still thought the queue was available, but eventually they would start going to the available queue manager. It wasn't until we switched to BIND_NOT_FIXED that this problem went away.

So my bad... I guess I should read more carefully (or at least think some more before posting).
awatson72
PostPosted: Thu Sep 23, 2004 9:53 am

Acolyte

Joined: 14 Apr 2004
Posts: 69
Location: Freeport, Maine

Thanks for the feedback on this.

A few more details from further testing: a job puts 10 messages onto the cluster queue in a "block", with one OPEN and one CLOSE (exactly the open-once, put-many pattern sketched above). For our test with one cluster QM down, we ran the job three times and found that 20 messages were delivered to the live QM and 10 were stuck in the XMIT queue. Subsequent messages go to the live QM.

We now assume (someone can probably confirm this) that in order for the cluster to know that a QM is dead, it must first attempt to OPEN and PUT/GET to the dead QM, and that message (or block of messages, in our case) ends up stranded in the XMIT queue in the process. That unfortunate message will stay there until the QM comes back up or manual intervention occurs.
Does this sound accurate, or are we still missing something?
Thanks...
_________________
Andrew Watson
L.L. Bean, Inc.
PeterPotkay
PostPosted: Thu Sep 23, 2004 9:59 am

Poobah

Joined: 15 May 2001
Posts: 7716

This is definitely buggy behaviour. What version of MQ are you on? Lots of issues were fixed once 5.3 came out, and CSDs for 5.3 have resolved some more issues as well.
_________________
Peter Potkay
Keep Calm and MQ On
JasonE
PostPosted: Fri Sep 24, 2004 12:15 am

Grand Master

Joined: 03 Nov 2003
Posts: 1220
Location: Hursley

It certainly doesn't sound right. I'd agree that the interesting info is the platform, version, release, and fixpack. APARs IC36185 (5.3 FP5) and one other (which I can't find, but I think shipped in FP6!) spring to mind.