WebSphere MQ Support

Cluster QM "failover"
awatson72
Posted: Wed Sep 22, 2004 10:07 am    Post subject: Cluster QM "failover"
Acolyte
Joined: 14 Apr 2004    Posts: 69    Location: Freeport, Maine
We are attempting to analyze the behaviour of clusters before adding one to our existing MQ infrastructure.

For our analysis, we have set up a cluster of three QMs, two of which have a local queue defined as a cluster queue with the same name. These two QMs are also full repository owners. We are putting messages onto this clustered queue from an application running on the same host as the third QM and testing different scenarios. Most results are as expected. However, when we bring one of the two "destination" QMs down and then put NEW messages onto the cluster queue using Bind on Open, we find that the new messages back up on the transmit queue (trying to reach the QM that is down). We expected these messages to go to the other "destination" QM that is still running.

My question: is there a problem with our setup, or is this normal cluster behaviour? We realize that clusters don't provide HA, but we have also read that clustering is supposed to "facilitate continuous operations". There seems to be a good deal of confusion about what clustering offers in this regard; any insight appreciated!
siliconfish
Posted: Wed Sep 22, 2004 10:32 am
Master
Joined: 12 Aug 2002    Posts: 203    Location: USA
What is probably happening in your case is that the cluster workload balancing process selected the queue instance on the now-failed queue manager, and because you opened with Bind on Open, it keeps putting the messages to that instance from then on; with that queue manager down, they back up.
Don't use Bind on Open here. Leave the bind as NOTFIXED, and the messages will automatically fail over to the queue on the available queue manager once the balancing process realises that the other queue manager is not available.
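For reference, the default bind siliconfish describes is controlled by the cluster queue's DEFBIND attribute. A minimal MQSC sketch; the queue and cluster names here are placeholders, not taken from the original posts:

```
* Run on each queue manager that hosts an instance of the queue.
* DEFBIND(NOTFIXED) means applications that open with MQOO_BIND_AS_Q_DEF
* get a fresh workload-balancing choice rather than a fixed destination.
DEFINE QLOCAL(TEST.CLUSTER.QUEUE) CLUSTER(TESTCLUS) DEFBIND(NOTFIXED) REPLACE
```

An application can still override this per open by specifying MQOO_BIND_ON_OPEN or MQOO_BIND_NOT_FIXED explicitly.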
offshore
Posted: Wed Sep 22, 2004 11:33 am
Master
Joined: 20 Jun 2002    Posts: 222
awatson,
You wrote:
Quote: "...QMs down, and then put NEW messages in the cluster queue using Bind on Open, we find that the new messages are backing..."
I agree 100% with siliconfish that for the failover to occur, the default bind has to be NOTFIXED. The quote is a little vague and left me with some questions.
Is the app specifying MQOO_BIND_ON_OPEN explicitly, or is it using MQOO_BIND_AS_Q_DEF and picking up the bind option from the cluster queue's definition?
Perhaps that application needs multiple messages processed in a certain order?
Just a twist (something to ponder).
PeterPotkay
Posted: Wed Sep 22, 2004 3:37 pm
Poobah
Joined: 15 May 2001    Posts: 7722
No, something else is wrong. You can use BIND_ON_OPEN, even in this scenario.
Consider a cluster with 100 QMs, 99 of which host Queue1. An app connects to QM100 and puts messages with BIND_ON_OPEN. Let's assume the reasons for using BIND_ON_OPEN are valid. Obviously, if one of those 99 QMs goes down, the app on QM100 should not fear that the clustering algorithm will choose Queue1 on the QM that is down. The clustering algorithm knows that QMX is not available, and thus that its Queue1 is not available, so it will not choose it.
awatson72, as you describe it, something is not working properly. Your assumptions about how it should work are correct.
When you stop QM1, are you restarting the app on QM3, so that it reopens Queue1 and the algorithm has a chance to see that QM1 is down? Maybe you are opening the queue with Bind on Open while both QMs are up, the algorithm picks QM1 by chance and binds to it, and then you bring QM1 down. If that's your test, it is working as designed: you told it to bind to QM1, and all the messages will go there (eventually).
Remember, the BIND is established on the MQOPEN, not the MQPUT.
_________________
Peter Potkay
Keep Calm and MQ On
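Peter's two points (the destination is fixed once, at MQOPEN, and the workload algorithm skips queue managers it already knows are down) can be sketched with a toy model. This is not MQ code: the class names, the round-robin choice, and the availability map are illustrative assumptions only.

```python
import itertools

class Cluster:
    """Toy stand-in for a cluster's view of which queue managers are up."""
    def __init__(self, hosts):
        self.up = {h: True for h in hosts}
        self._rr = itertools.cycle(hosts)   # simplistic round-robin chooser

    def choose(self):
        # Like the workload algorithm: skip destinations known to be down.
        for _ in range(len(self.up)):
            h = next(self._rr)
            if self.up[h]:
                return h
        raise RuntimeError("no destination available")

class OpenHandle:
    """BIND_ON_OPEN fixes the destination at open; NOT_FIXED picks per put."""
    def __init__(self, cluster, bind_on_open):
        self.cluster = cluster
        self.bound = cluster.choose() if bind_on_open else None

    def put(self, msg, delivered):
        dest = self.bound if self.bound else self.cluster.choose()
        delivered.setdefault(dest, []).append(msg)

cluster = Cluster(["QM1", "QM2"])
delivered = {}

h1 = OpenHandle(cluster, bind_on_open=True)   # binds to QM1 while it is up
cluster.up["QM1"] = False                     # QM1 fails AFTER the open
h1.put("m1", delivered)                       # still routed to QM1 (stuck)

h2 = OpenHandle(cluster, bind_on_open=True)   # a new open AFTER the failure
h2.put("m2", delivered)                       # QM1 is skipped: bound to QM2
```

The first handle keeps routing to the dead QM1 (Peter's "working as designed" case), while a fresh open after the failure binds to QM2 even with Bind on Open.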
siliconfish
Posted: Wed Sep 22, 2004 3:52 pm
Master
Joined: 12 Aug 2002    Posts: 203    Location: USA
Peter, thanks for the correction.
offshore
Posted: Thu Sep 23, 2004 2:40 am
Master
Joined: 20 Jun 2002    Posts: 222
That is a good point, Peter.
When I first started working with clustering and BIND_ON_OPEN, there was a period during failover when messages would back up on the xmitq. Depending on the load, 3 to 15 messages would be put while the app still thought the queue was available, but eventually they would start going to the available queue manager. It wasn't until we switched to BIND_NOT_FIXED that this problem went away.
So my bad... I guess I should read more carefully (or at least think some more before posting).
awatson72
Posted: Thu Sep 23, 2004 9:53 am
Acolyte
Joined: 14 Apr 2004    Posts: 69    Location: Freeport, Maine
Thanks for the feedback on this.
A few more details found through further testing: a job puts 10 messages onto the cluster queue in a "block", with one OPEN and one CLOSE. For our test with one cluster QM down, we ran the job three times and found that 20 messages were delivered to the live QM and 10 were stuck on the XMIT queue. Subsequent messages go to the live QM. We now assume (someone can probably confirm this) that in order for the cluster to learn that a QM is dead, it must first attempt an OPEN and PUT to the dead QM; that message (or block of messages, in our case) ends up stranded on the XMIT queue in the process. Those unfortunate messages will stay there until the QM comes back up or manual intervention occurs.
Does this sound accurate, or are we still missing something?
Thanks...
_________________
Andrew Watson
L.L. Bean, Inc.
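The arithmetic in the test above (three runs, one open each, ten messages per run, 20 delivered and 10 stuck) is consistent with the bind being chosen from a stale view that still advertises the dead QM until one bound run strands its block there. A toy sketch of that hypothesis; the round-robin order and the "learn on first failure" rule are assumptions, not documented MQ behaviour:

```python
def run_jobs(runs, msgs_per_run, advertised, dead):
    """Each run binds once at MQOPEN from a possibly stale view of the
    cluster; a run bound to a dead QM strands its whole block, after
    which the dead QM is dropped from the view."""
    view = list(advertised)
    delivered = stuck = 0
    i = 0
    for _ in range(runs):
        dest = view[i % len(view)]     # destination fixed at open time
        i += 1
        if dest in dead:
            stuck += msgs_per_run      # whole block strands on the xmitq
            view.remove(dest)          # the cluster now knows it is down
        else:
            delivered += msgs_per_run
    return delivered, stuck

print(run_jobs(3, 10, ["QM1", "QM2"], {"QM1"}))  # prints (20, 10)
```

Under these assumptions one of the three runs binds to the dead QM1 and strands its ten messages, matching the observed 20/10 split.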
PeterPotkay
Posted: Thu Sep 23, 2004 9:59 am
Poobah
Joined: 15 May 2001    Posts: 7722
This is definitely buggy behaviour. What version of MQ are you on? Lots of issues were fixed when 5.3 came out, and the CSDs for 5.3 have resolved some more issues as well.
_________________
Peter Potkay
Keep Calm and MQ On
JasonE
Posted: Fri Sep 24, 2004 12:15 am
Grand Master
Joined: 03 Nov 2003    Posts: 1220    Location: Hursley
It certainly doesn't sound right. I'd agree that the interesting info is the platform, version, release and fixpack. APAR IC36185 (5.3 FP5) and one other (which I can't find, but which I think shipped in FP6!) spring to mind.