MQSeries.net :: View topic - CLWLPRTY FAILOVER TAKES SOME TIME : Urgent issue

CLWLPRTY FAILOVER TAKES SOME TIME : Urgent issue

In_love_with_MQ

Posted: Fri Nov 03, 2006 5:44 am Post subject: CLWLPRTY FAILOVER TAKES SOME TIME : Urgent issue

Acolyte

Joined: 10 Jul 2005
Posts: 70

Hi ,

We have two P590 servers, each hosting an instance of a cluster queue that receives messages from many queue managers in the cluster.

On one of them (A), a queue called X is defined with CLWLPRTY 9.
On the other (B), a queue called X is defined with CLWLPRTY 8.

So all messages flow via A (no issues). Now when I shut down A, as per the design the messages should start flowing immediately via B.

But no, it takes some time, and messages only start flowing after a while. Because of this, lots of messages are stuck in the XMITQ of the senders, as they still point to A.

So why is failover not happening instantly?

Is there a limitation? If so, then this is a bug. Or else am I doing something wrong?

Vitor

Posted: Fri Nov 03, 2006 6:04 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

<Standard Rant>
I'm not happy about MQ clustering being used for failover; it's primarily for workload balancing and I think high availability clustering should be done with dedicated software.
</Standard Rant>

Having said that, I suspect the problem is that the workload balancing algorithm doesn't immediately detect that A has stopped processing messages. When you say "shutdown" do you mean "endmqm" or do you mean "pulled cable out of box"?

In any event I suspect the cluster heartbeat interval (or similar mechanism, I'm guessing at this point) needs to detect that A is unresponsive; the Clusters manual says:

Quote:

WebSphere MQ obtains the priority of queue managers after checking channel status. This means that only accessible queue managers are available for selection, and it allows WebSphere MQ to prioritize, where multiple destinations are available.
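The quoted rule can be read as "filter by channel status first, then pick the highest priority". Here is a minimal Python sketch of that reading. It is a toy model, not the real workload algorithm: the `Destination` class, the `channel_ok` flag, and `choose` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Destination:
    qmgr: str         # queue manager hosting an instance of the cluster queue
    clwlprty: int     # CLWLPRTY value, 0 (lowest) to 9 (highest)
    channel_ok: bool  # the sender's current view of the channel; it may be stale


def choose(destinations: List[Destination]) -> Optional[Destination]:
    """Pick the highest-priority destination among those believed reachable."""
    reachable = [d for d in destinations if d.channel_ok]
    if not reachable:
        return None
    return max(reachable, key=lambda d: d.clwlprty)


dests = [Destination("A", 9, True), Destination("B", 8, True)]
print(choose(dests).qmgr)  # A, because it has the higher CLWLPRTY

# B is only chosen once the sender's view of A's channel changes.
dests[0].channel_ok = False
print(choose(dests).qmgr)  # B
```

In this model nothing moves to B until `channel_ok` flips for A, so the length of the failover gap depends entirely on how quickly the sender notices the channel has failed.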

I might be inclined to try setting CLWLPRTY on the channel rather than the queue. Or doing failover with dedicated software.

Note: I accept unreservedly that the Clusters manual for v6.0 says these parameters can be used to provide failover (and they even put it in italics). My assertion is that it's not designed for that, the attributes are new for v6 and it's not especially good at it (as this poster seems to have found).

I mean, my car's manual says it can carry 60Kg in the boot. Doesn't say how fast it goes with that much load, and doesn't mean I won't hire a van if I ever need to carry something that weighs that much.
_________________
Honesty is the best policy.
Insanity is the best defence.

Gaya3

Posted: Fri Nov 03, 2006 6:05 am Post subject:

Jedi

Joined: 12 Sep 2006
Posts: 2493
Location: Boston, US

Hi

Whenever the first queue manager (here, A) goes down, the second queue manager should automatically take over as per the failover condition.

I suspect the sender keeps trying to connect to A for some interval of time
and only later learns that it is down. After that, the failover to QM B works.

So check whether you have set a heartbeat interval or disconnect interval on the channel.

Thanks and Regards
Gayathri
_________________
Regards
Gayathri
-----------------------------------------------
Do Something Before you Die

Vitor

Posted: Fri Nov 03, 2006 6:22 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

Gaya3 wrote:

So check whether you have set a heartbeat interval or disconnect interval on the channel.

Isn't that what I said......?

_________________
Honesty is the best policy.
Insanity is the best defence.

Gaya3

Posted: Fri Nov 03, 2006 6:25 am Post subject:

Jedi

Joined: 12 Sep 2006
Posts: 2493
Location: Boston, US

Hi Vitor

Yes, exactly. We were both working on this issue at the same time, I believe.

Gayathri
_________________
Regards
Gayathri
-----------------------------------------------
Do Something Before you Die

happyj

Posted: Fri Nov 03, 2006 6:31 am Post subject:

Voyager

Joined: 07 Feb 2005
Posts: 87

If I understand this right, the messages will still go to A until the sending queue manager knows that the channel is not active. They will then go to B. The channel status can be determined by using the BATCHHB on the channel.
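To make the effect of that detection delay concrete, here is a small Python sketch. It assumes a simplified model in which the sender keeps choosing A until a fixed `detect_delay` after A stops. The function names and the numbers are illustrative only and do not come from MQ.

```python
def route(put_times, a_down_at, detect_delay):
    """Split message put times (in seconds) by the destination the sender chooses.

    Assumption: the sender only notices that A has stopped `detect_delay`
    seconds after it actually stopped, and routes to A until then.
    """
    noticed_at = a_down_at + detect_delay
    routed = {"A": [], "B": []}
    for t in put_times:
        routed["A" if t < noticed_at else "B"].append(t)
    return routed


def stuck_for_a(put_times, a_down_at, detect_delay):
    """Messages routed to A after it stopped; these wait on the transmission queue."""
    routed = route(put_times, a_down_at, detect_delay)
    return [t for t in routed["A"] if t >= a_down_at]


puts = list(range(0, 100, 10))  # one message every 10 seconds
print(stuck_for_a(puts, a_down_at=30, detect_delay=25))  # [30, 40, 50]
print(stuck_for_a(puts, a_down_at=30, detect_delay=0))   # []
```

In this model the number of stuck messages grows with both the message rate and the detection delay, which is why the channel heartbeat settings come up in the discussion.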

Vitor

Posted: Fri Nov 03, 2006 6:47 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

That's certainly what I suspect to be happening. Now to avoid this you need to detect change in channel status which is why I suggested using the channel priority; if the xmitq to A is filling up the workload balancer might be inclined to start routing to B. It's a question for better minds than mine how quickly it will do that & how configurable it is.

To get instant failover I fear you'd need the heartbeat frequency set so high (that is, the interval so short) that message throughput would suffer (note this is a fear not a fact). Another way round this would be to write a workload balancing exit that (by fair means or foul) determined the loss of a queue manager and rerouted traffic. Not a trivial task, and opens up a raft of maintainability / support / efficiency issues.

Better to buy high availability software!

<StandardRant/>
_________________
Honesty is the best policy.
Insanity is the best defence.

Vitor

Posted: Fri Nov 03, 2006 6:49 am Post subject: Re: CLWLPRTY FAILOVER TAKES SOME TIME : Urgent issue

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

In_love_with_MQ wrote:

But no, it takes some time, and messages only start flowing after a while. Because of this, lots of messages are stuck in the XMITQ of the senders, as they still point to A.

Am I reading this correctly and no messages are getting stuck and unprocessed on A? Everything ends up on B, albeit after a delay?
_________________
Honesty is the best policy.
Insanity is the best defence.

In_love_with_MQ

Posted: Fri Nov 03, 2006 7:04 am Post subject: Veritas is costly affair

Acolyte

Joined: 10 Jul 2005
Posts: 70

Hi ,

Third-party software like Veritas is above the budget of the project, and since IBM clearly mentioned in the doc that CLWLPRTY is used for failover, we just went for it.

We initially did a POC but failed to test it thoroughly, and we have ended up with this issue in prod.

To answer your question: the messages do not reach B. Messages sent during the failover gap remain in the XMITQ and are delivered only after A is back up.

jefflowrey

Posted: Fri Nov 03, 2006 7:17 am Post subject:

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

If the budget of the project does not include costs for the requirements of the project, then it is a badly budgeted project or badly written or understood requirements.

That is, if the requirements of the project say "high availability" and "failover", and the budget does not cover real HA software like Veritas and the hardware necessary to run that - then either the budget doesn't meet the requirements, or the requirements are only "suggestions".
_________________
I am *not* the model of the modern major general.

Vitor

Posted: Fri Nov 03, 2006 7:19 am Post subject: Re: Veritas is costly affair

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

In_love_with_MQ wrote:

To answer your question: the messages do not reach B. Messages sent during the failover gap remain in the XMITQ and are delivered only after A is back up.

The curse of the stuck messages.

I thought for one glorious moment it was fixed in the new release.

I do wish IBM had used a different word than "failover" when describing the new clustering parameters.

There might be some mileage in raising a PMR with IBM. At worst, you can suggest they use a less emotive & misleading word...

_________________
Honesty is the best policy.
Insanity is the best defence.

Last edited by Vitor on Fri Nov 03, 2006 7:23 am; edited 1 time in total

Vitor

Posted: Fri Nov 03, 2006 7:21 am Post subject: Re: Veritas is costly affair

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

In_love_with_MQ wrote:

....since IBM clearly mentioned in the doc that CLWLPRTY is used for failover, we just went for it.

Gotta admire the gung ho spirit though!

_________________
Honesty is the best policy.
Insanity is the best defence.

In_love_with_MQ

Posted: Fri Nov 03, 2006 7:35 am Post subject: I think in that case Ibm is to be blamed

Acolyte

Joined: 10 Jul 2005
Posts: 70

Then I think IBM needs to provide proper documentation.
I think we were fooled. Or made fools of ourselves...

Now the headache of solving this issue starts. Whatever the case,
I also tried BATCHHB, and even then I can see some messages stuck.

"CLWLPRTY IS NOT FOR FAILOVER": LESSON LEARNT

I am gonna sue IBM

Vitor

Posted: Fri Nov 03, 2006 8:06 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

IBM have got to be a first port of call.

If I was in the situation you're in (and bear in mind I'm not!) and I was fairly happy with coding I might have a go at the workload exit, code something to distribute traffic away from failed queue managers.

Alternatively you could have something which monitors queue manager health directly and suspends the queue manager from the cluster at the 1st sign of trouble. You'd still get some stuck messages, but it would highlight to the default workload balancer that the queue manager should no longer be used.
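A rough Python sketch of that monitoring idea follows. It relies on the MQSC command `SUSPEND QMGR CLUSTER(...)` and on `runmqsc` reading commands from standard input. Everything else is an assumption made for illustration: the `healthy` probe, the function names, and the cluster name `MYCLUSTER` are all hypothetical. The queue manager must still be running to accept the command, so this covers "first sign of trouble" cases rather than a hard crash.

```python
import subprocess
from typing import Callable


def suspend_command(cluster: str) -> str:
    """MQSC text asking the local queue manager to suspend itself from a cluster."""
    return f"SUSPEND QMGR CLUSTER({cluster})\n"


def run_mqsc(qmgr: str, mqsc: str) -> None:
    """Feed MQSC text to runmqsc on stdin (needs MQ installed on this host)."""
    subprocess.run(["runmqsc", qmgr], input=mqsc, text=True, check=True)


def check_once(qmgr: str, cluster: str, healthy: Callable[[], bool],
               runner: Callable[[str, str], None] = run_mqsc) -> bool:
    """Suspend the queue manager if the health probe fails.

    `healthy` is a hypothetical probe supplied by the caller, for example a
    check that the consuming application is alive. Returns True if a suspend
    command was issued.
    """
    if healthy():
        return False
    runner(qmgr, suspend_command(cluster))
    return True


# Dry run with a fake runner, so nothing is sent to a real queue manager.
sent = []
check_once("A", "MYCLUSTER", healthy=lambda: False,
           runner=lambda q, m: sent.append((q, m)))
print(sent)
```

A real monitor would call `check_once` on a timer and would also need a matching `RESUME QMGR CLUSTER(...)` step once the queue manager is healthy again. As noted in the post, try it in dev first.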

Just an idea and not fully thought about as yet. Other posters may have better ideas (and whatever you decide, try it in dev first!!)
_________________
Honesty is the best policy.
Insanity is the best defence.
