CLWLPRTY FAILOVER TAKES SOME TIME : Urgent issue
In_love_with_MQ
Posted: Fri Nov 03, 2006 5:44 am    Post subject: CLWLPRTY FAILOVER TAKES SOME TIME : Urgent issue

Acolyte

Joined: 10 Jul 2005
Posts: 70

Hi,

We have two P590 servers hosting two instances of a cluster queue, into which messages arrive from many queue managers in the cluster.

On one of them (A), a queue called X is defined with CLWLPRTY 9.
On the other (B), a queue called X is defined with CLWLPRTY 8.

So all messages flow via A (no issues). Now, when I shut down A, the messages should, as per the design, start flowing immediately via B.

But no: it takes some time, and messages start flowing only after a while. Because of this, lots of messages are stuck in the XMITQ of the senders, as they still point to A.

So why is failover not happening instantly?

Is there a limitation? If so, then this is a bug. Or else am I doing something wrong?
Vitor
Posted: Fri Nov 03, 2006 6:04 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

<Standard Rant>
I'm not happy about MQ clustering being used for failover; it's primarily for workload balancing, and I think high-availability clustering should be done with dedicated software.
</Standard Rant>



Having said that, I suspect the problem is that the workload balancing algorithm doesn't immediately detect that A has stopped processing messages. When you say "shutdown", do you mean "endmqm" or do you mean "pulled cable out of box"?

In any event, I suspect the cluster heartbeat interval (or a similar mechanism; I'm guessing at this point) has to expire before A is detected as unresponsive; the Clusters manual says:
Quote:
WebSphere MQ obtains the priority of queue managers after checking channel status. This means that only accessible queue managers are available for selection, and it allows WebSphere MQ to prioritize, where multiple destinations are available.
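
The rule in that quote can be sketched roughly as below. This is a simplified model for illustration only, not IBM's actual algorithm: the `Destination` class, its fields, and `choose_destinations` are hypothetical names, and `channel_ok` stands for what the sending queue manager currently believes about the channel, which can lag behind reality.

```python
from dataclasses import dataclass


@dataclass
class Destination:
    qmgr: str          # queue manager hosting this instance of the queue
    clwlprty: int      # CLWLPRTY of this instance (higher is preferred)
    channel_ok: bool   # the SENDER's current belief about the channel


def choose_destinations(candidates):
    """Return the instances a sender could pick from (simplified model)."""
    # Step 1: drop destinations whose channel is known to be unusable.
    accessible = [d for d in candidates if d.channel_ok]
    if not accessible:
        # What really happens when nothing is accessible is outside this sketch.
        return []
    # Step 2: keep only the highest priority among what is left.
    top = max(d.clwlprty for d in accessible)
    return [d for d in accessible if d.clwlprty == top]


a = Destination("A", 9, channel_ok=True)
b = Destination("B", 8, channel_ok=True)

# A has just stopped, but the sender has not noticed yet: A is still chosen.
before = [d.qmgr for d in choose_destinations([a, b])]

# Only once the channel to A is seen as failed does B become eligible.
a.channel_ok = False
after = [d.qmgr for d in choose_destinations([a, b])]

print(before, after)
```

If the model holds, the delay the original poster sees is the time between A stopping and the sender's view of the channel changing.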


I might be inclined to try setting CLWLPRTY on the channel rather than the queue. Or doing failover with dedicated software.

Note: I accept unreservedly that the Clusters manual for v6.0 says these parameters can be used to provide failover (and they even put it in italics). My assertion is that it's not designed for that, the attributes are new for v6 and it's not especially good at it (as this poster seems to have found).

I mean, my car's manual says it can carry 60Kg in the boot. Doesn't say how fast it goes with that much load, and doesn't mean I won't hire a van if I ever need to carry something that weighs that much.
_________________
Honesty is the best policy.
Insanity is the best defence.
Gaya3
Posted: Fri Nov 03, 2006 6:05 am

Jedi

Joined: 12 Sep 2006
Posts: 2493
Location: Boston, US

Hi

Whenever the first queue manager (here, A) goes down, the second queue manager should take over automatically, as per the failover condition.

I suspect the sender keeps trying to connect to A for some interval of time and only later learns that it is down. After that, failover to QM B takes effect.

So if you have set some heart beat interval, disconnect interval of the channel

Thanks and Regards
Gayathri
_________________
Regards
Gayathri
-----------------------------------------------
Do Something Before you Die
Vitor
Posted: Fri Nov 03, 2006 6:22 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

Gaya3 wrote:
So if you have set some heart beat interval, disconnect interval of the channel



Isn't that what I said......?
Gaya3
Posted: Fri Nov 03, 2006 6:25 am

Jedi

Joined: 12 Sep 2006
Posts: 2493
Location: Boston, US

Hi Vitor

Yes, exactly. We both worked on this issue at the same time, I believe.

Gayathri
happyj
Posted: Fri Nov 03, 2006 6:31 am

Voyager

Joined: 07 Feb 2005
Posts: 87

If I understand this right, the messages will still go to A until the sending queue manager knows that the channel is not active. They will then go to B. The channel status can be determined by using the BATCHHB on the channel.
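
A small sketch of what setting BATCHHB might look like, written as a helper that builds the MQSC text. The channel name `TO.QMA` and the 1000 ms value are made-up examples, and the 0 to 999999 millisecond range is my understanding of the documented limits, so check it against the MQSC reference for your version.

```python
def batchhb_command(channel: str, interval_ms: int) -> str:
    """Build an MQSC ALTER CHANNEL command that sets the batch heartbeat."""
    # BATCHHB is expressed in milliseconds; 0 means no batch heartbeat.
    if not 0 <= interval_ms <= 999999:
        raise ValueError("BATCHHB must be between 0 and 999999 milliseconds")
    return f"ALTER CHANNEL({channel}) CHLTYPE(CLUSRCVR) BATCHHB({interval_ms})"


# Hypothetical cluster-receiver channel on queue manager A.
cmd = batchhb_command("TO.QMA", 1000)
print(cmd)
```

The resulting line would be pasted into a runmqsc session on the queue manager that owns the cluster-receiver definition. Try it in a test cluster first.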
Vitor
Posted: Fri Nov 03, 2006 6:47 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

That's certainly what I suspect is happening. To avoid this you need to detect the change in channel status, which is why I suggested using the channel priority; if the xmitq to A is filling up, the workload balancer might be inclined to start routing to B. How quickly it will do that, and how configurable it is, is a question for better minds than mine.

To get instant failover I fear you'd need the heartbeat set so frequent that message throughput would suffer (note this is a fear, not a fact). Another way round this would be to write a workload balancing exit that (by fair means or foul) determined the loss of a queue manager and rerouted traffic. Not a trivial task, and it opens up a raft of maintainability / support / efficiency issues.

Better to buy high availability software! <StandardRant/>
Vitor
Posted: Fri Nov 03, 2006 6:49 am    Post subject: Re: CLWLPRTY FAILOVER TAKES SOME TIME : Urgent issue

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

In_love_with_MQ wrote:
But no: it takes some time, and messages start flowing only after a while. Because of this, lots of messages are stuck in the XMITQ of the senders, as they still point to A.


Am I reading this correctly and no messages are getting stuck and unprocessed on A? Everything ends up on B, albeit after a delay?
In_love_with_MQ
Posted: Fri Nov 03, 2006 7:04 am    Post subject: Veritas is costly affair

Acolyte

Joined: 10 Jul 2005
Posts: 70

Hi,

Third-party software like Veritas is beyond the project's budget, and since IBM clearly mentioned in the doc that CLWLPRTY is used for failover, we just went for it.

We did an initial POC but failed to test it thoroughly, and we have ended up with this issue in production.

As you asked: the messages do not reach B. Messages in flight during the failover gap remain in the xmitq and move only after A is up.
jefflowrey
Posted: Fri Nov 03, 2006 7:17 am

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

If the budget of the project does not include costs for the requirements of the project, then it is a badly budgeted project, or the requirements are badly written or badly understood.

That is, if the requirements of the project say "high availability" and "failover", and the budget does not cover real HA software like Veritas and the hardware necessary to run that - then either the budget doesn't meet the requirements, or the requirements are only "suggestions".
_________________
I am *not* the model of the modern major general.
Vitor
Posted: Fri Nov 03, 2006 7:19 am    Post subject: Re: Veritas is costly affair

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

In_love_with_MQ wrote:
As you asked: the messages do not reach B. Messages in flight during the failover gap remain in the xmitq and move only after A is up.


The curse of the stuck messages.

I thought for one glorious moment it was fixed in the new release.

I do wish IBM had used a different word than "failover" when describing the new clustering parameters.

There might be some mileage in raising a PMR with IBM. At worst, you can suggest they use a less emotive & misleading word...


Last edited by Vitor on Fri Nov 03, 2006 7:23 am; edited 1 time in total
Vitor
Posted: Fri Nov 03, 2006 7:21 am    Post subject: Re: Veritas is costly affair

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

In_love_with_MQ wrote:
....since IBM clearly mentioned in the doc that CLWLPRTY is used for failover, we just went for it.


Gotta admire the gung ho spirit though!
In_love_with_MQ
Posted: Fri Nov 03, 2006 7:35 am    Post subject: I think in that case Ibm is to be blamed

Acolyte

Joined: 10 Jul 2005
Posts: 70

Then I think IBM needs to provide proper documentation.
I think we were fooled. Or made fools of ourselves...

Now our headache begins: we have to solve this issue, whatever it takes.
I tried BATCHHB as well, and even then I can still see some messages stuck.

"CLWLPRTY IS NOT FOR FAILOVER": LESSON LEARNT

I am gonna sue IBM
Vitor
Posted: Fri Nov 03, 2006 8:06 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

IBM have got to be a first port of call.

If I was in the situation you're in (and bear in mind I'm not!) and I was fairly happy with coding I might have a go at the workload exit, code something to distribute traffic away from failed queue managers.

Alternatively you could have something which monitors queue manager health directly and suspends the queue manager from the cluster at the 1st sign of trouble. You'd still get some stuck messages, but it would highlight to the default workload balancer that the queue manager should no longer be used.
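
A minimal sketch of that monitoring idea, assuming a health probe of your own. `is_healthy`, `check_and_suspend` and the cluster name `MYCLUSTER` are hypothetical; `SUSPEND QMGR CLUSTER(...)` and `runmqsc` are the standard MQSC command and command-line tool. The command has to run on the troubled queue manager itself, so this only helps while that queue manager can still process commands.

```python
import subprocess
from typing import Callable


def suspend_command(cluster: str) -> str:
    """MQSC text asking the cluster to avoid the local queue manager."""
    return f"SUSPEND QMGR CLUSTER({cluster})"


def run_mqsc(qmgr: str, command: str) -> None:
    """Feed one command to runmqsc on standard input."""
    subprocess.run(["runmqsc", qmgr], input=command + "\n",
                   text=True, check=True)


def check_and_suspend(qmgr: str, cluster: str,
                      is_healthy: Callable[[], bool],
                      run: Callable[[str, str], None] = run_mqsc) -> bool:
    """Suspend qmgr from cluster if the probe fails; True if suspended."""
    if is_healthy():
        return False
    run(qmgr, suspend_command(cluster))
    return True


# Dry run with a fake runner, so nothing is sent to a real queue manager.
sent = []
suspended = check_and_suspend("A", "MYCLUSTER",
                              is_healthy=lambda: False,
                              run=lambda q, c: sent.append((q, c)))
print(suspended, sent)
```

In real use you would call `check_and_suspend` from a scheduler or loop with the default runner, and issue `RESUME QMGR CLUSTER(...)` once the queue manager is healthy again.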

Just an idea and not fully thought about as yet. Other posters may have better ideas (and whatever you decide, try it in dev first!!)