purcho
Posted: Wed Jul 08, 2015 10:21 pm    Post subject: Multi Instance MQ Manager Failover speed
Newbie
Joined: 08 Jul 2015    Posts: 2

I am prototyping a new MQ environment hosted in AWS, using Amazon's new Elastic File System (EFS) service as NFSv4 shared storage for a multi-instance queue manager.
I have set up two servers, one in each availability zone, and run the MQ file system checks, which passed. I have also tested switching the active instance between the two servers, and a hard failover by shutting down the EC2 instance.
As this is the first time I have worked with multi-instance queue managers, I didn't know what to expect in terms of failover time. Currently it takes about 1-2 minutes for the standby server to detect that the active server is down and take over. However, tests using the file system testing tools, and manually invoking the switchover, take only 1-2 seconds.
Is there a configuration parameter that governs how long this failover takes that I should be looking at? Or is the failover time governed by the NFS provider?

mqjeff
Posted: Thu Jul 09, 2015 5:03 am
Grand Master
Joined: 25 Jun 2008    Posts: 17447

The failover is based on the secondary instance detecting that the file locks on the relevant resources have been released.
The file system checks probably release these locks explicitly, whereas after a hard failure the secondary system has to wait for them to expire.
So this could be related to the NFS provider. There may be some discussion in the Knowledge Center (KC) about tuning this. You should also talk to your NFS admin (unless it's you...).
You might also see what happens if you pull the network cord or otherwise remove one server from the network. This will ensure that you haven't missed some MQ process when trying to do a hard stop, although stopping the EC2 instance may do this as well.
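The lock-based detection described above can be sketched in a few lines of Python. This is not IBM MQ's implementation: it is a hypothetical illustration that uses `fcntl.flock` on a single lock file, with the made-up helpers `try_acquire` and `wait_for_lock` standing in for the standby instance's retry loop. The point it shows is that the standby never receives an "active instance is down" message; it just keeps retrying the lock until the holder's lock goes away.

```python
import fcntl
import os
import time


def try_acquire(path):
    """Attempt once to take an exclusive, non-blocking lock on path.

    Returns the open file descriptor if the lock was taken (the caller
    now holds it), or None if someone else still holds the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Still held by the (hypothetical) active instance; give up this attempt.
        os.close(fd)
        return None
    return fd


def wait_for_lock(path, poll_interval=1.0, timeout=None):
    """Keep retrying the lock, as a standby would, until it is released.

    Returns (fd, seconds_waited) once the lock is taken, or
    (None, seconds_waited) if timeout seconds pass first.
    """
    start = time.monotonic()
    while True:
        fd = try_acquire(path)
        waited = time.monotonic() - start
        if fd is not None:
            return fd, waited  # this is the point where a standby would start up
        if timeout is not None and waited >= timeout:
            return None, waited
        time.sleep(poll_interval)
```

On a local file, killing the process that holds the lock releases it at once, so `wait_for_lock` returns almost immediately. On NFSv4 shared storage, a lock held by a client that has crashed or vanished from the network is only released once the server decides that client's lease has expired, which is consistent with a hard failure taking much longer than the file system checks, which release their locks explicitly.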

purcho
Posted: Thu Jul 09, 2015 11:13 pm
Newbie
Joined: 08 Jul 2015    Posts: 2

Thanks mqjeff, that does help explain this.
Unfortunately, as Amazon are my NFS provider, I don't have an NFS admin I can ask.
I have done some further reading on NFSv4, and it does indeed seem likely that this is driven by configuration on the NFS host. From what I understand, this configuration is basically a trade-off between recovery time, freeing memory held for clients that have gone away, and avoiding unnecessary errors caused by releasing locks too early because of minor network glitches.
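The trade-off described above can be made concrete with a deliberately simplified model. Everything here is an assumption for illustration: `failover_tradeoff` is a hypothetical helper, the glitch lengths are made up, and real NFSv4 servers (including EFS, whose settings are not given in this thread) handle lease expiry and lock recovery in more detail than this. The model also ignores queue manager startup time. The 90-second lease is only an example; it happens to be the default lease time of the Linux kernel NFS server.

```python
def failover_tradeoff(lease_seconds, poll_seconds, outage_durations):
    """Hypothetical model of the NFSv4 lease trade-off.

    Returns (worst_case_failover_seconds, spurious_lock_losses), where:
    - after a real crash, the dead client's locks are assumed to be freed
      only once a full lease period has passed, and the standby notices on
      its next lock attempt, up to poll_seconds later;
    - any network glitch longer than the lease is assumed to make the
      server drop the healthy active instance's locks (an unnecessary
      failover).
    """
    worst_case = lease_seconds + poll_seconds
    spurious = sum(1 for d in outage_durations if d > lease_seconds)
    return worst_case, spurious


# Hypothetical glitch lengths (seconds) seen on the link to the NFS server.
glitches = [2, 5, 20, 40, 75]
for lease in (15, 30, 90):
    worst, spurious = failover_tradeoff(lease, 2, glitches)
    print(f"lease={lease}s worst-case failover={worst}s spurious={spurious}")
```

Under these assumptions a 90-second lease gives a worst case of about a minute and a half with no spurious lock losses, while a 15-second lease fails over in 17 seconds but would have dropped the active instance's locks during three of the five glitches.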

mqjeff
Posted: Fri Jul 10, 2015 5:43 am
Grand Master
Joined: 25 Jun 2008    Posts: 17447

purcho wrote:
"Unfortunately, as Amazon are my NFS provider, I don't have an NFS admin I can ask."
Even if your account is completely free, you should still be able to send a note to someone asking what options you have.

Gaya3
Posted: Fri Jul 10, 2015 7:05 am
Jedi
Joined: 12 Sep 2006    Posts: 2493    Location: Boston, US

There is some file system lock testing code that you can get from IBM (I don't remember the site); it will help you test your NFS.
I remember there were some problems with file lock release on NFS v3.
_________________
Regards
Gayathri
-----------------------------------------------
Do Something Before you Die

mqjeff
Posted: Fri Jul 10, 2015 7:11 am
Grand Master
Joined: 25 Jun 2008    Posts: 17447

Gaya3 wrote:
"There is some file system lock testing code that you can get from IBM (I don't remember the site); it will help you test your NFS."
That would be the file system checks that come with MQ (amqmfsck).
Gaya3 wrote:
"I remember there were some problems with file lock release on NFS v3."
NFSv3 does not support or allow file locks.

vennela
Posted: Wed Jul 29, 2015 3:03 am
Jedi Knight
Joined: 11 Aug 2002    Posts: 4055    Location: Hyderabad, India

Quote:
"Is there any configuration parameter that governs how long this fail over takes that i should be looking at?"
I have the same question.
Can MQ fail over to the secondary box sooner? With a multi-instance queue manager, failover is delayed for two minutes at the MQ level.
Is there some property?

Vitor
Posted: Wed Jul 29, 2015 4:47 am
Grand High Poobah
Joined: 11 Nov 2005    Posts: 26093    Location: Texas, USA

vennela wrote:
"Can MQ fail over to the secondary box sooner? With a multi-instance queue manager, failover is delayed for two minutes at the MQ level."
It's not delayed at the MQ level. As (very concisely) explained by my most worthy associate, MQ is waiting for NFS to report that the locks have been released before it fails over.
vennela wrote:
"Is there some property?"
Not at the MQ level. There's no FAILOVER=120 that you can change to FAILOVER=30, because the secondary instance can only detect that it needs to fail over by means of the file lock; there's no other communication between the instances.
Again, as indicated above, any property would be at the NFS level, to make the locks expire / release / curl up and die faster.
_________________
Honesty is the best policy.
Insanity is the best defence.