Author |
Message
|
kalam475 |
Posted: Thu Dec 15, 2016 10:47 pm Post subject: MQ HA in REDHAT Cluster Suite |
|
|
Acolyte
Joined: 16 Jan 2015 Posts: 63
|
Hi,
I have to set up MQ in an HA cluster (Active/Active). The HA software is Red Hat Cluster Suite (RHCS). From the documentation and the Infocenter, I understand that as MQ administrators we have to provide start and stop scripts to the system administrator, who will use them to add the resources to a resource group.
The resources are:
1. Queue manager
2. Floating IP address
3. Volume group
Since this is Active/Active, there would be two queue managers, QM1 and QM2. QM1 is active on machine 1 and passive on machine 2; QM2 is active on machine 2 and passive on machine 1.
So the system admin has to create two resource groups and fail over if there is a problem.
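To make the proposed layout concrete, here is a minimal Python sketch of the two resource groups described above. It only models which node each group would run on. The group names (rg_qm1, rg_qm2) and the placement function are hypothetical illustrations, not RHCS configuration or an RHCS API.

```python
from dataclasses import dataclass


@dataclass
class ResourceGroup:
    name: str            # hypothetical group name, for illustration only
    queue_manager: str   # queue manager owned by this group
    preferred_node: str  # node where the group is normally active
    standby_node: str    # node that takes over on failure


def placement(groups, failed_nodes=frozenset()):
    """Return {group name: node it runs on, or None if both nodes are down}."""
    result = {}
    for g in groups:
        if g.preferred_node not in failed_nodes:
            result[g.name] = g.preferred_node
        elif g.standby_node not in failed_nodes:
            result[g.name] = g.standby_node
        else:
            result[g.name] = None
    return result


# Two active/passive groups with opposite preferred nodes.
groups = [
    ResourceGroup("rg_qm1", "QM1", "machine1", "machine2"),
    ResourceGroup("rg_qm2", "QM2", "machine2", "machine1"),
]

print(placement(groups))                             # normal operation
print(placement(groups, failed_nodes={"machine1"}))  # machine1 has failed
```

When machine1 is marked as failed, both groups land on machine2, so each node has to be sized to carry both queue managers at once.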
My question is how the system admin knows there is a problem with a queue manager. Since the queue manager is started under cluster control, will the cluster monitor the PID with which it started the QMGR and fail over if that PID stops?
There is also a monitor script that reports the state of the queue manager. Does the cluster run this script continuously to monitor the status of the QMGR, and can it fail over if the status changes?
I am having a hard time understanding how this works in detail. I know the cluster will take care of the floating IP address and disk switching, but I do not understand how it will know whether my QMGR is running or not.
I am sure some of you have done this before. Please help; any suggestion is gold here.
 |
|
|
 |
Vitor |
Posted: Fri Dec 16, 2016 5:55 am Post subject: Re: MQ HA in REDHAT Cluster Suite |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
kalam475 wrote: |
I have to set up MQ in an HA cluster (Active/Active). The HA software is Red Hat Cluster Suite (RHCS). |
RHCS doesn't support Active/Active - it supports Active/Passive.
kalam475 wrote: |
From the documentation and the Infocenter, I understand that as MQ administrators we have to provide start and stop scripts to the system administrator, who will use them to add the resources to a resource group. |
Because only one node is active at a time; the other node is passive and has to be started.
kalam475 wrote: |
The resources are:
1. Queue manager
2. Floating IP address
3. Volume group
|
Yes.
kalam475 wrote: |
Since this is Active/Active, there would be two queue managers, QM1 and QM2. QM1 is active on machine 1 and passive on machine 2; QM2 is active on machine 2 and passive on machine 1. |
No there wouldn't. You're talking about 2 RHCS resource groups overlapping on 2 RH nodes, i.e. achieving active/active by having 2 active/passive groups overlapping. If RHCS allows that (which I doubt) you'll tie yourself in knots trying to make it work.
kalam475 wrote: |
So the system admin has to create two resource groups and fail over if there is a problem. |
Yeah - doesn't that sound simple? Work through some use cases and see how quickly this collapses under you.
kalam475 wrote: |
My question is how the system admin knows there is a problem with a queue manager. Since the queue manager is started under cluster control, will the cluster monitor the PID with which it started the QMGR and fail over if that PID stops? |
It does whatever you tell RHCS to do. The health of a queue manager is determined by more than one PID in my view.
kalam475 wrote: |
There is also a monitor script that reports the state of the queue manager. Does the cluster run this script continuously to monitor the status of the QMGR, and can it fail over if the status changes? |
Like I said, RHCS does what you tell it to do.
kalam475 wrote: |
I am having a hard time understanding how this works in detail. I know the cluster will take care of the floating IP address and disk switching, but I do not understand how it will know whether my QMGR is running or not. |
Because, as the MQ administrator, you've determined the criteria for queue manager health. Be aware, and this is the proverbial kicker, that failover can occur for more reasons than just an MQ problem.
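As one possible starting point for such a health criterion, here is a minimal Python sketch of a status check built on the output of the MQ `dspmq` command, which prints lines of the form `QMNAME(QM1) STATUS(Running)`. Several things here are assumptions to confirm with your RHCS admins: that the cluster calls a script like this periodically, that it treats a non-zero exit code as a failure, and that "Running" is the only status you want to count as healthy. The function names are hypothetical.

```python
import re
import subprocess

# Matches dspmq output lines such as: QMNAME(QM1)   STATUS(Running)
STATUS_RE = re.compile(r"QMNAME\((?P<name>[^)]+)\)\s+STATUS\((?P<status>[^)]+)\)")


def parse_status(dspmq_output, qmgr):
    """Return the STATUS text reported for qmgr, or None if it is not listed."""
    for match in STATUS_RE.finditer(dspmq_output):
        if match.group("name").strip() == qmgr:
            return match.group("status").strip()
    return None


def is_healthy(status):
    """Assumed criterion: only 'Running' counts as healthy."""
    return status == "Running"


def monitor(qmgr):
    """Return 0 if qmgr looks healthy, 1 otherwise (assumed exit-code convention)."""
    try:
        completed = subprocess.run(
            ["dspmq", "-m", qmgr],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return 1  # dspmq missing or hung: report as unhealthy
    return 0 if is_healthy(parse_status(completed.stdout, qmgr)) else 1
```

A status script could end with `sys.exit(monitor("QM1"))`. This checks only the reported queue manager status, so a fuller check would add whatever else you decide defines health, for example a test put/get or a listener check.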
kalam475 wrote: |
I am sure some of you have done this before. Please help; any suggestion is gold here. |
We use RHCS for HA here. My golden suggestion - don't do this. Use RHCS for Active/Passive like it's built for. If you need MQ availability while the node is failing over, there are a number of better ways to achieve it.
_________________
Honesty is the best policy.
Insanity is the best defence. |
|
|
 |
kalam475 |
Posted: Sun Dec 18, 2016 9:57 pm Post subject: |
|
|
Acolyte
Joined: 16 Jan 2015 Posts: 63
|
Thanks Vitor for your valuable inputs.
Quote: |
No there wouldn't. You're talking about 2 RHCS resource groups overlapping on 2 RH nodes, i.e. achieving active/active by having 2 active/passive groups overlapping. If RHCS allows that (which I doubt) you'll tie yourself in knots trying to make it work. |
I always thought this is how Active/Active is achieved. Maybe I am not thinking along the right lines. Can you outline how Active/Active is achieved, even if it is not on RHCS?
Quote: |
If you need MQ availability while the node is failing over, there are a number of better ways to achieve it. |
Apart from HA Active/Active and Active/Passive, we have the multi-instance queue manager as an option for failover. Is there any other way of achieving high availability?
Quote: |
We use RHCS for HA here |
Since you have done this, can you outline how RHCS would know there is a problem with the QMGR and fail over to the passive instance?
Is there any specific link I can read to understand this? Thanks again. |
|
|
 |
smdavies99 |
Posted: Sun Dec 18, 2016 10:55 pm Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
kalam475 wrote: |
Since you have done this, can you outline how RHCS would know there is a problem with the QMGR and fail over to the passive instance?
|
You first have to define what 'a problem with the QMGR' actually means.
Let me save you the trouble: the answer is 'it depends'.
Depends upon what?
It depends.
It could be a Channel stopping
It could be a poison message
It could be a queue filling up
It could be a disk array going TITSUP.
See, it depends.
Some of the above, while still ranked as problems with the QMGR, are NOT sufficiently serious to cause a failover, while some won't get solved by failing over.
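One way to force that decision into the open is to write the policy down as data before anyone scripts it. The Python sketch below is purely illustrative. The symptom names, the action for each symptom, and the `decide` function are placeholder assumptions rather than recommendations; which problems justify a failover is exactly the part that depends on your site.

```python
from enum import Enum


class Action(Enum):
    FAIL_OVER = "fail over"
    ALERT_ONLY = "alert only"
    INVESTIGATE = "investigate"


# Placeholder policy: every entry is an illustrative assumption to be
# replaced by what the MQ admin and cluster admins actually agree on.
EXAMPLE_POLICY = {
    "qmgr_not_running": Action.FAIL_OVER,
    "channel_stopped": Action.ALERT_ONLY,
    "poison_message": Action.ALERT_ONLY,
    "queue_full": Action.ALERT_ONLY,
    # Assumed: moving to the other node does not repair shared storage.
    "shared_disk_failed": Action.ALERT_ONLY,
}


def decide(symptom, policy=EXAMPLE_POLICY):
    """Look up the agreed action; anything not listed needs a human."""
    return policy.get(symptom, Action.INVESTIGATE)


print(decide("channel_stopped").value)
print(decide("something_unexpected").value)
```

Defaulting unknown symptoms to a human decision, instead of to a failover, is itself a design choice to agree on up front.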
We know the risks etc. of using RHCS or MSCS or whatever. We also know and can evaluate the seriousness of a problem pretty quickly.
This can't really be taught. You learn it over time.
I get the feeling that you have been dropped into the deep do-do from a great height without a parachute and are at the moment flapping around looking for a cruise liner to come rescue you when it will more likely be a row boat.
If your company does not have this sort of solution (RHCS etc.) in place already, then frankly you are on a hiding to nothing without lots of expert help. We can't be expected to provide that to you free gratis.
Do you have competent RH admins? Are they skilled in RHCS?
I'm certified to RHCE and RHCA levels and, as my sig says, have been using MQ since 1999, but I would not take on a job like this without asking lots of probing questions first.
I'd expect anyone brought in to help you to do the same. What happens when they leave? What happens when you move on? Do the skills that you learn go with you?
etc etc etc etc
Sorry for the Monday morning rant, but I feel that you are way out of your depth here.
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions. |
|
|
 |
kalam475 |
Posted: Sun Dec 18, 2016 11:41 pm Post subject: |
|
|
Acolyte
Joined: 16 Jan 2015 Posts: 63
|
We do have RHCS experts who set up HA clusters regularly, but nobody here has done MQ HA on RHCS.
Before going to them with my approach, I wanted to tap into your experience of where it might go wrong.
In our scenario we have 6 different client applications connecting to our QMGR. All of them connect through WebSphere Application Server, either through a queue connection factory to put messages or through an activation specification that triggers an MDB to perform a database transaction.
Based on the above, if I monitor queue manager events (Start/Stop/Inhibit etc.) and hardware failures as failover triggers, is that going to be sufficient?
I guess there is no need to switch over when problems arise with a SVRCONN channel, queue depth or log file disk space, since it is practically the same queue manager sharing the same volume. Correct me if I am wrong here.
Again, any suggestion you people give is gold dust. Thanks. |
|
|
 |
smdavies99 |
Posted: Mon Dec 19, 2016 12:55 am Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
kalam475 wrote: |
We do have RHCS experts who set up HA clusters regularly, but nobody here has done MQ HA on RHCS.
Before going to them with my approach, I wanted to tap into your experience of where it might go wrong.
|
So this is as much a learning exercise for them as it is for you. Why don't you join forces and set up a POC environment where you can simulate all these failures and learn how to recognise them for yourselves (you and the admins)? In my experience you will learn more from experimentation that is appropriate for your environment than from relying solely upon us 'strangers on the internet'.
By all means come back with specific questions from your testing. These will be easier to answer. |
|
|
 |
Vitor |
Posted: Mon Dec 19, 2016 6:10 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
kalam475 wrote: |
We do have RHCS experts who set up HA clusters regularly, but nobody here has done MQ HA on RHCS. |
Then if these people claim you can do Active/Active on RHCS, congratulate them & hide under something. I bet they've done it with something like WAS which is a much easier proposition.
kalam475 wrote: |
Before going to them with my approach, I wanted to tap into your experience of where it might go wrong. |
Take a seat. I'll list them.....
kalam475 wrote: |
Based on the above, if I monitor queue manager events (Start/Stop/Inhibit etc.) and hardware failures as failover triggers, is that going to be sufficient? |
I would not have said so but it's your system.
kalam475 wrote: |
I guess there is no need to switch over when problems arise with a SVRCONN channel, queue depth or log file disk space, since it is practically the same queue manager sharing the same volume. Correct me if I am wrong here. |
You're wrong. |
|
|
 |
|