HenriqueS |
Posted: Fri Jul 10, 2009 4:01 pm Post subject: MQ on virtual machines to provide some failover |
|
|
 Master
Joined: 22 Sep 2006 Posts: 235
|
Hello folks,
Where can I find some pointers on installing MQ on a virtual machine? My infrastructure team has offered a Linux active-passive 'heartbeat' solution. Storage is mirrored with our DR site and will accommodate the full software stack plus the MQ data files.
The idea is that if the main node fails, a backup server at the DR site will boot up from the last state found on storage.
Does any special care need to be taken during installation? Or can I install MQ as if I were dealing with regular server hardware/software?
I am somewhat worried about the MQ log files. Can they be corrupted when the new server boots and MQ tries to recover from them? |
|
 |
exerk |
Posted: Sat Jul 11, 2009 12:47 am Post subject: |
|
|
 Jedi Council
Joined: 02 Nov 2006 Posts: 6339
|
One obvious question - does the stand-by hardware boot with the same IP address as the failed server?
As far as I am aware, 'rebooting' an image on different identical hardware is no different to rebooting a physical server, so long as DNS/IP issues are considered. Hopefully, someone with VM experience will be along in a minute... _________________ It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys. |
|
 |
Vitor |
Posted: Sat Jul 11, 2009 1:45 am Post subject: Re: MQ on virtual machines to provide some failover |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
HenriqueS wrote: |
Does any special care need to be taken during installation? Or can I install MQ as if I were dealing with regular server hardware/software? |
You install WMQ on a virtual machine like you do on a physical one. Pay special attention to the "machine" configuration and ensure that there's enough memory allocated by the VM at all times to meet the requirements.
HenriqueS wrote: |
I am somewhat worried about the MQ log files. Can they be corrupted when the new server boots and MQ tries to recover from them? |
Why? Don't you trust the mirroring solution? Are the database people worried as well?
What you do need to worry about is the proper placement and linking of files on shared / mirrored disc. There are SupportPacs, with scripts to do this, for all the popular Unix HA solutions. Review these for helpful information, once you've finished the fundamentals manual & got clustering squared away. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
 |
PeterPotkay |
Posted: Sat Jul 11, 2009 5:20 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
How is the log data going to get to the DR site? If the primary server is updating the primary SAN and DR SAN location synchronously, you probably have no issues.
Odds are your DR site is too far away for synchronous updates, meaning the SAN frames are going to be replicating the data asynchronously. That means at best seconds or minutes of lag time, and at worst hours or maybe even days. There are a bunch of issues with that. The async replication doesn't care where your QM is in the checkpoint process. But I'm guessing the QM will be able to recover from that just the same as if you pulled the power cord in the middle of checkpoint processing on a regular QM. Although if there are hours of lag time.....
Additionally you will deal with missing messages (the MQPUT occurred but was not yet replicated) and duplicate messages (the MQGET occurred but was not yet replicated). _________________ Peter Potkay
Keep Calm and MQ On |
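To make the missing/duplicate point concrete, here is a toy model, not MQ code. It assumes the queue can be rebuilt by replaying an ordered list of PUT and GET operations, and that the DR copy has only received the first `replicated_count` of them when the primary dies. `Op`, `replay` and `failover_effects` are hypothetical names invented for this sketch; a real queue manager's log and recovery are far more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str    # "PUT" or "GET"
    msg_id: int  # hypothetical message identifier


def replay(ops):
    """Rebuild the queue contents by replaying operations in order."""
    queue = []
    for op in ops:
        if op.kind == "PUT":
            queue.append(op.msg_id)
        elif op.kind == "GET":
            queue.remove(op.msg_id)
    return queue


def failover_effects(primary_log, replicated_count):
    """Compare the primary's queue with what the lagging DR copy would hold."""
    primary_queue = replay(primary_log)
    dr_queue = replay(primary_log[:replicated_count])
    # On the primary but not at DR: the PUT never made it across.
    missing = [m for m in primary_queue if m not in dr_queue]
    # Still queued at DR but already consumed on the primary: delivered again.
    duplicates = [m for m in dr_queue if m not in primary_queue]
    return missing, duplicates


log = [Op("PUT", 1), Op("PUT", 2), Op("GET", 1),
       Op("PUT", 3), Op("GET", 2), Op("PUT", 4)]
missing, duplicates = failover_effects(log, replicated_count=3)
print(missing, duplicates)  # [3, 4] [2]
```

With only the first three operations replicated, messages 3 and 4 are gone at DR and message 2 will be handed out a second time. With all six replicated, both lists come back empty.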
|
 |
HenriqueS |
Posted: Sat Jul 11, 2009 7:17 am Post subject: |
|
|
 Master
Joined: 22 Sep 2006 Posts: 235
|
Yes, the 'takeover' process takes care of this. The stand-by machine boots with the same IP address and DNS entry.
exerk wrote: |
One obvious question - does the stand-by hardware boot with the same IP address as the failed server?
As far as I am aware, 'rebooting' an image on different identical hardware is no different to rebooting a physical server, so long as DNS/IP issues are considered. Hopefully, someone with VM experience will be along in a minute... |
|
|
 |
HenriqueS |
Posted: Sat Jul 11, 2009 7:27 am Post subject: |
|
|
 Master
Joined: 22 Sep 2006 Posts: 235
|
I can ask the Operating Systems team to check that. As far as I know, the DR site lies around 5 km from us (about 3.1 miles; people actually argue this is too close and that it should not be called a 'DR' site).
What I know is that there are 'synchronous' updates between these sites, but I will ask how much delay they are dealing with.
One thing they told me is that they are switching the heartbeat signaling. Currently the hypervisor operating system uses ICMP packets to check whether the primary node has failed. They told me they are moving to a faster signaling method that involves SAN writes/reads, so that the storage infrastructure signals the hypervisor OS to boot up the secondary node.
Like you said, my worry is about these 2 situations:
-the MQPUT occurred but was not yet replicated.
-the MQGET occurred but was not yet replicated.
If we start having high loads on this server, these issues may arise... right now I think it won't happen.
Well, in a few days they will be providing the virtual machine for me. I chose Linux. I am considering installing MQ and performing some tests, like sending a massive number of PUTs and just unplugging the primary node.
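For a pull-the-plug test like this, one way to score the result is to compare what the sender believes it put with what the receiver actually got after takeover. The sketch below makes no MQ calls. It assumes the test sender stamps every message with a unique sequence number and records each one whose PUT returned successfully, and that the test receiver records every number it sees. `reconcile` is a hypothetical helper name.

```python
from collections import Counter


def reconcile(sent_ids, received_ids):
    """Return (missing, duplicated) sequence numbers for one test run."""
    received = Counter(received_ids)
    # Sent successfully but never seen by the receiver.
    missing = sorted(set(sent_ids) - set(received))
    # Seen by the receiver more than once.
    duplicated = sorted(i for i, n in received.items() if n > 1)
    return missing, duplicated


# Example run: message 3 never arrived, message 2 arrived twice.
missing, duplicated = reconcile(range(1, 6), [1, 2, 2, 4, 5])
print(missing, duplicated)  # [3] [2]
```

Two empty lists after a failover mean that run lost nothing and repeated nothing. It is worth repeating the test under load, since the window for either problem is small.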
PeterPotkay wrote: |
How is the log data going to get to the DR site? If the primary server is updating the primary SAN and DR SAN location synchronously, you probably have no issues.
Odds are your DR site is too far away for synchronous updates, meaning the SAN frames are going to be replicating the data asynchronously. That means at best seconds or minutes of lag time, and at worst hours or maybe even days. There are a bunch of issues with that. The async replication doesn't care where your QM is in the checkpoint process. But I'm guessing the QM will be able to recover from that just the same as if you pulled the power cord in the middle of checkpoint processing on a regular QM. Although if there are hours of lag time.....
Additionally you will deal with missing messages (the MQPUT occurred but was not yet replicated) and duplicate messages (the MQGET occurred but was not yet replicated). |
|
|
 |
PeterPotkay |
Posted: Sat Jul 11, 2009 9:20 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
HenriqueS wrote: |
I can ask the Operating Systems team to check that. As far as I know, the DR site lies around 5 km from us (about 3.1 miles; people actually argue this is too close and that it should not be called a 'DR' site).
What I know is that there are 'synchronous' updates between these sites, but I will ask how much delay they are dealing with.
|
A site only 3 miles away is not a real DR site, but it's better than nothing.
If the other site is only 3 miles away, there's no excuse not to have fibre running between both sites and to have your node synchronously updating both SAN frames. This is called a stretch cluster. It provides HA, plus DR to the extent that a site only 3 miles away can be considered DR. Just hope that when the tornado or ice storm or Godzilla or flood hits, it only impacts one of your data centers and not the other.  _________________ Peter Potkay
Keep Calm and MQ On |
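As a rough check on why synchronous updates are plausible at this distance, the propagation delay over the fibre can be estimated. This is a back-of-the-envelope sketch only. The refractive index of 1.47 is an assumed typical value for silica fibre, the real cable route will be longer than the straight-line 5 km, and switch and storage array latency come on top. `fibre_round_trip_us` is a hypothetical helper name.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum
FIBRE_REFRACTIVE_INDEX = 1.47      # assumption: typical silica fibre


def fibre_round_trip_us(path_km, refractive_index=FIBRE_REFRACTIVE_INDEX):
    """Round-trip propagation delay over the fibre, in microseconds."""
    speed_km_s = SPEED_OF_LIGHT_KM_S / refractive_index
    return 2 * path_km / speed_km_s * 1_000_000


print(round(fibre_round_trip_us(5), 1))  # roughly 49 microseconds
```

A synchronous write needs at least one round trip before it is acknowledged, so at 5 km the distance itself adds only tens of microseconds per write. At several hundred kilometres the same sum reaches milliseconds, which is where asynchronous replication usually takes over.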
|
 |
|