echoesian
Posted: Tue Sep 09, 2008 8:28 am Post subject: Disaster Recovery for Message Broker and MQ
Apprentice
Joined: 30 May 2008 Posts: 33
To all the Masters, Gurus and MQ experts: I need your help and advice.
I have an active-active MQ cluster with 2 brokers running in Production, with HACMP for failover. All the QMs are clustered. The QMs are:
App (Windows) : APP1 (PR)
Broker1 (AIX) : MQP1 (FR)
Broker2 (AIX) : MQP2 (PR)
Backend (z/OS) : HOST (FR)
Cluster Name : EAICLUSTER
A DR server is required for both Brokers and their QMs. The options are:
1) DR QMs clustered together with Production QMs
- With this method, DR is set up with identical settings but with different QM and Broker names. The DR QMs and Brokers are always offline and are only started manually if a disaster happens. BUT, marooned messages cannot be processed!
2) Data replication with the same QM names
- In this method, the MQ filesystems of both QMs are replicated to the DR shared disks. In the event of a disaster, DR mounts these filesystems so that MQ can start both QMs and Brokers. Since the IP addresses are now different, the cluster channel connection names must be changed so that the apps and the DR mainframe can connect to the Brokers' QMs.
Option 2 would definitely solve the marooned-message problem, but there are risks involved in manually executing scripts to update the connection names.
I'd appreciate your comments and advice.

PeterPotkay
Posted: Tue Sep 09, 2008 9:09 am
Poobah
Joined: 15 May 2001 Posts: 7722
Option 2 is going to turn into a giant mess. Do not have 2 QMs with the same name in an MQ cluster. Ever! Even if they don't come up at the same time!
Is your DR data center close enough that your hardware cluster can include a 3rd server, with synchronous replication, to take over the disks, QMs, Brokers and the VIP from Servers #1 and #2 when they blow up?
Option 1 - don't discount that. How often do you have messages sitting on the queues? Usually queues are empty as stuff zooms through. Is it worth making a solution much, much more complicated to save one or 2 messages when a real disaster strikes? Wouldn't you rather have a simpler solution that you know works well? Who cares about one dumb message when the building is gone? There's no reason not to have the DR servers and QMs up and running all the time. Just uncluster the app queues so no work goes to them. Meanwhile your test queues can be clustered so that you can run your validation traffic anytime you want. Now you know that your DR infrastructure is up and working 100%. And if your apps step up to the plate and have live instances in both data centers, you are loving life: span the MQ cluster across data centers and load balance the work via MQ clustering. When one side is gone you don't have to worry, as all the work automagically flops over to the remaining side.
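A minimal sketch of the "DR up but idle" idea, written as a hypothetical Python helper that prints MQSC for the DR queue manager. The queue names are made up, and it assumes the queues are clustered through the CLUSTER attribute rather than a namelist (CLUSNL). A blank CLUSTER(' ') takes a queue out of the cluster; setting the cluster name back is how the app queues would be switched on at DR.

```python
# Hypothetical helper (queue names are illustrative): emits MQSC for a DR
# queue manager that stays up but receives only clustered test traffic.
# Assumes queues use the CLUSTER attribute, not a cluster namelist (CLUSNL).


def dr_standby_mqsc(app_queues, test_queues, cluster="EAICLUSTER"):
    """Return MQSC commands that uncluster app queues and cluster test queues."""
    commands = []
    for q in app_queues:
        # A blank CLUSTER value removes the queue from the cluster, so once the
        # change propagates, cluster workload balancing stops sending app work here.
        commands.append(f"ALTER QLOCAL('{q}') CLUSTER(' ')")
    for q in test_queues:
        # Test queues stay advertised so validation traffic can reach DR.
        commands.append(f"ALTER QLOCAL('{q}') CLUSTER('{cluster}')")
    return commands


def dr_activate_mqsc(app_queues, cluster="EAICLUSTER"):
    """Return MQSC commands that put the app queues back into the cluster."""
    return [f"ALTER QLOCAL('{q}') CLUSTER('{cluster}')" for q in app_queues]


if __name__ == "__main__":
    for cmd in dr_standby_mqsc(["APP.PAYMENTS.IN"], ["DR.TEST.IN"]):
        print(cmd)
```

The output would be fed to runmqsc on the DR queue manager; keeping it as generated text makes the standby and activation steps easy to review before a DR test.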
Another point regarding the marooned messages: if you are dealing with asynchronous replication between the data centers, you will have missing messages when disaster strikes. There is no way around this fact. Period. So as soon as you accept that fact, Option 1 becomes a much more appealing design.
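A toy Python model of that point, not tied to any real replication product (the commit times and the 0.5 s lag are arbitrary assumptions): with asynchronous replication, anything committed inside the lag window before the disaster never reaches the replica, while in the synchronous case (a lag of zero) nothing that was committed is lost.

```python
# Toy model of asynchronous replication (illustrative only).
# The replica holds every message committed at least `lag` seconds
# before the disaster; anything newer exists only at the lost site.


def lost_on_disaster(commit_times, disaster_time, lag):
    """Return indexes of committed messages missing from the replica."""
    replicated_up_to = disaster_time - lag
    return [
        i
        for i, t in enumerate(commit_times)
        if replicated_up_to < t <= disaster_time
    ]


def lost_sync(commit_times, disaster_time):
    """Synchronous replication: a commit completes only when both copies have it."""
    return lost_on_disaster(commit_times, disaster_time, lag=0)


if __name__ == "__main__":
    commits = [0.0, 1.0, 2.0, 2.5, 2.9]
    print(lost_on_disaster(commits, disaster_time=3.0, lag=0.5))
    print(lost_sync(commits, disaster_time=3.0))
```

With these numbers the message committed at 2.9 s is lost under the 0.5 s lag, and the synchronous case loses none.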
Critical data that can't be lost? Use a DB, that's what they are for. Super important MQ messages? Guess what, the apps had better have double and triple checks and balances built into the design anyway, so they can tolerate a missing MQ message.
_________________
Peter Potkay
Keep Calm and MQ On

echoesian
Posted: Tue Sep 09, 2008 5:35 pm
Apprentice
Joined: 30 May 2008 Posts: 33
Hi Peter, thanks for your feedback. Because the client is a bank, they want the messages to be replicated: they worry that marooned messages will be processed once the server is back up, and accounts might be credited twice.
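The double-credit worry is usually handled in the application rather than in MQ, along the lines of Peter's point about checks and balances. A minimal sketch, assuming every credit message carries a unique business transaction ID (the `txn_id`, `account` and `amount_cents` fields and the in-memory set are hypothetical; a real system would record processed IDs in the same database transaction as the balance update):

```python
# Hypothetical idempotent consumer: a credit is applied at most once per
# business transaction ID, so a marooned message delivered again after
# recovery does not credit the account twice. The in-memory structures
# stand in for database tables updated in one transaction.


class CreditProcessor:
    def __init__(self):
        self.balances = {}      # account -> balance in cents
        self.processed = set()  # txn_ids already applied

    def apply(self, msg):
        """Apply a credit message; return False if it is a duplicate."""
        if msg["txn_id"] in self.processed:
            return False
        account = msg["account"]
        self.balances[account] = self.balances.get(account, 0) + msg["amount_cents"]
        self.processed.add(msg["txn_id"])
        return True


if __name__ == "__main__":
    p = CreditProcessor()
    m = {"txn_id": "T1", "account": "ACC-1", "amount_cents": 10000}
    print(p.apply(m), p.apply(m), p.balances)  # True False {'ACC-1': 10000}
```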
Btw, the DR site is located within the city, only about 50 km away.
So, you recommend clustering the DR Broker together with the Production Brokers in a 3-node active-active MQ setup? But certain scenarios cannot be achieved that way. The client wants component/server failover, not only recovery from a physical disaster at the building.
Actually, the client wants something like this:
DR Scenario 1
Production EAI : Down
Production Mainframe: Up
Action
DR EAI : Up
Production Mainframe: Up
* If Production EAI is down, DR EAI is brought up but must point to the Production Mainframe, hence the connection names need to change
DR Scenario 2
Production EAI : Up
Production Mainframe: Down
Action
DR EAI : Down
DR Mainframe: Up
* In this scenario, the DR Mainframe is brought up, and Production EAI needs to change its connection names to point to the DR settings
DR Scenario 3
Production EAI : Down
Production Mainframe: Down
Action
* Both servers swing over to DR. Hence, the connection names also need to change to point to the DR Mainframe
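One way to lay the three scenarios out, as an illustrative Python sketch only: it assumes option 2 (replicated QMs keep their own channel definitions) and IP-based connection names, and treats EAI and the mainframe as single components. Under those assumptions, every channel whose target moved to DR is one whose connection name has to change.

```python
# Illustrative model of the three requested scenarios, assuming replicated
# QMs (option 2) and IP-based connection names: any channel whose target
# now runs at DR must be repointed. Components not listed stay in PROD.

CHANNELS = [
    ("APP1", "EAI"),       # Windows app QM -> broker QMs
    ("EAI", "MAINFRAME"),  # broker QMs -> z/OS QM
    ("MAINFRAME", "EAI"),  # z/OS QM -> broker QMs
]

# Where each component runs after the action in each scenario.
SCENARIOS = {
    1: {"EAI": "DR", "MAINFRAME": "PROD"},
    2: {"EAI": "PROD", "MAINFRAME": "DR"},
    3: {"EAI": "DR", "MAINFRAME": "DR"},
}


def channels_to_repoint(placement, channels=CHANNELS):
    """Return (source, target) channels whose target now runs at DR."""
    return [(src, dst) for src, dst in channels
            if placement.get(dst, "PROD") == "DR"]


if __name__ == "__main__":
    for number, placement in SCENARIOS.items():
        print(number, channels_to_repoint(placement))
```

Each non-empty list is a set of connection-name changes that someone (or some script) has to get right during the incident; with hostname-based connection names those edits become DNS changes instead.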

PeterPotkay
Posted: Tue Sep 09, 2008 5:56 pm
Poobah
Joined: 15 May 2001 Posts: 7722
echoesian wrote:
...they wanted the message to be replicated over because they worry the maroon messages will be processed once the server is alive and double credit to accounts might happen....
Btw, the DR is located within the city about just 50km away.
DR only 50km away? OK. Not the best situation. What happens when a hurricane or ice storm knocks out power for the entire region for 2 weeks? But I digress...
Since the DR site IS so close, you can use a MAN with synchronous replication between both data centers. You should have 2-node clusters stretched between the 2 sites, with 1 server of each cluster pair in each data center. If either side goes down, the QM, Broker, DB, VIP and active disks come up at the other site automatically, with no loss of persistent committed MQ messages. Anything else, surviving or failing over, needs no changes since the VIP stays the same.
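A toy Python model of the stretched 2-node cluster (node and VIP names are hypothetical, and replication itself is not modelled): channels and clients are configured with the VIP's hostname, so a failover changes which site serves it but not the connection name anyone uses.

```python
# Toy model of a 2-node HA cluster stretched across two sites.
# Clients always connect to the VIP hostname; failover only changes which
# node owns the VIP (together with the QM, broker and disks).


class StretchedHACluster:
    def __init__(self, vip, nodes):
        self.vip = vip
        self.nodes = list(nodes)  # e.g. [("node-a", "PROD"), ("node-b", "DR")]
        self.active = 0           # index of the node currently owning the VIP

    def conname(self, port=1414):
        """What clients and channels configure: unchanged by failover."""
        return f"{self.vip}({port})"

    def fail_active(self):
        """Active node lost: the partner node takes over the resource group."""
        self.active = (self.active + 1) % len(self.nodes)

    def serving_site(self):
        return self.nodes[self.active][1]


if __name__ == "__main__":
    c = StretchedHACluster("mqp1-vip", [("node-a", "PROD"), ("node-b", "DR")])
    before = c.conname()
    c.fail_active()
    print(before, c.conname(), c.serving_site())
```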
I really wasn't able to follow your 3 scenarios. But having your DR site so close to your Production site allows a MAN to span the 2 and stretched clustering to be implemented. It really makes DR a lot easier for you.
_________________
Peter Potkay
Keep Calm and MQ On

echoesian
Posted: Tue Sep 09, 2008 7:13 pm
Apprentice
Joined: 30 May 2008 Posts: 33
Hi Peter, what is a MAN? Sorry, I'm not sure about this term. Can you elaborate please? The thing is, the DR site has its own shared disks (EMC) that the Production shared disks will mirror to.

fjb_saper
Posted: Tue Sep 09, 2008 7:50 pm
Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
As Peter alluded to, financial institutions typically do not use MQ clustering for DR (it is considered the poor man's backup plan) but use some form of HA hardware clustering. I suppose a MAN falls into that type of category.
Enjoy
_________________
MQ & Broker admin

exerk
Posted: Tue Sep 09, 2008 11:16 pm
Jedi Council
Joined: 02 Nov 2006 Posts: 6339
echoesian wrote:
Hi Peter, what is MAN? Sorry, I'm not sure about this term. Can you elaborate more please?...
Metropolitan Area Network
The IEEE 802-2001 standard describes a MAN as being:
"A MAN is optimized for a larger geographical area than a LAN, ranging from several blocks of buildings to entire cities. MANs can also depend on communications channels of moderate-to-high data rates. A MAN might be owned and operated by a single organization, but it usually will be used by many individuals and organizations. MANs might also be owned and operated as public utilities. They will often provide means for internetworking of local networks. Metropolitan area networks can span up to 50km, devices used are modem and wire/cable."
_________________
It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys.

echoesian
Posted: Wed Sep 10, 2008 1:14 am
Apprentice
Joined: 30 May 2008 Posts: 33
Now, I hope that the client will agree that DR actually serves a "disaster" recovery purpose rather than server/component failover. If the EAI boxes need to swing back and forth, it will create more complications...
If I were to implement the data replication method using host names instead of IP addresses in the cluster receiver and sender connection names, does anyone have experience with this?
Since DR only has 1 box and Production has 2 boxes running active-active, no changes would be required on the MQ side after replication to DR, because the channels use host names instead of IP addresses. So what I need to do is configure the 2 nodes with different ports, say 1414 and 1415, with hostnames EAI1 (192.168.0.1) and EAI2 (192.168.0.2). When failing over, the same configuration is still used; all I need to do is point both hostnames to the same IP in DR, e.g. 192.168.1.1. Is this going to work?
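The plan can be sketched in Python with dicts standing in for name resolution at each site. The hostnames, IPs and ports come from the post; which QM gets which hostname (MQP1 on EAI1, MQP2 on EAI2) is an assumption. It shows that the replicated CONNAMEs are identical at both sites, and that in DR the two QMs still have distinct endpoints because their ports differ.

```python
# Dicts stand in for DNS or /etc/hosts at each site (illustrative only).
PROD_HOSTS = {"EAI1": "192.168.0.1", "EAI2": "192.168.0.2"}
DR_HOSTS = {"EAI1": "192.168.1.1", "EAI2": "192.168.1.1"}

# Channel definitions are replicated unchanged, so these CONNAMEs are the
# same at both sites. The QM-to-hostname mapping is an assumption.
CONNAMES = {"MQP1": "EAI1(1414)", "MQP2": "EAI2(1415)"}


def parse_conname(conname):
    """Split a single 'host(port)' CONNAME into (host, port)."""
    host, _, rest = conname.partition("(")
    return host, int(rest.rstrip(")"))


def endpoints(hosts):
    """Resolve each QM's CONNAME to the (ip, port) it reaches at one site."""
    result = {}
    for qm, conname in CONNAMES.items():
        host, port = parse_conname(conname)
        result[qm] = (hosts[host], port)
    return result


if __name__ == "__main__":
    print(endpoints(PROD_HOSTS))
    print(endpoints(DR_HOSTS))
```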

echoesian
Posted: Wed Sep 10, 2008 8:46 am
Apprentice
Joined: 30 May 2008 Posts: 33
Masters... please help...

fjb_saper
Posted: Wed Sep 10, 2008 8:52 pm
Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
echoesian wrote:
Now, I hope that the client will agree to the extend that DR is actually serving "disaster" recovery purpose rather than servers/components fail-over. Because, if the EAI boxes need to be swinging here and there, it will create more complications...
Agreed
echoesian wrote:
If I were to implement the data replication method with using host name for the cluster receiver and sender connection name instead of IP address, would anyone has such experiences?
We never use IPs if we can help it. DNS with hostnames is the way to go.
It helps solve a lot of other problems too.
echoesian wrote:
Since the DR only have 1 box, Production has 2 boxes running active-active, when replicated over to DR, no changes required on the MQ because the channels are using hostname instead of IP address. So, what I need to do is to configure the 2 nodes with different ports, say 1414 and 1415 with hostname: EAI1(192.168.0.1) and EAI2(192.168.0.2). When failing over, it will be still using the configuration, but what I need to do is just configure both hostnames pointing to the same IP i.e 192.168.1.1 in DR. Is this going to work?
Now I am getting confused. It seems that this question has little to do with DR and more to do with your HA-managed qmgrs...
You have 2 qmgrs in HA (active/active) and they could potentially both be active on the same side. So YES, you will need 2 different ports (one for each qmgr), and no, you should just use conname('hostname(port)') and let whatever mechanism is running your HA figure out how to forward to the correct host.
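As a sketch of that advice, a hypothetical Python generator for the per-QM MQSC. The channel naming convention (cluster.qmgr) and the listener name are assumptions; what it illustrates is that each QM gets its own listener port and its cluster receiver advertises a hostname, never an IP.

```python
# Hypothetical generator for per-QM MQSC. QMs that may end up on the same
# host after failover need distinct ports; the CLUSRCVR advertises a
# hostname so DNS / the HA layer decides where it resolves.


def qm_mqsc(qmgr, hostname, port, cluster="EAICLUSTER"):
    """Return MQSC for one QM's TCP listener and cluster receiver channel."""
    return [
        f"DEFINE LISTENER('LISTENER.TCP') TRPTYPE(TCP) PORT({port}) CONTROL(QMGR)",
        f"DEFINE CHANNEL('{cluster}.{qmgr}') CHLTYPE(CLUSRCVR) TRPTYPE(TCP) "
        f"CONNAME('{hostname}({port})') CLUSTER('{cluster}')",
    ]


def check_ports_unique(qms):
    """qms: list of (qmgr, hostname, port); ports must differ if QMs can co-locate."""
    ports = [port for _, _, port in qms]
    return len(ports) == len(set(ports))


if __name__ == "__main__":
    plan = [("MQP1", "EAI1", 1414), ("MQP2", "EAI2", 1415)]
    assert check_ports_unique(plan)
    for qmgr, host, port in plan:
        print("\n".join(qm_mqsc(qmgr, host, port)))
```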
Enjoy
_________________
MQ & Broker admin

echoesian
Posted: Wed Sep 10, 2008 10:46 pm
Apprentice
Joined: 30 May 2008 Posts: 33
Does anyone know how to set up two different hostnames pointing to the same IP on AIX, please?

David.Partridge
Posted: Wed Sep 10, 2008 11:08 pm
Master
Joined: 28 Jun 2001 Posts: 249
DNS alias - talk to your DNS/BIND expert.
_________________
Cheers,
David C. Partridge
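For illustration, two common ways of making several hostnames resolve to one address, rendered by hypothetical Python helpers: a single /etc/hosts line (the first name is canonical, the rest are aliases) or CNAME records in a BIND zone. The DR host name "eaidr" and the domain "example.com" are placeholders; the real records belong to your DNS team. A minimal hosts-file lookup is included to show both names landing on the same IP.

```python
# Hypothetical helpers; "eaidr" and "example.com" are placeholders.


def hosts_line(ip, canonical, aliases):
    """One /etc/hosts entry: address, canonical name, then aliases."""
    return "\t".join([ip, canonical] + list(aliases))


def cname_records(aliases, target_fqdn):
    """BIND zone-file CNAME records pointing each alias at one host."""
    # The trailing dot makes the target fully qualified in the zone file.
    return [f"{a}\tIN\tCNAME\t{target_fqdn.rstrip('.')}." for a in aliases]


def resolve(hosts_text, name):
    """Minimal /etc/hosts lookup: return the IP of the line listing `name`."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and name.lower() in (f.lower() for f in fields[1:]):
            return fields[0]
    return None


if __name__ == "__main__":
    entry = hosts_line("192.168.1.1", "eaidr", ["EAI1", "EAI2"])
    print(entry)
    print(resolve(entry, "EAI1"), resolve(entry, "EAI2"))
    print("\n".join(cname_records(["EAI1", "EAI2"], "eaidr.example.com")))
```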

simyobs
Posted: Mon Sep 15, 2008 8:49 am
Newbie
Joined: 15 Sep 2008 Posts: 5
Hi echoesian
I understand that you have an active-active MQ cluster, which means that all nodes run queue managers that are active participants in the WMQ cluster, and each node can be a standby for one of the other nodes.
I'm in the process of designing an MQ cluster that combines WMQ clustering and HA clustering.
I need some clarification on the cluster EAICLUSTER:
1. Is App (Windows): APP1 (PR) your gateway?
2. How do other application qmgrs connect to EAICLUSTER?
3. Does the hostname in conname('hostname(port)') need to be defined in DNS?
The goal is continuous availability using both MQ clustering and HA clustering.
I'd appreciate your advice...
simyobs

echoesian
Posted: Sun Oct 05, 2008 7:08 pm
Apprentice
Joined: 30 May 2008 Posts: 33
1. APP1 is not a gateway; it is in EAICLUSTER.
2. APP1 is in the cluster.
3. DNS is handled at the network level, not in MQ; the CONNAME should be defined like SERVER1(1414) instead of 172.28.3.23(1414).
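A hypothetical audit helper in that spirit (not an MQ tool): given CONNAME values, for example copied from DISPLAY CHANNEL output, it flags the ones that hard-code an IP address. It handles the single host(port) form only.

```python
import ipaddress


def conname_uses_ip(conname):
    """True if the host part of 'host(port)' is a literal IPv4/IPv6 address."""
    host = conname.split("(", 1)[0].strip()
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def audit(connames):
    """Return the CONNAMEs that should be switched to a DNS hostname."""
    return [c for c in connames if conname_uses_ip(c)]


if __name__ == "__main__":
    print(audit(["SERVER1(1414)", "172.28.3.23(1414)"]))
```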