MQSeries.net :: View topic - Disaster Recovery for Message Broker and MQ

Disaster Recovery for Message Broker and MQ

echoesian

Posted: Tue Sep 09, 2008 8:28 am Post subject: Disaster Recovery for Message Broker and MQ

Apprentice

Joined: 30 May 2008
Posts: 33

Masters, gurus and MQ experts, I need your help and advice.

I have an active-active MQ cluster with 2 brokers running in Production, with HACMP for failover. All the QMs are clustered. The QMs are:

App (Windows) : APP1 (PR)
Broker1 (AIX) : MQP1 (FR)
Broker2 (AIX) : MQP2 (PR)
Backend (z/OS) : HOST (FR)

Cluster Name : EAICLUSTER

A DR server for both Brokers and their QMs is required. The options are:

1) DR QMs clustered together with the Production QMs
- With this method, DR is set up with identical settings but different QM and Broker names. The DR QMs and Brokers are always offline and will only be started manually if a disaster happens. BUT marooned messages cannot be processed!

2) Data replication with the same QM names
- With this method, the MQ filesystems of both QMs are replicated to the DR shared disks. In the event of a disaster, DR mounts these filesystems and makes them available so that MQ can start both QMs and Brokers. Since the IP address is now different, the cluster channel connection names need to be changed so that the APPS and the DR Mainframe can connect to the Brokers' QMs.

With option 1, it is definite that we cannot solve the marooned messages, but with option 2 there are risks involved in manually executing scripts to update the connection names.

I would appreciate your comments and advice.

PeterPotkay

Posted: Tue Sep 09, 2008 9:09 am Post subject:

Poobah

Joined: 15 May 2001
Posts: 7723

Option 2 is going to turn into a giant mess. Do not have 2 QMs with the same name in an MQ cluster. Ever! Even if they don't come up at the same time!

Is your DR data center close enough so that your hardware cluster can introduce a 3rd server with synchronous replication to accept the disks, QMs, Brokers and the VIP from Servers #1 and #2 when they blow up?

Option 1 - don't discount that. How often do you have messages sitting on the queues? Usually queues are empty as stuff zooms through. Is it worth making a solution much much more complicated to save one or two messages when a real disaster strikes? Are you sure you wouldn't rather have a simpler solution that you know works well, and who cares about one dumb message when the building is gone? No reason not to have the DR servers and QMs up and running all the time. Just uncluster the app queues so no work goes to them. Meanwhile your test queues can be clustered so that you can run your validation traffic anytime you want. Now you know that your DR infrastructure is up and working 100%. And if your apps step up to the plate and have live instances in both datacenters you are loving life. Span the MQ cluster across data centers and load balance the work via MQ clustering. When one side is gone you don't have to worry as all the work automagically flops over to the remaining side.

Another point regarding the marooned messages - if you are dealing with asynchronous replication between the data centers, you will have missing messages when DR strikes. There is no way around this fact. Period. So as soon as you accept that fact, Option 1 becomes a much more appealing design.

Critical Data that can't be lost? Use a DB, that's what they are for. Super important MQ messages? Guess what, the apps better have double and triple checks and balances built into the design anyway so they can tolerate a missing MQ message.
_________________
Peter Potkay
Keep Calm and MQ On

echoesian

Posted: Tue Sep 09, 2008 5:35 pm Post subject:

Apprentice

Joined: 30 May 2008
Posts: 33

Hi Peter, thanks for your feedback. Because the client is a bank, they want the messages to be replicated over: they worry that marooned messages will be processed once the original server is alive again, and accounts might be credited twice.

Btw, the DR site is located within the same city, only about 50km away.

So, you recommend clustering the DR Broker together with the Production Brokers in a 3-node MQ active-active setup? But then certain scenarios cannot be achieved. The client wants component/server failover as well, not only recovery from a physical disaster to the building.

Actually the client wanted to have something like this:

DR Scenario 1
Production EAI : Down
Production Mainframe: Up

Action
DR EAI : Up
Production Mainframe: Up
* If the Production EAI is down, the DR EAI will be brought up but needs to point to the Production Mainframe, hence the connection names need to change

DR Scenario 2
Production EAI : Up
Production Mainframe: Down

Action
DR EAI : Down
DR Mainframe: Up
* In this scenario, the DR Mainframe will be brought up, and the Production EAI needs to change its connection names to point to the DR Mainframe

DR Scenario 3
Production EAI : Down
Production Mainframe: Down

Action
* Both servers will swing over to DR. Hence the connection names also need to change to point to the DR Mainframe
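
The three scenarios above can be summarised as a small decision table. This is only a sketch in Python: the function name `dr_action` and the shape of the returned dictionary are made up for illustration and are not part of any MQ or HACMP tooling.

```python
def dr_action(eai_up: bool, mainframe_up: bool) -> dict:
    """Return which site runs each tier and whether connames must change.

    Hypothetical helper: it only encodes the three scenarios listed above.
    """
    return {
        "eai": "Production" if eai_up else "DR",
        "mainframe": "Production" if mainframe_up else "DR",
        # As soon as one tier moves to DR, connection names must be repointed
        "conname_change": not (eai_up and mainframe_up),
    }


scenarios = [
    ("DR Scenario 1", False, True),   # Production EAI down, Mainframe up
    ("DR Scenario 2", True, False),   # Production EAI up, Mainframe down
    ("DR Scenario 3", False, False),  # both down
]
for name, eai_up, mainframe_up in scenarios:
    print(name, dr_action(eai_up, mainframe_up))
```

All three scenarios end with a connection name change, which is the manual step the data replication option depends on.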

PeterPotkay

Posted: Tue Sep 09, 2008 5:56 pm Post subject:

Poobah

Joined: 15 May 2001
Posts: 7723

echoesian wrote:

...they want the messages to be replicated over: they worry that marooned messages will be processed once the original server is alive again, and accounts might be credited twice....

Btw, the DR is located within the city about just 50km away.

DR only 50km away? OK. Not the best situation. What happens when a hurricane or ice storm knocks out power for the entire region for 2 weeks? But I digress...

Since the DR site IS so close, you can do a MAN with synchronous replication between both datacenters. You should have 2-node clusters stretched between the 2 sites, 1 server of each cluster pair in each datacenter. If either side goes down, the QM, Broker, DB, VIP and active disks come up at the other site automatically and with no loss of persistent committed MQ messages. Anything else surviving or failing over needs no changes since the VIP stays the same.

Regarding your 3 scenarios, I really wasn't able to follow them. But having your DR site so close to your Production site allows for a MAN to span the 2 and for stretch clustering to be implemented. It really makes DR a lot easier for you.
_________________
Peter Potkay
Keep Calm and MQ On

echoesian

Posted: Tue Sep 09, 2008 7:13 pm Post subject:

Apprentice

Joined: 30 May 2008
Posts: 33

Hi Peter, what is MAN? Sorry, I'm not sure about this term. Can you elaborate more please? The thing is the DR has its own shared disks (EMC) that the Production shared disks will mirror to.

fjb_saper

Posted: Tue Sep 09, 2008 7:50 pm Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20771
Location: LI,NY

As Peter alluded to, Financial institutions typically do not use MQ clustering for DR (it is considered the poor man's back up plan) but use some form of HA hardware clustering. I suppose that MAN falls into that type of category.

Enjoy

_________________
MQ & Broker admin

exerk

Posted: Tue Sep 09, 2008 11:16 pm Post subject:

Jedi Council

Joined: 02 Nov 2006
Posts: 6339

echoesian wrote:

Hi Peter, what is MAN? Sorry, I'm not sure about this term. Can you elaborate more please?...

Metropolitan Area Network

The IEEE 802-2001 standard describes a MAN as being:

"A MAN is optimized for a larger geographical area than a LAN, ranging from several blocks of buildings to entire cities. MANs can also depend on communications channels of moderate-to-high data rates. A MAN might be owned and operated by a single organization, but it usually will be used by many individuals and organizations. MANs might also be owned and operated as public utilities. They will often provide means for internetworking of local networks. Metropolitan area networks can span up to 50km, devices used are modem and wire/cable."
_________________
It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys.

echoesian

Posted: Wed Sep 10, 2008 1:14 am Post subject:

Apprentice

Joined: 30 May 2008
Posts: 33

Now, I hope the client will agree that DR should actually serve a "disaster" recovery purpose rather than server/component fail-over. If the EAI boxes need to swing back and forth, it will create more complications...

If I were to implement the data replication method using hostnames instead of IP addresses for the cluster receiver and sender connection names, does anyone have experience with that?

Since DR has only 1 box while Production has 2 boxes running active-active, no changes are required in MQ after replicating over to DR, because the channels use hostnames instead of IP addresses. So what I need to do is configure the 2 nodes with different ports, say 1414 and 1415, with hostnames EAI1 (192.168.0.1) and EAI2 (192.168.0.2). After failing over, the same configuration is still used; I just need to make both hostnames point to the same IP, i.e. 192.168.1.1, in DR. Is this going to work?

echoesian

Posted: Wed Sep 10, 2008 8:46 am Post subject:

Apprentice

Joined: 30 May 2008
Posts: 33

Masters..... pls help....

fjb_saper

Posted: Wed Sep 10, 2008 8:52 pm Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20771
Location: LI,NY

echoesian wrote:

Agreed

echoesian wrote:

If I were to implement the data replication method using hostnames instead of IP addresses for the cluster receiver and sender connection names, does anyone have experience with that?

We never use IP if we can help it. DNS with hostname is the way to go.
Helps solve a lot more problems.

echoesian wrote:

Since DR has only 1 box while Production has 2 boxes running active-active, no changes are required in MQ after replicating over to DR, because the channels use hostnames instead of IP addresses. So what I need to do is configure the 2 nodes with different ports, say 1414 and 1415, with hostnames EAI1 (192.168.0.1) and EAI2 (192.168.0.2). After failing over, the same configuration is still used; I just need to make both hostnames point to the same IP, i.e. 192.168.1.1, in DR. Is this going to work?

Now I am getting confused. Seems that this question has little to do with DR but more to do with your HA managed qmgrs...

You have 2 qmgrs in HA (active / active) and they could potentially both be active on the same side. So YES, you will need 2 different ports (one for each qmgr), and no, you should just use conname('hostname(port)') and let whatever mechanism is running your HA figure out how to forward to the correct host.
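
To illustrate the conname('hostname(port)') form, here is a rough Python sketch that prints the MQSC commands for both cluster receiver channels. The channel names TO.MQP1 and TO.MQP2, and the pairing of MQP1 with EAI1 and MQP2 with EAI2, are assumptions for the example; only the hostnames and ports come from the question above.

```python
def clusrcvr_mqsc(channel: str, host: str, port: int) -> str:
    """Build an MQSC ALTER CHANNEL command with a hostname-based CONNAME."""
    return (
        f"ALTER CHANNEL({channel}) CHLTYPE(CLUSRCVR) "
        f"CONNAME('{host}({port})')"
    )


# Assumed mapping: queue manager -> (channel name, hostname, listener port)
QMGRS = {
    "MQP1": ("TO.MQP1", "EAI1", 1414),
    "MQP2": ("TO.MQP2", "EAI2", 1415),
}

for qmgr, (channel, host, port) in QMGRS.items():
    # Lines starting with '*' are comments in MQSC scripts
    print(f"* run against queue manager {qmgr}")
    print(clusrcvr_mqsc(channel, host, port))
```

Because each qmgr has its own port, both can land on the same box after a failover without the listeners clashing, and nothing in the channel definitions has to change.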

Enjoy

_________________
MQ & Broker admin

echoesian

Posted: Wed Sep 10, 2008 10:46 pm Post subject:

Apprentice

Joined: 30 May 2008
Posts: 33

Does anyone know how to set up two different hostnames pointing to the same IP in AIX, please?

David.Partridge

Posted: Wed Sep 10, 2008 11:08 pm Post subject:

Master

Joined: 28 Jun 2001
Posts: 249

DNS Alias - talk to your DNS/BIND expert
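
Once the aliases exist, a quick check that both names really resolve to the same address might look like the Python sketch below. The helper name `same_address` is hypothetical, and the demonstration uses a fake resolver with the EAI1/EAI2 names and the DR address from earlier in the thread so that it runs without touching real DNS.

```python
import socket


def same_address(host_a: str, host_b: str, resolve=socket.gethostbyname) -> bool:
    """Return True if both hostnames resolve to the same IPv4 address.

    `resolve` defaults to the system resolver; pass another callable to test offline.
    """
    return resolve(host_a) == resolve(host_b)


# Offline demonstration: a dict stands in for DNS at the DR site
fake_dr_dns = {"EAI1": "192.168.1.1", "EAI2": "192.168.1.1"}
print(same_address("EAI1", "EAI2", resolve=fake_dr_dns.__getitem__))
```

On the real DR box you would call `same_address("EAI1", "EAI2")` without the `resolve` argument so the system resolver is used.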
_________________
Cheers,
David C. Partridge

simyobs

Posted: Mon Sep 15, 2008 8:49 am Post subject:

Newbie

Joined: 15 Sep 2008
Posts: 5

Hi echoesian

I understand that you have an active-active MQ cluster, which means that all nodes are running queue managers which are active participants in the WMQ cluster, and each node can be a stand-by for one of the other nodes.

I'm in the process of designing an MQ cluster that combines WMQ clustering and HA clustering.

I need some clarification on Cluster Name : EAICLUSTER
1. Is App (Windows) : APP1 (PR) your gateway?
2. How do other app qmgrs connect to EAICLUSTER?
3. Does DNS need to be defined for conname('hostname(port)')?

The aim is continuous availability with an MQ cluster and an HA cluster.

I'd appreciate your advice...

simyobs

echoesian

Posted: Sun Oct 05, 2008 7:08 pm Post subject:

Apprentice

Joined: 30 May 2008
Posts: 33

1. APP1 is not a gateway, it is in the EAICLUSTER
2. APP1 is in the cluster
3. DNS is handled at the network level, not in MQ. The conname should be defined like SERVER1(1414) instead of 172.28.3.23(1414)
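
As a small illustration of point 3, the following Python sketch flags connection names that still carry an IP literal instead of a hostname. The function name `uses_ip_literal` and the regular expression are assumptions for this example; it only handles the simple host(port) form shown above.

```python
import ipaddress
import re

# Matches the simple "host(port)" form, e.g. SERVER1(1414)
CONNAME_RE = re.compile(r"^(?P<host>[^()]+)\((?P<port>\d+)\)$")


def uses_ip_literal(conname: str) -> bool:
    """Return True if the host part of a conname is an IP address literal."""
    match = CONNAME_RE.match(conname.strip())
    if match is None:
        raise ValueError(f"unexpected conname format: {conname!r}")
    try:
        ipaddress.ip_address(match.group("host"))
    except ValueError:
        return False  # not parseable as an IP, so treat it as a hostname
    return True


for conname in ("SERVER1(1414)", "172.28.3.23(1414)"):
    kind = "IP literal" if uses_ip_literal(conname) else "hostname"
    print(f"{conname} -> {kind}")
```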
