WMQ on Windows and disaster recovery
jhalstead
Posted: Mon May 11, 2009 12:22 pm    Post subject: WMQ on Windows and disaster recovery

Master

Joined: 16 Aug 2001
Posts: 258
Location: London

Hi, I don't get to do much Windows stuff. I've seen a few threads that skirt around this topic but nothing that nails it. I've also looked at the docs, and the only thing that comes close is the hamvmqm-type commands; however, I'm not planning to use MSCS.

Basically ....

In production there will be 2 nodes, Server1 and Server2, both up and running concurrently each with their own queue managers qmgr1 & qmgr2. I'll be separating the log and queue filesystems on different disks on our SAN which will be mirrored to a DR site.

If Server 1 or 2 has an outage we have a period of time to get them and their qmgrs back up, nothing exotic needed. In the event of a disaster we'll need to bring up qmgr1 & qmgr2 on the single DR node - Server3.

Now this wouldn't present much of a problem in the UNIX world; however, because the Windows implementation uses the registry, things are a little more complicated.

The best I can think of is to create dummy instances on the DR node with the same names and attributes as the real production queue managers (qmgr1 & qmgr2). Once these have been created, stop them and use hamvmqm to point them at the locations where we'll mount the mirrored log and queue data filesystems as part of disaster recovery (we'll just create some temporary partitions to enable this). If they start up OK, we'll shut them down and then remove the temporary partitions.

In the event of a disaster we can then mount the mirrored data and log filesystems. We should then be able to restart the qmgrs listening on the right ports, with the object definitions, state and any persistent messages intact. We'll manually change DNS so that traffic for the two original servers is directed to this node ....
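
The restart step described above could be captured as a small dry-run runbook. This is only a sketch in Python, assuming the mirrored volumes are already mounted and the DNS change is handled separately (both are site-specific and left out). The function name build_dr_plan is hypothetical; strmqm and dspmq are the standard WMQ control commands for starting a queue manager and displaying its status. Nothing is executed; the function only returns the command lines in order.

```python
from typing import List


def build_dr_plan(qmgrs: List[str]) -> List[List[str]]:
    """Return the ordered commands to run on the DR node, without executing them.

    Assumption: the mirrored log and data filesystems are already mounted
    where each queue manager expects them.
    """
    plan: List[List[str]] = []
    for name in qmgrs:
        # Start the queue manager from the mounted data and logs.
        plan.append(["strmqm", name])
        # Display its status so the operator can confirm it is running.
        plan.append(["dspmq", "-m", name])
    return plan
```

Calling build_dr_plan(["qmgr1", "qmgr2"]) yields four command lists, which an operator could review before running them by hand or feeding them to a wrapper script.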

Does this sound like it might work? Are there any extra things I've missed, or better approaches that you'd recommend? This all seems like a bit of a hack to me and I wonder if there is something a little simpler to do.

Thanks in advance
Jamie
sumit
Posted: Tue May 12, 2009 1:24 am    Post subject: Re: WMQ on Windows and disaster recovery

Partisan

Joined: 19 Jan 2006
Posts: 398

jhalstead wrote:
.. create dummy instances on the DR node with the same names and attributes as the real production queue managers (qmgr1 & qmgr2) once these have been created, stop them...


Are these queue managers in a cluster? If so, you need to be a bit cautious while executing these steps, as your FR (full repository) may get 2 different QMIDs for the same queue manager (qmgr1-server1 and qmgr1-DR server).
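
A quick way to spot that condition is to look for one queue manager name that appears with more than one QMID. The Python sketch below assumes you have already collected (name, QMID) pairs, for example by parsing the output of DISPLAY CLUSQMGR(*) QMID on a full repository; the parsing itself is not shown, and the function name and the sample QMID strings are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_duplicate_qmids(records: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each queue manager name to its QMIDs, keeping only names with more than one."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for name, qmid in records:
        seen[name].add(qmid)
    # A name with two or more distinct QMIDs is the situation to avoid.
    return {name: ids for name, ids in seen.items() if len(ids) > 1}


# Hypothetical sample data: qmgr1 is known to the cluster under two QMIDs.
sample = [
    ("qmgr1", "qmgr1_2009-01-05_10.00.00"),
    ("qmgr2", "qmgr2_2009-01-05_10.05.00"),
    ("qmgr1", "qmgr1_2009-05-11_12.22.00"),
]
clashes = find_duplicate_qmids(sample)
```

With the sample data, clashes contains a single entry for qmgr1 listing both QMIDs.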
_________________
Regards
Sumit
exerk
Posted: Tue May 12, 2009 1:50 am    Post subject: Re: WMQ on Windows and disaster recovery

Jedi Council

Joined: 02 Nov 2006
Posts: 6339

sumit wrote:
Are these queue managers in a cluster? If so, you need to be a bit cautious while executing these steps, as your FR (full repository) may get 2 different QMIDs for the same queue manager (qmgr1-server1 and qmgr1-DR server).


Do NOT, EVER, have two queue managers of the same name in a cluster - unless you want a whole world of hurt.

jhalstead, my advice is: if you want to do this, do it properly. As far as I am aware, the hamvmqm command is for MSCS clusters only, not for trying to hack a queue manager across a SAN.
_________________
It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys.
jhalstead
Posted: Tue May 12, 2009 2:52 am

Master

Joined: 16 Aug 2001
Posts: 258
Location: London

Guys, thanks for your quick response.

I would never add the "temp" queue manager that I create to the WMQ cluster. The QMID is part of the qmgr data, right? It's not stored in the registry, is it? Could it be the InstanceData->InstanceID key I see?

If we say that my approach is flawed, how would you guys go about bringing up a queue manager from its log and data files on a disaster recovery site?

Thanks for the help!
Vitor
Posted: Tue May 12, 2009 3:32 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

jhalstead wrote:
If we say that my approach is flawed, how would you guys go about bringing up a queue manager from its log and data files on a disaster recovery site?


It's not flawed, and it is a hack.

If you proceed as you suggest, then when you restart the DR version of the queue manager with the files of the old one (do not use hamvmqm, just load the files to where the DR queue manager thinks they are), it will think it's the original queue manager, forward recover the messages & log files and get on with its life.

BUT it will have its own QMID, not the copied one. If the queue manager being copied is a member of a cluster then you're seconds from a loud bang and your cluster burning to the ground.

A much better solution to your problem is to upgrade to WMQv6 (a good idea in any event) and use a backup queue manager located in your DR site. Very much what you're trying to do and has that reassuring out-of-the-box feeling. Clean, supported and not a hack in sight (or site!).

Simples
_________________
Honesty is the best policy.
Insanity is the best defence.
PeterPotkay
Posted: Tue May 12, 2009 5:18 am

Poobah

Joined: 15 May 2001
Posts: 7722

How far away is Server 3 from Server 1 and 2?

Why are you NOT using MSCS?

Vitor's suggestion to use a backup QM is a good one, but it requires you to use linear logging, and don't forget it's an asynchronous process. Your backup QM could be quite far behind the production one, with the apps having to deal with missing messages and duplicate messages. Are they all prepared to deal with that, which is a risk for any async data replication method?
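
To make the "asynchronous" part concrete: a backup QM is only as current as the last log extents copied to it and replayed there (the v6 documentation describes strmqm -r for the replay and strmqm -a to activate the backup). The Python sketch below picks which linear log extents still need copying. It assumes extents are named like S0000000.LOG and, as a simplification, that the highest-numbered extent on the primary is still being written and must not be shipped yet. The function name is hypothetical, and the actual file copy is left out.

```python
import re
from typing import Iterable, List

# Linear log extents are assumed to follow the S<7 digits>.LOG naming pattern.
EXTENT = re.compile(r"^S(\d{7})\.LOG$")


def extents_to_ship(primary: Iterable[str], backup: Iterable[str]) -> List[str]:
    """Return complete extents present on the primary but not yet on the backup."""
    already_there = set(backup)
    numbered = []
    for name in primary:
        match = EXTENT.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    numbered.sort()
    # Assumption: the highest-numbered extent is the active one, so skip it.
    complete = numbered[:-1]
    return [name for _, name in complete if name not in already_there]
```

Everything in the skipped active extent, plus anything not yet copied, is what the backup QM would be missing at the moment of a disaster.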
_________________
Peter Potkay
Keep Calm and MQ On
Vitor
Posted: Tue May 12, 2009 5:22 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

PeterPotkay wrote:
Your backup QM could be quite far behind the production one, with the apps having to deal with missing messages and duplicate messages. Are they all prepared to deal with that, which is a risk for any async data replication method?


The original post talked about DR rather than HA, which has an inbuilt assumption that the DR site is not bang (pun intended!) up to date.
_________________
Honesty is the best policy.
Insanity is the best defence.
PeterPotkay
Posted: Tue May 12, 2009 5:36 am

Poobah

Joined: 15 May 2001
Posts: 7722

I know....my questions are still relevant.
_________________
Peter Potkay
Keep Calm and MQ On
jhalstead
Posted: Tue May 12, 2009 7:00 am

Master

Joined: 16 Aug 2001
Posts: 258
Location: London

The data centres are 100+ miles apart. We're going to try for sync replication; however, it's more likely that we'll need to go to async. A small amount of data loss is acceptable to the client.

We don't want to extend MSCS to the DR site.

We have a good fiber connection for SAN traffic, however a much lower bandwidth connection for standard WAN traffic.

Just for clarity - the QMID is stored in the Windows registry and not the queue manager data files?

Thanks for support so far (and quips Vitor!)
PeterPotkay
Posted: Tue May 12, 2009 7:57 am

Poobah

Joined: 15 May 2001
Posts: 7722

jhalstead wrote:
Small amount of data loss is acceptable to client.

So.

Sometimes there may be no messages in the queue when DR strikes, and so no loss of data.

Sometimes there may have been one message added to the queue, and a split second later disaster strikes. Async replication didn't get a chance to replicate it, so that message is lost.

Sometimes there may have been one THOUSAND messages added to the queue under syncpoint, the commit is issued and a split second later disaster strikes. Async replication didn't get a chance to replicate them, so those 1000 messages are lost.

Your app is going to handle any of these scenarios, right? So if it can handle zero, one or a thousand lost messages, it can tolerate lost MQ messages. Period.

A possible MQ DR solution is to have another QM in the DR datacenter, manually created and kept in sync as far as the queue and channel definitions are concerned. The DR QM and the real QM have no direct connection or replication whatsoever. When DR strikes, you aim your apps at the DR QM and off they go, with clean queues to start with.

MQ is not a database.
MQ is not a database.
MQ is not a database.

Let's say you MQGET 1000 messages from a queue. A split second later disaster strikes. Async data replication never had a chance to replicate the fact that the MQGET completed. The DR queue has 1000 messages your app already processed. Not only does your app have to deal with the potential of missing messages, now it has to deal with the potential of duplicate messages.

Get away from async replication for MQ. It's a big problem unless the apps have been designed for all the nuances, which I can guarantee you they have not. You're better off starting with clean queues on a queue manager, either because the DR QM is manually kept in sync with the same queue and channel definitions (the DR QM name is different), or because you do use backup QMs and your DR plan states that you will clear all the app queues before releasing the MQ DR QMs to the apps, maybe after saving off those messages to a file in case you do need them after the DR dust has settled.
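
The lost-message and duplicate-message scenarios above can be reproduced with a toy model. The Python sketch below is not MQ code; AsyncMirror and failover_view are hypothetical names, and the "replica" simply applies queued-up puts and gets whenever replicate() is called, standing in for any asynchronous mirroring scheme.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple


class AsyncMirror:
    """Toy model: the primary changes immediately, the replica only on replicate()."""

    def __init__(self) -> None:
        self.primary: Deque[str] = deque()
        self.replica: Deque[str] = deque()
        self._pending: List[Tuple[str, str]] = []

    def put(self, msg: str) -> None:
        self.primary.append(msg)
        self._pending.append(("put", msg))

    def get(self) -> str:
        msg = self.primary.popleft()
        self._pending.append(("get", msg))
        return msg

    def replicate(self) -> None:
        # Apply every operation the replica has not seen yet, in order.
        for op, msg in self._pending:
            if op == "put":
                self.replica.append(msg)
            else:
                self.replica.remove(msg)
        self._pending.clear()


def failover_view(mirror: AsyncMirror, processed: Iterable[str]):
    """Describe what the apps would find if the replica became the live queue now."""
    on_dr = list(mirror.replica)
    done = set(processed)
    duplicates = [m for m in on_dr if m in done]           # already processed, still queued at DR
    lost = [m for m in mirror.primary if m not in on_dr]   # queued on primary, never reached DR
    return on_dr, duplicates, lost


mirror = AsyncMirror()
mirror.put("m1")
mirror.put("m2")
mirror.replicate()        # replica is now in step with the primary
handled = mirror.get()    # the app processes m1 ...
mirror.put("m3")          # ... and a new message arrives, then disaster strikes
on_dr, duplicates, lost = failover_view(mirror, [handled])
```

After this run the DR queue holds m1 and m2: m1 is a duplicate the app has already processed, and m3 is lost.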
_________________
Peter Potkay
Keep Calm and MQ On
jhalstead
Posted: Tue May 12, 2009 2:36 pm

Master

Joined: 16 Aug 2001
Posts: 258
Location: London

Hey Peter, thanks for all your comments and wise advice.

I'm in a situation where the business application's behavior and its response to different events are outside of my control, and I think it unlikely that the vendor is going to deal perfectly with situations like lost messages. Rightly or wrongly, it looks like the concept of assured delivery has been bought into: if messages are put to a queue, trust is placed in WMQ and the apps are not going to keep track of what has been sent/received. Reconciliation that all the expected operations took place is not in the design. Again, we could debate whether this is correct or not; however, in some cases a true asynchronous publication of data in response to a business event seems like a valid operation...

Now I'm 100% behind using WMQ to transport messages and not using it as a database; I'm not proposing to use it as one. However, I think I'm trying to be realistic. Just as WMQ might hit a problem and terminate, or need to be quiesced to perform administration, we will need to perform administration / fix hard errors in the business applications that depend on WMQ. So what if we hit a disaster at one of these points in time, when there were messages at rest on a queue? No doubt the odds of both events occurring at once are low, but it could happen.

I will take the investigation forward to clarify the behavior of the applications if duplicate messages are received.

As for the "hack" I've been considering: when I prototype it, I get the DR queue manager up and running after copying in the log and data filesystems from the disaster-affected qmgr. The result is a queue manager that runs and has the relevant queues, QMID and persistent messages from the disaster-affected qmgr, which is what I think I need.

I will explore the use of a backup queue manager further. I can't see anything in the System Admin Guide that says it cannot be used with circular-logging qmgrs; it does demand that the backup must "...have the same attributes as the existing queue manager, for example the queue manager name, the logging type, and the log file size....". However, the following sections do read as though linear logging is involved.

Anyway, thanks to everyone who has contributed so far - it's been very valuable.
gbaddeley
Posted: Tue May 12, 2009 5:53 pm

Jedi Knight

Joined: 25 Mar 2003
Posts: 2538
Location: Melbourne, Australia

Most DR planning documents state that current messages in MQ at the time of the disaster will not be available at the DR site, as they will be restoring from the most recent save of MQ state. "Recent" may mean last week, last night if backups are being used, or a few seconds ago if replication / mirroring is used.

The crux is that apps running on DR need to be able to deal with missing messages and duplicate messages. It may be safer to hand over the DR for app usage with no app messages queued, and leave it up to the apps to resolve the situation. It is likely to be quite complex, involving multiple local and remote apps, databases and transaction managers, so they may not want an uncertain state of MQ added to the equation.

HA is a different matter...
_________________
Glenn
bruce2359
Posted: Tue May 12, 2009 6:44 pm

Poobah

Joined: 05 Jan 2008
Posts: 9469
Location: US: west coast, almost. Otherwise, enroute.

Quote:
The crux is that apps running on DR need to be able to deal with missing messages and duplicate messages.

A well-behaved app, DR or not, should be able to deal with missing or duplicated messages and replies.
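
One common way for an app to do that is to carry its own sequence number in each message and reconcile on receipt. The Python sketch below assumes such an application-level sequence number exists (it is not something taken from this thread or provided by WMQ in this form), and the function name reconcile is hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def reconcile(
    seqs: Iterable[int], expected_first: int, expected_last: int
) -> Tuple[List[int], List[int], List[int]]:
    """Split received sequence numbers into (to_process, duplicates, missing)."""
    seen: Set[int] = set()
    to_process: List[int] = []
    duplicates: List[int] = []
    for seq in seqs:
        if seq in seen:
            # Second delivery of the same message: skip it, but record it.
            duplicates.append(seq)
        else:
            seen.add(seq)
            to_process.append(seq)
    # Anything in the expected range that never arrived needs a resend or a manual check.
    missing = [s for s in range(expected_first, expected_last + 1) if s not in seen]
    return to_process, duplicates, missing
```

For example, reconcile([1, 2, 2, 4], 1, 5) returns ([1, 2, 4], [2], [3, 5]): message 2 arrived twice, and messages 3 and 5 never arrived.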
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
jhalstead
Posted: Wed May 13, 2009 12:24 am

Master

Joined: 16 Aug 2001
Posts: 258
Location: London

You are all right: a well-designed application should deal with all these obvious situations. It's just a shame this is largely not the case in reality.