Author |
Message
|
rjlfc |
Posted: Wed Apr 14, 2010 8:06 am Post subject: MQ/Queue Manager Backup and Recovery |
|
|
Apprentice
Joined: 04 Apr 2008 Posts: 31
|
Hi all,
So, as you can see from the title - 'that old chestnut'.....
I've just read up on this in the System Administration Guide on the InfoCenter, and searched through pages of threads in these forums. It certainly seems to provoke lots of discussion re. linear vs circular logging, etc.!
Anyway, hopefully(!) this is a slightly more straightforward query....but then maybe not.
We run 2 queue managers on MQ v7 under MSCS: 2 physical servers with MQ v7 installed on each. The queue managers were created, then we used the supplied MQ commands to move the log and data directories to a shared (SAN) disk.
MQ messages are fairly transient and don't build up often (if at all) and loss of messages can be coped with. Therefore we use Circular logging.
I'm fairly comfortable with this resilience, but still want to put in some measure of recovery.
So, I'm going to run saveqmgr so I have all the objects (not many) in a recoverable file, not just defined in documentation.
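For what it's worth, a dated dump like this could be scripted. A minimal Python sketch, assuming the MS03 SupportPac `saveqmgr` is on the PATH and that its `-m` (queue manager) and `-f` (output file) options behave as described in its readme; the queue manager name `QM1` and the `backups` folder are made up for illustration:

```python
import subprocess
from datetime import date

def saveqmgr_command(qmgr, out_dir, today=None):
    """Build the saveqmgr argument list with a dated output file name."""
    today = today or date.today()
    # e.g. backups/QM1-20100414.mqsc
    out_file = f"{out_dir}/{qmgr}-{today:%Y%m%d}.mqsc"
    return ["saveqmgr", "-m", qmgr, "-f", out_file]

def dump_objects(qmgr, out_dir):
    """Run saveqmgr; raises CalledProcessError if it exits non-zero."""
    subprocess.run(saveqmgr_command(qmgr, out_dir), check=True)

if __name__ == "__main__":
    # Hypothetical names: replace QM1 and backups with your own
    print(saveqmgr_command("QM1", "backups"))
```

The resulting file can later be replayed into a freshly created queue manager by feeding it to `runmqsc` on standard input.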
Having read the InfoCenter, I am still a little unclear whether taking a backup of the data and logs directories on a regular basis is worthwhile in addition to the above and, if so, whether I would need to back up the MQ aspects of the Windows registry.
E.g. if we had some data corruption that caused the queue manager to not start (regardless of the node that's active in the cluster), would I be better off simply deleting the QM, recreating it (and all that entails in the MSCS world), then re-applying the saveqmgr backup? Or should my first step always be to attempt to restore the log/data directory (and registry?) from backup?
Many thanks,
Rich |
|
Vitor |
Posted: Wed Apr 14, 2010 8:37 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
If you're using WMQv7 consider using the new multi-instance queue managers. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
bruce2359 |
Posted: Wed Apr 14, 2010 9:22 am Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
Quote: |
... if we had some data corruption ... |
Some data corruption? Wouldn't you consider any data corruption an indicator that you might have more data corruption than you have thus far discovered?
How to recover? It depends. There is a sliding-scale of outages.
You need to ponder outages ranging from 'everything is lost' to 'something is lost', and have multiple backup/recovery methodologies.
If everything is lost (city, building, server farm, server), then you might need to reinstall MQ software, create a new qmgr, restore object definitions from backup, ...
If just something is lost (like an object definition or messages in a queue), then a lesser recovery is called for.
As discussed here many, many times, copying/restoring portions of the file system and/or registry is a hopeful attempt at recovery. It depends on how current your backups are, and on whether recovery to a point in time for one lost message back-levels other messages in other queues.
There is no simple solution for sysadmins for backing up everything, and ensuring that we can restore something from the everything we backed up.
The best solution (my opinion) is to have applications create backups of application data with application programs; and to have app programs restore to an application-determined point-in-time. _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
PeterPotkay |
Posted: Wed Apr 14, 2010 12:44 pm Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
Point in time backups of MQ are useless. First, you have to stop the Queue Manager to get a good backup. Second, as soon as you take the backup it's probably obsolete.
Backup at 12:00:00
Crash at 12:30:00
Restore to 12:00:00
Spend the next few days dealing with apps that want to know where are all the messages from 12:00:01 thru 12:30:00, while dealing with apps that are complaining why all the messages they processed at 12:00:01 are now back on the queues, etc.
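To make the timeline concrete, here is a small Python sketch (not MQ code, just a model with made-up message ids and times) that works out which messages go missing and which ones reappear when the queue is put back to its 12:00:00 state:

```python
def queue_at(events, when):
    """Return the set of message ids sitting on the queue at time `when`.

    events is a list of (time, action, msg_id) with time as "HH:MM:SS"
    and action either "put" or "get".
    """
    on_queue = set()
    for t, action, msg in sorted(events, key=lambda e: e[0]):
        if t > when:
            break
        if action == "put":
            on_queue.add(msg)
        elif action == "get":
            on_queue.discard(msg)
    return on_queue

def restore_effects(events, backup_time, crash_time):
    """Compare the queue at crash time with the queue restored from backup."""
    at_backup = queue_at(events, backup_time)
    at_crash = queue_at(events, crash_time)
    lost = at_crash - at_backup      # waiting at the crash, absent from the backup
    replayed = at_backup - at_crash  # already consumed, back on the queue after restore
    return lost, replayed

# Hypothetical traffic around the backup and the crash
events = [
    ("11:59:50", "put", "A"),
    ("11:59:55", "put", "B"),
    ("12:00:01", "get", "A"),  # processed just after the backup
    ("12:10:00", "put", "C"),
    ("12:20:00", "put", "D"),
    ("12:25:00", "get", "C"),
]

lost, replayed = restore_effects(events, "12:00:00", "12:30:00")
print("lost:", sorted(lost))          # lost: ['D']
print("replayed:", sorted(replayed))  # replayed: ['A']
```

Message D is the one the apps come asking about, and message A is the one that was processed at 12:00:01 and is now back on the queue.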
Use MQ to move data. Use a database to store (and restore) data. _________________ Peter Potkay
Keep Calm and MQ On |
|
bruce2359 |
Posted: Wed Apr 14, 2010 1:14 pm Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
Quote: |
Point in time backups of MQ are useless |
Of course. I agree completely. Unless it works for the business to go back to a point in time yesterday or the day before. In a true disaster all things are possible.
MQ is not a database.
MQ is not a place for long-term data storage. Long-term (to me) means messages that are not consumed as soon as they are put to queues.
The longer a message resides in a queue, the more at risk for loss.
Know your apps. Know which apps leave messages in queues - for consumption later (like in batch, at midnight). Have app developers develop an application-oriented backup of queues as seems appropriate to the developers. |
|
gbaddeley |
Posted: Wed Apr 14, 2010 4:39 pm Post subject: |
|
|
 Jedi Knight
Joined: 25 Mar 2003 Posts: 2538 Location: Melbourne, Australia
|
PeterPotkay wrote: |
Point in time backups of MQ are useless. First, you have to stop the Queue Manager to get a good backup. Second, as soon as you take the backup it's probably obsolete.
Backup at 12:00:00
Crash at 12:30:00
Restore to 12:00:00
Spend the next few days dealing with apps that want to know where are all the messages from 12:00:01 thru 12:30:00, while dealing with apps that are complaining why all the messages they processed at 12:00:01 are now back on the queues, etc.
Use MQ to move data. Use a database to store (and restore) data. |
Exactly, that's why Disaster Recovery plans should state that MQ restoration at a DR site DOES NOT include restoration of queued application messages. This actually helps app recovery because they don't need to worry about processing duplicate MQ messages, in addition to their other woes with reconciling databases and transactions. _________________ Glenn |
|
rjlfc |
Posted: Thu Apr 15, 2010 12:09 am Post subject: |
|
|
Apprentice
Joined: 04 Apr 2008 Posts: 31
|
Thanks Vitor. Will look at multi-instance queue managers.
Thanks for all the other responses. However, I think some of them deviated slightly off topic from my particular question.
I appreciate there is a sliding scale of failure and subsequent recovery. As you say, some or all of the following - re-installation of MQ, recreation of the queue manager, and recreation of objects from the saveqmgr backup are possible.
However, my specific question is this: if we had a problem starting a queue manager (which I appreciate could be for many reasons) and it looked like some corruption of the queue manager data, do you guys feel there is any worth in even attempting to first restore the logs/data directory from backup before attempting to recreate the queue manager?
Point-in-time backup is not a requirement for us. As stated messages are very transient and we could recover from lost messages (with a bit of manual effort) from an app perspective. So stopping the QM and backing up the logs and data directory (when no messages are present) would give us a snapshot of the queue manager and objects in a working state.
The question is, does the restore of these directories (and registry?) work that often, in your experience? If the answer is rarely, I doubt we'll even bother trying to back them up and restore them, and will just jump straight into attempting to recreate the QM.
The main reason for the question is trying to shorten the potential downtime. As we are on MSCS, to recreate the QM we'd have to go through the process of creating then splitting the logs and data directory all over again. If we could (potentially) save some time by restoring the backed-up QM data and logs directory then all the better.
Many thanks. |
|
exerk |
Posted: Thu Apr 15, 2010 12:28 am Post subject: |
|
|
 Jedi Council
Joined: 02 Nov 2006 Posts: 6339
|
rjlfc wrote: |
...Thanks for all the other responses. However, I think some of them deviated slightly off topic from my particular question... |
But still should be considered - it's a 'wide' topic as far as I am concerned...
rjlfc wrote: |
...As stated messages are very transient and we could recover from lost messages (with a bit of manual effort) from an app perspective.... |
But what about duplicates? _________________ It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys. |
|
rjlfc |
Posted: Thu Apr 15, 2010 1:00 am Post subject: |
|
|
Apprentice
Joined: 04 Apr 2008 Posts: 31
|
exerk wrote: |
rjlfc wrote: |
...Thanks for all the other responses. However, I think some of them deviated slightly off topic from my particular question... |
But still should be considered - it's a 'wide' topic as far as I am concerned...
rjlfc wrote: |
...As stated messages are very transient and we could recover from lost messages (with a bit of manual effort) from an app perspective.... |
But what about duplicates? |
Agree it is a wide topic, and I'm definitely considering some of the points made, which were very useful.
As per my post, the backup of the QM data and logs would be done while there was nothing on the queues - so we could not have any duplicates. |
|
exerk |
Posted: Thu Apr 15, 2010 1:48 am Post subject: |
|
|
 Jedi Council
Joined: 02 Nov 2006 Posts: 6339
|
rjlfc wrote: |
...As per my post, the backup of the QM data and logs would be done while there was nothing on the queues - so we could not have any duplicates. |
Then as far as I am concerned there is very little point in backing it up. Most likely the restore would take as long as a rebuild, and if the restore doesn't work then you have additional time added to your recovery. |
|
rjlfc |
Posted: Thu Apr 15, 2010 3:04 am Post subject: |
|
|
Apprentice
Joined: 04 Apr 2008 Posts: 31
|
I think this is where I am missing something or misunderstanding due to my limited knowledge. Wouldn't the restore (are we talking about restoring the QM data and log files?) be a 'straightforward' (ducks for cover!) case of replacing the data and log folder and file structure with the backed-up one, then trying to restart the queue manager? Or is the restore more complicated than simply deleting folders and copying over the backed-up ones?
If that is the process, surely it is much quicker to do that than rebuild the queue manager (on our MSCS environment), of course assuming the above restore process works and the QM can be started, which is the basis of my question in the first place.
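If it really is just a folder swap, the whole thing can be sketched in a few lines of Python. This assumes the queue manager is stopped (and offline in MSCS) for both the copy and the copy-back; the directory labels and paths are placeholders, not the real MSCS layout, and nothing here touches the registry:

```python
import shutil
from pathlib import Path

def snapshot(dirs, backup_root):
    """Copy each queue manager directory into its own folder under backup_root.

    dirs maps a label (e.g. "log", "data") to the directory to copy.
    The queue manager must be stopped, or the copy may be inconsistent.
    """
    for label, src in dirs.items():
        shutil.copytree(src, Path(backup_root) / label)

def restore(dirs, backup_root):
    """Put the snapshot back, keeping the suspect directories for diagnosis."""
    for label, target in dirs.items():
        target = Path(target)
        if target.exists():
            # Rename rather than delete, so the damaged files can still be examined
            target.rename(target.with_name(target.name + ".damaged"))
        shutil.copytree(Path(backup_root) / label, target)
```

Whether the queue manager actually starts afterwards is exactly the part that needs testing before relying on this.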
Thanks |
|
exerk |
Posted: Thu Apr 15, 2010 3:41 am Post subject: |
|
|
 Jedi Council
Joined: 02 Nov 2006 Posts: 6339
|
The question is one of whether it is a straightforward folder/file delete and a copy back, or a true controlled back-up-and-restore mechanism, e.g. TSM. If the former, then yes it is probably quicker, but if the latter it may be considerably quicker to take it out of MSCS control, delete/redefine and put it back under MSCS control - horses for courses I feel. |
|
rjlfc |
Posted: Thu Apr 15, 2010 4:43 am Post subject: |
|
|
Apprentice
Joined: 04 Apr 2008 Posts: 31
|
Great, thanks, makes sense. We've got full automated backups run by the infrastructure guys, but I think I would prefer in the first instance to try to restore via the file system method, having taken a copy of the QM folders while offline.
So my next questions/queries are -
1. Are the QM data and logs folders/files the same ones that get moved to shared directories as part of the MSCS process, i.e.
\log\qmgrname\....
\Qmgrs\qmgrname\....
2. Re. the registry entries for MQ - are they required? If so is it everything under MQSeries or everything under the queue manager(s) structure that should be backed up?
Thanks. |
|
exerk |
Posted: Thu Apr 15, 2010 5:17 am Post subject: |
|
|
 Jedi Council
Joined: 02 Nov 2006 Posts: 6339
|
From (long) memory I believe the hamvmqm command moves the folders and updates the Registry and other 'interested' processes with the new path, so I would hazard that backing up the contents of the MSCS-controlled queue manager data and logs would be sufficient. Personally, I'd still want to test it. |
|
fjb_saper |
Posted: Thu Apr 15, 2010 6:04 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Read up in the admin manual etc...
If you have a problem with media recovery you have 2 possible strategies:
a) The queues are damaged => you need linear logging for recovery
b) The logs are damaged and the qmgr does not restart => you can recreate a qmgr with the same logging parms and copy the logs and log control file. Remember to record a media image right after the qmgr comes back up.
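For the 'record the media image' step, a hedged sketch in Python: it assumes linear logging (media images mean nothing under circular logging), that `rcdmqimg` is on the PATH, and a made-up queue manager name `QM1`:

```python
import subprocess

def rcdmqimg_command(qmgr, object_type="all", name="*"):
    """Build the argument list for rcdmqimg (record media image)."""
    # "*" is passed literally because no shell is involved
    return ["rcdmqimg", "-m", qmgr, "-t", object_type, name]

def record_media_images(qmgr):
    """Record images of all objects; raises CalledProcessError on a non-zero exit."""
    subprocess.run(rcdmqimg_command(qmgr), check=True)

if __name__ == "__main__":
    # Hypothetical queue manager name
    print(rcdmqimg_command("QM1"))
```

Running `record_media_images` once the rebuilt queue manager is up gives the logs a fresh starting point for any later media recovery.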
Of course, RAID with mirrored disks will help too. _________________ MQ & Broker admin |
|
|