Challenger
Posted: Wed Jun 04, 2008 2:56 am Post subject: Challenge Question - 06 / 2008
Centurion
Joined: 31 Mar 2008 Posts: 115

June's Challenge deals with a real production issue faced by one of our regularly participating members whose name shall remain a secret until the winning response has been selected.
Presenting problem:
On a Unix platform, you are using linear logging, and the file system where the log files reside is full because an MQ internal process is constantly writing MQ_RESOURCE_PROBLEM errors to AMQERR01.LOG.
But when you run the "standard" Perl script (from SupportPac MS62) to clean up the logs, it does nothing, even though there are log files that should be archived and/or deleted.
Why, and how do you fix it?
Good luck!

bbburson
Posted: Wed Jun 04, 2008 7:10 am
Partisan
Joined: 06 Jan 2004 Posts: 378 Location: Nowhere near a queue manager

Why? Because the resource problem error messages are being written to the AMQERR0n.LOG files so fast that the usual AMQ7467 and AMQ7468 messages are no longer there to tell the MS62 script which log files are safe to remove.
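To illustrate what MS62 depends on, here is a minimal Python sketch (not the SupportPac's own Perl) that scans error-log lines for AMQ7467 (oldest log needed to restart) and AMQ7468 (oldest log needed for media recovery). It assumes each message line carries the extent name in the S#######.LOG form; `find_required_logs` is a hypothetical name. When the error flood has pushed both messages out of the AMQERR0n.LOG files, it returns (None, None), which matches the "does nothing" symptom.

```python
import re
from typing import Iterable, Optional, Tuple

# Assumed message shape: the AMQ7467/AMQ7468 id followed, later on the same
# line, by a linear log extent name such as S0000005.LOG.
_REQUIRED_LOG_MSG = re.compile(r"\b(AMQ746[78])\b.*?\b(S\d{7}\.LOG)\b")


def find_required_logs(lines: Iterable[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (restart_log, media_log) named by AMQ7467/AMQ7468 messages.

    Feed the lines oldest first (AMQERR03, then 02, then 01) so that the
    most recent message of each kind wins. Either value is None when no
    matching message survives in the error logs.
    """
    restart_log: Optional[str] = None
    media_log: Optional[str] = None
    for line in lines:
        match = _REQUIRED_LOG_MSG.search(line)
        if match is None:
            continue
        if match.group(1) == "AMQ7467":  # oldest log needed to restart
            restart_log = match.group(2)
        else:  # AMQ7468: oldest log needed for media recovery
            media_log = match.group(2)
    return restart_log, media_log
```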
How do you deal with it? Use the runmqsc commands "DIS QMSTATUS RECLOG" and "DIS QMSTATUS MEDIALOG" to find the oldest log files the queue manager still needs for restart and for media recovery. Then archive or delete the files that are older than both.
Of course, if you're not at version 6, this won't work and you're out of luck.
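On version 6 and later, picking the two values out of the runmqsc reply is a small parsing job. A minimal Python sketch, assuming the attributes appear as RECLOG(S#######.LOG) and MEDIALOG(S#######.LOG) somewhere in the captured output text; how that text is captured is left out, and `parse_qmstatus` is a hypothetical helper.

```python
import re
from typing import Dict

# Assumed reply shape: attributes printed as NAME(VALUE), e.g. RECLOG(S0000005.LOG).
_LOG_ATTR = re.compile(r"\b(RECLOG|MEDIALOG)\(\s*(S\d{7}\.LOG)\s*\)")


def parse_qmstatus(output: str) -> Dict[str, str]:
    """Map RECLOG/MEDIALOG to the extent names found in runmqsc output text.

    An attribute that is missing or blank in the output is simply absent
    from the result, so callers should check before using it.
    """
    return {attr: extent for attr, extent in _LOG_ATTR.findall(output)}


if __name__ == "__main__":
    # Illustrative text only, not a verbatim runmqsc transcript
    sample = "   MEDIALOG(S0000003.LOG)   RECLOG(S0000005.LOG)\n"
    print(parse_qmstatus(sample))
```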

Challenger
Posted: Wed Jun 04, 2008 9:28 am
Centurion
Joined: 31 Mar 2008 Posts: 115

This actually happened long ago and far away on a 5.3 queue manager, but it was still possible to get MS62 to work. Any ideas how? (I know 5.3 is not supported, but there are still 5.3 sites out there.)

Nigelg
Posted: Wed Jun 04, 2008 1:43 pm
Grand Master
Joined: 02 Aug 2004 Posts: 1046

tail -f AMQERR01.LOG > e1
wait...
check for AMQ7467/8 in e1
repeat until the msgs are in e1
then use e1 as input to MS62.
Perl is rather twee.
Real UNIX programmers use awk.
_________________
MQSeries.net helps those who help themselves.

bbburson
Posted: Wed Jun 04, 2008 2:02 pm
Partisan
Joined: 06 Jan 2004 Posts: 378 Location: Nowhere near a queue manager

MS62 already looks in all three AMQERR0n.LOG files. If the messages aren't found there, running tail -f on one of them is unlikely to gain you anything. tail -f does not follow the file renames that happen when the files get too big, so after a short time you'll be stuck tailing a file that is no longer being written to.
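The disappearing-message effect can be modelled in a few lines. This Python toy assumes the three-file rotation described in this thread (when AMQERR01.LOG is full, AMQERR03.LOG is discarded, 02 becomes 03, 01 becomes 02, and a fresh 01 is started) and, as a simplification, counts capacity in lines rather than bytes; `ErrorLogRotation` is a hypothetical name. Once roughly three files' worth of errors follow an AMQ7467 message, none of the three files holds it any more.

```python
from typing import List


class ErrorLogRotation:
    """Toy model of the AMQERR01/02/03.LOG rotation.

    Simplification: capacity is counted in lines, whereas the real error
    logs rotate on file size.
    """

    def __init__(self, lines_per_file: int) -> None:
        self.lines_per_file = lines_per_file
        # files[0] is AMQERR01.LOG (newest), files[2] is AMQERR03.LOG (oldest)
        self.files: List[List[str]] = [[], [], []]

    def write(self, line: str) -> None:
        if len(self.files[0]) >= self.lines_per_file:
            # Discard 03, rename 02 -> 03 and 01 -> 02, start a fresh 01
            self.files = [[], self.files[0], self.files[1]]
        self.files[0].append(line)

    def contains(self, text: str) -> bool:
        return any(text in line for log in self.files for line in log)


if __name__ == "__main__":
    logs = ErrorLogRotation(lines_per_file=4)
    logs.write("AMQ7467 S0000005.LOG")
    for _ in range(12):  # three files' worth of resource errors
        logs.write("MQ_RESOURCE_PROBLEM")
    print(logs.contains("AMQ7467"))  # False: pushed out of all three files
```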

fjb_saper
Posted: Wed Jun 04, 2008 3:42 pm
Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY

You need to isolate the qmgr and stop all its channels and users. Running rcdmqimg -l will tell you the oldest log files still needed, as well as create a new restart point. Archive the files that are no longer needed.
Restart the channels and applications...
Enjoy
_________________
MQ & Broker admin

Nigelg
Posted: Wed Jun 04, 2008 9:28 pm
Grand Master
Joined: 02 Aug 2004 Posts: 1046

bbburson:
I thought the idea was to come up with a solution of your own, not criticise other people's.
While we are doing that: you are wrong. The message is written to the file, but is removed from the file before MS62 can find it. tail -f will dump the contents of the file as they are written.
Additionally, I suggested that this procedure could be repeated as necessary until the messages are found, not just run once. Of course, tail does not follow file renames; once there is no more output, the tail can be abandoned and restarted.
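Knowing when to abandon and restart the tail comes down to noticing the rename. A minimal Python check, assuming a POSIX-style file system where a renamed file keeps its inode: compare the (device, inode) pair of the open file with whatever AMQERR01.LOG currently names. `has_been_rotated` is a hypothetical helper; GNU tail's -F option performs a comparable follow-by-name check.

```python
import os


def has_been_rotated(path: str, opened: os.stat_result) -> bool:
    """Return True if `path` no longer names the file we are reading.

    `opened` should come from os.fstat() on the open handle. Assumes a
    POSIX-style file system where a rename keeps the file's inode, so a
    renamed-away AMQERR01.LOG keeps its (st_dev, st_ino) pair while the
    new AMQERR01.LOG gets a different one.
    """
    try:
        current = os.stat(path)
    except FileNotFoundError:
        return True  # renamed away and the new file is not there yet
    return (current.st_dev, current.st_ino) != (opened.st_dev, opened.st_ino)
```

A follower would call this whenever a read returns nothing; on True it can finish reading the old handle, close it, and reopen the path to pick up the new AMQERR01.LOG.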
Keep your negative (and wrong) comments to yourself. If you have anything positive to say, or any answer to the challenge, post it; otherwise, shut up.

Challenger
Posted: Thu Jun 05, 2008 12:57 am
Centurion
Joined: 31 Mar 2008 Posts: 115

OK, well, let's keep it friendly.
There is a problem with tail -f on AMQERR01.LOG: when the file fills up, MQ renames it to AMQERR02.LOG, but tail keeps its file handle on the renamed file and never picks up the new AMQERR01.LOG.
Another issue is that these AMQ74xx messages are written to the log after every 1000 transactions (GETs or PUTs), but if the log file directory is full you won't be doing any more PUTs, and if MQ is well-trashed it's unlikely you'll be doing much GETting either.

Nigelg
Posted: Thu Jun 05, 2008 1:44 am
Grand Master
Joined: 02 Aug 2004 Posts: 1046

Hey, let's all rubbish my 10-second thought.
Anybody else want to put their five eggs in?
Let me put it in words of one syllable.
If it does not find the log msgs the first time, have another go until it does. There is no need to leave it running when there is no more input.
It is 10000, not 1000, and it is not PUTs and GETs but rather the coyly named 'log operations'.
It is a complete waste of time trying to determine which logs to archive if the log directory is full anyway. The question is nonsensical. It is like asking how to change a light bulb in a cabin on the Titanic.

Challenger
Posted: Thu Jun 05, 2008 3:12 am
Centurion
Joined: 31 Mar 2008 Posts: 115

OK, OK. "Log operations" mainly means GETs and PUTs of persistent messages, and yes, it's 10000, not 1000. But the issue is not archiving the old log files (who cares, throw them away); it is getting a snapshot of all active messages into the smallest number of log files, to save space. So you HAVE to know which is the oldest file needed for media recovery (and for restart), and delete only files older than that; otherwise you risk losing your backup if you ever need to replay the messages. Blindly deleting log files without knowing this risks a) unrecoverable queues or, worse, b) an unstartable queue manager.
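The rule "delete only what is older than the oldest file still needed" can be written down directly. A minimal Python sketch, assuming linear log extents named S followed by seven digits and .LOG, and taking the cutoff as the older of the restart log and the media recovery log so that nothing needed at restart is touched either. `removable_extents` and `extent_number` are hypothetical names; names that are not log extents are ignored.

```python
import re
from typing import Iterable, List

# Assumed naming: linear log extents are S + seven digits + .LOG
_EXTENT = re.compile(r"S(\d{7})\.LOG")


def extent_number(name: str) -> int:
    match = _EXTENT.fullmatch(name)
    if match is None:
        raise ValueError(f"not a linear log extent name: {name!r}")
    return int(match.group(1))


def removable_extents(names: Iterable[str], restart_log: str, media_log: str) -> List[str]:
    """Return extents strictly older than both the restart and the media recovery log."""
    cutoff = min(extent_number(restart_log), extent_number(media_log))
    return sorted(
        name for name in names
        if _EXTENT.fullmatch(name) and extent_number(name) < cutoff
    )
```

Taking the minimum of the two is the conservative choice: whichever log is older bounds what may go.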

fjb_saper
Posted: Thu Jun 05, 2008 3:00 pm
Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY

Again, quiesce the qmgr while it is still running:
a) stop all channels (with force if need be)
b) stop all apps connecting in bindings mode
c) free up enough space to run rcdmqimg -l (this gives you the new restart and media recovery log names; write them down)
d) archive the unused logs to a different file volume (first manually, then using the SupportPac), freeing up even more space
e) restart the channels
f) restart the apps with bindings connections
g) schedule rcdmqimg to run regularly, at least once a day
h) schedule the SupportPac to archive the logs after rcdmqimg has run.
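Step d) can be sketched as a small Python helper that moves a given list of extents to an archive directory on another volume. It assumes the list was already worked out from the restart and media recovery log names noted in step c); `archive_extents` and its directory arguments are hypothetical. `shutil.move` renames within one file system and falls back to copy-then-delete across volumes, and the helper refuses to overwrite an existing archive copy.

```python
import os
import shutil
from typing import Iterable, List


def archive_extents(log_dir: str, archive_dir: str, names: Iterable[str]) -> List[str]:
    """Move the named log extents from log_dir into archive_dir.

    `names` is assumed to be the list already judged safe from the restart
    and media recovery log names written down in step c). Returns the new
    paths in the order the files were moved.
    """
    os.makedirs(archive_dir, exist_ok=True)
    moved: List[str] = []
    for name in names:
        source = os.path.join(log_dir, name)
        target = os.path.join(archive_dir, name)
        if os.path.exists(target):
            # Never overwrite an earlier archive copy of the same extent
            raise FileExistsError(target)
        # Renames on the same file system; copies then deletes across volumes
        shutil.move(source, target)
        moved.append(target)
    return moved
```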
And yes, there is a good reason to force the business to take a short outage: if we don't take the outage, the qmgr is going to crash...
Hope this helps.

Challenger
Posted: Fri Jun 06, 2008 1:57 am
Centurion
Joined: 31 Mar 2008 Posts: 115

Sound advice. The queue manager is pretty much unusable, so an outage should not be an issue; it is a crash situation.
For extra points, could anybody suggest what to do if a log file WAS accidentally deleted, with no backup?

Gaya3
Posted: Fri Jun 06, 2008 2:38 am
Jedi
Joined: 12 Sep 2006 Posts: 2493 Location: Boston, US

A cold start procedure is the best way if you face this type of issue.
Regards
Gayathri
_________________
Do Something Before you Die

fjb_saper
Posted: Fri Jun 06, 2008 3:07 am
Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY

Challenger wrote:
Sound advice. The queue manager is pretty much unusable, so an outage should not be an issue; it is a crash situation.
For extra points, could anybody suggest what to do if a log file WAS accidentally deleted, with no backup?
If you haven't stopped the qmgr and you are missing log files, the first rule is: don't stop the qmgr.
If you are running with linear logging, just try running rcdmqimg... you might get lucky, and it might create enough new log files to allow you to stop the qmgr.
Otherwise, just keep the qmgr running until the needed log files have been recreated...
If you have already stopped the qmgr, the cold start method seems to be the only option, as Gaya3 stated. If the qmgr uses linear logging, make sure you run rcdmqimg right after the restart. This should re-record the queues, including their persistent messages, in the new logs as it creates the checkpoint.
Enjoy
These options are by no means finite. They are just the most obvious ones.
Please add your own experiences of crashes and how you solved them here.

Challenger
Posted: Sat Jun 07, 2008 3:06 am
Centurion
Joined: 31 Mar 2008 Posts: 115

Quote:
These options are by no means finite. They are just the most obvious ones.
Please add your own experiences of crashes and how you solved them here.
Yes, please. It would be extremely interesting to see what problems other people have experienced and how they (hopefully) fixed the situation. As I said, this challenge is based on something that actually happened, so the more the merrier.
|