MQSeries.net :: View topic - Automating MQ troubleshooting

Automating MQ troubleshooting

jeevan

Posted: Thu Jan 27, 2011 12:25 pm Post subject: Automating MQ troubleshooting

Grand Master

Joined: 12 Nov 2005
Posts: 1432

Hi folks out there,

I am working on an MQ troubleshooting script. So far it does this:

check that the qmgr is running (ok)
check that the listener is running (ok)
check that the command server is running; if it is not, or if it is hung, stop and start it, etc.

Now, or earlier when I check that the qmgr is running, I would like to check that the qmgr is responsive. I can use the ping qmgr command, but my problem is this.

If I run this command in the main flow and the qmgr is hung or non-responsive, my script gets stuck. I can put that single command in another file and call it, but if it runs as a normal script, the main flow still waits for it to complete. I have to run it in the background so that the main flow progresses and I can check whether the script is still running, and so determine whether the ping hung or not. But I could not find a way to do this in Perl or in a batch script.

I tried this:

main script looks something like this.

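#!/usr/bin/perl
use strict;
use warnings;

# Three attempts at starting count.pl in the background. On Windows a
# trailing "&" does not background the command, so each attempt blocks.
#require "count.pl";
#system("perl c:/temp1000/count.pl >/dev/null 2>&1 &");
#system("perl c:/temp1000/count.pl &");
# Backticks always wait for the child and return its output, not its PID.
my $test = `perl c:/temp1000/count.pl &`;

print "PID of child: $test\n";
print "Finished counting!\n";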

Whichever way I call the child script (any of the three ways shown above), the main script waits for it to complete. Is there any way I can run it in the background?

Further information:

MQ version: MQ 7.0.1.3
Perl dist: ActivePerl 5.10
Platform: Windows

Note: I think I can achieve this easily on Unix/Linux by running the script in the background and checking whether it has completed or not.
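
For readers with the same problem, here is a minimal sketch of the start-then-poll pattern described above. It is written in Python rather than the Perl used in this post, purely as an illustration. The helper names `start_in_background` and `finished_within` are made up for this example, and the child command is a stand-in for the real check.

```python
import subprocess
import sys
import time


def start_in_background(cmd):
    """Start cmd without waiting for it; the call returns immediately."""
    return subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )


def finished_within(proc, seconds, step=0.1):
    """Poll proc until it exits or `seconds` elapse. True if it exited."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            return True
        time.sleep(step)
    return proc.poll() is not None


if __name__ == "__main__":
    # Stand-in for the real check: a child that sleeps for 30 seconds.
    child = start_in_background(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    print("PID of child:", child.pid)  # the main flow carries on from here
    if not finished_within(child, 1):
        child.kill()
        child.wait()
        print("child looked hung and was killed")
```

The main flow can do other work between calls to `poll()`; the loop in `finished_within` is only the simplest way to show the idea.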

Thanks

Last edited by jeevan on Thu Jan 27, 2011 1:59 pm; edited 3 times in total

fatherjack

Posted: Thu Jan 27, 2011 12:42 pm Post subject:

Knight

Joined: 14 Apr 2010
Posts: 522
Location: Craggy Island

Or you could look at any of the many MQ systems management tools out there.
_________________
Never let the facts get in the way of a good theory.

gbaddeley

Posted: Thu Jan 27, 2011 4:17 pm Post subject:

Jedi Knight

Joined: 25 Mar 2003
Posts: 2538
Location: Melbourne, Australia

On Windows, you can examine the output of the "tasklist" command to check that the MQ processes and service are running. This is pretty low-level stuff. It would be much better and more robust to use a proper MQ monitoring tool. Search this forum.
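
A rough sketch of that tasklist check, in Python for illustration. It assumes the CSV output of `tasklist /FO CSV /NH`, and it assumes the image names in `EXPECTED` are the ones your MQ installation uses; verify both on your own server. The function names are invented for this example.

```python
import csv
import io

# Assumed MQ process image names on Windows; verify against your installation.
EXPECTED = {
    "amqzxma0.exe": "queue manager execution controller",
    "amqpcsea.exe": "command server",
    "runmqlsr.exe": "listener",
}


def running_images(tasklist_csv):
    """Return the lower-cased image names found in tasklist CSV output."""
    rows = csv.reader(io.StringIO(tasklist_csv))
    return {row[0].lower() for row in rows if row}


def missing_processes(tasklist_csv, expected=EXPECTED):
    """Map each expected image name that is not running to its description."""
    present = running_images(tasklist_csv)
    return {name: what for name, what in expected.items() if name not in present}
```

The text to feed in would come from running `tasklist /FO CSV /NH` and capturing its standard output. Note the limits of this check: a process that exists is not necessarily responsive, and with several queue managers on one host the image name alone does not say which queue manager a process belongs to.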
_________________
Glenn

jeevan

Posted: Thu Jan 27, 2011 8:06 pm Post subject:

Grand Master

Joined: 12 Nov 2005
Posts: 1432

gbaddeley wrote:

I am trying to develop a troubleshooting script that runs all the basic tests, not a monitoring script. We already have monitoring in place.

bruce2359

Posted: Thu Jan 27, 2011 10:17 pm Post subject:

Poobah

Joined: 05 Jan 2008
Posts: 9475
Location: US: west coast, almost. Otherwise, enroute.

Hmmmm, interesting.

Monitoring identifies potential problems, while troubleshooting discovers the cause(s) of problems. Monitoring is observational, while troubleshooting is analytical.

What do you expect your script to discover that monitoring tools will not?
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.

santnmq

Posted: Fri Jan 28, 2011 3:27 am Post subject:

Centurion

Joined: 11 Jan 2011
Posts: 125

It's confusing. You probably want to monitor MQ through a script, not troubleshoot.

It's easy to write scripts for your customized monitoring and get alerted whenever there is an issue, or, as stated earlier, there are a number of monitoring tools which can be used for this purpose.

jeevan

Posted: Fri Jan 28, 2011 4:50 am Post subject:

Grand Master

Joined: 12 Nov 2005
Posts: 1432

santnmq wrote:

Why do you guess that I probably want a monitoring script? We have the ITM and Nagios monitoring tools in place. What I want, and am developing, is a troubleshooting script.

Vitor

Posted: Fri Jan 28, 2011 5:15 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

jeevan wrote:

We have the ITM and Nagios monitoring tools in place. What I want, and am developing, is a troubleshooting script.

The question I'm struggling with here is the distinction you're making. To me, a monitoring tool monitors a software system (WMQ, database, application, whatever) and alerts in the event of a problem. That problem could be a response out of SLA but is more typically a failed component like a downed queue manager or a channel in retry. This is certainly what ITM does.

So given that, what additional information are you attempting to determine with this script? How will it help you troubleshoot in a way that the existing monitoring doesn't?
_________________
Honesty is the best policy.
Insanity is the best defence.

mqjeff

Posted: Fri Jan 28, 2011 5:19 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

I suspect that jeevan is attempting to write something similar to the functionality provided with ISALite, where it automatically assembles a predefined and fixed set of troubleshooting data at the current point in time: saving out a copy of the AMQERR logs, collecting FFST files (and maybe running ffstsummary), and so on.
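
As a sketch of what that collection step might look like, in Python for illustration: the function below copies AMQERR logs and FDC files into a fresh timestamped directory. The name `snapshot`, the `mq-` directory prefix and the file patterns are assumptions for this example, and the caller has to supply the error directories for their own installation.

```python
import shutil
import time
from pathlib import Path


def snapshot(sources, dest_root, patterns=("AMQERR*.LOG", "*.FDC")):
    """Copy matching files from each source directory into a new
    timestamped directory under dest_root.

    Returns (snapshot directory, list of copied paths).
    """
    dest = Path(dest_root) / time.strftime("mq-%Y%m%d-%H%M%S")
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for index, src in enumerate(map(Path, sources)):
        if not src.is_dir():
            continue  # skip directories that do not exist on this host
        # One sub-directory per source, because the same file name
        # (for example AMQERR01.LOG) can exist in more than one of them.
        target_dir = dest / f"{index:02d}_{src.name}"
        target_dir.mkdir(exist_ok=True)
        for pattern in patterns:
            for f in sorted(src.glob(pattern)):
                copied.append(Path(shutil.copy2(f, target_dir / f.name)))
    return dest, copied
```

Running this as the first step, before anyone restarts anything, keeps the logs from the time of the failure even if they are later overwritten.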

So that *after* the monitoring tool has CREATED an alert, then something can be run to make a first pass at DIAGNOSING the issue.

It's not a bad idea, really.

In terms of the actual *problem* mentioned, how to fork a process that might potentially hang and then detect whether it has or has not actually HUNG: in the general case that is the halting problem, which is undecidable. In this case, I'd redirect the stdout of the command back into the code and use a read or seek that times out. If I haven't gotten any data after the timeout, I'd assume that the command had hung, and kill it.
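
A minimal sketch of that approach, in Python for illustration: run the command with its output piped back, wait up to a timeout, and kill it if nothing comes back in time. `run_with_timeout` and its status strings are names invented for this example.

```python
import subprocess
import sys


def run_with_timeout(cmd, input_text=None, timeout=10):
    """Run cmd, feeding input_text on stdin. Returns (status, output).

    status is "ok", "failed" (non-zero exit) or "hung" (killed after timeout).
    """
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    try:
        output, _ = proc.communicate(input=input_text, timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()                     # give up on the hung command
        output, _ = proc.communicate()  # reap it and keep any partial output
        return "hung", output
    return ("ok" if proc.returncode == 0 else "failed"), output
```

For the ping discussed in this thread, the call might look like `run_with_timeout(["runmqsc", "QM1"], "PING QMGR\nEND\n", timeout=15)`, where `QM1` is a hypothetical queue manager name. A timeout only shows that the command did not answer in time; it cannot prove that the queue manager itself is hung.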

jeevan

Posted: Fri Jan 28, 2011 6:01 am Post subject:

Grand Master

Joined: 12 Nov 2005
Posts: 1432

mqjeff wrote:

mqjeff,

You are almost correct in the first part. I want to develop a script which does two things: collect data as ISALite does, and run basic health checks and recommend the next step (which saves time and typing, and avoids missing a check in a critical situation).

Let me explain the purpose. Let's say a problem has been reported in our qmgr, MQ application or broker flows (messages are not being processed, messages cannot be put, databases are not being updated, etc.), or there is a trouble ticket. What do we do? How do we proceed?

We jump onto the server and check the qmgr status. Let's say it is in a non-responsive state (one of many possible situations). We restart it (our priority is to restore the service ASAP).

We check a few things. If we figure out the cause, good. If not, and it is late at night, we think: I will collect the qmgr logs etc. and go to sleep.

Tomorrow we may spend some time on it, or forget.
Even if we spend time on it, let's say we cannot figure it out; we may then approach IBM.

By then,

The qmgr log will have been overwritten,
The context (processes running, memory, disk space etc. at the time of the crash) will have changed.

IBM may ask, "Can you recreate the scenario?" Probably not, as we may have lost all the state of the OS, MQ, qmgr and application from the time of the crash.

That said, let me come back to the purpose of my script:
I want it to do the things we do when we log in to troubleshoot. That helps us troubleshoot faster and collects the must-gather data for later use.

For example,

check qmgr status, e.g.
dspmq
if it is running, check that it is responding (ping qmgr, which is where I am stuck now)
if the qmgr is responding, check whether the listener is running
check that the command server is running
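
The ordering in this list, where each check only makes sense if the previous one passed, can be sketched as follows (Python, for illustration). `parse_dspmq` assumes dspmq prints lines of the form `QMNAME(name) STATUS(status)`; `run_checks` and the check labels are hypothetical.

```python
import re

LINE = re.compile(r"QMNAME\((?P<name>[^)]+)\)\s+STATUS\((?P<status>[^)]+)\)")


def parse_dspmq(text):
    """Return {queue manager name: status} from dspmq output."""
    return {m["name"]: m["status"] for m in LINE.finditer(text)}


def run_checks(checks):
    """Run (label, function) pairs in order and stop at the first failure.

    Each function returns True when its check passes. Returns a list of
    (label, passed) for the checks that actually ran.
    """
    results = []
    for label, check in checks:
        passed = bool(check())
        results.append((label, passed))
        if not passed:
            break  # later checks depend on earlier ones, so stop here
    return results
```

A first entry might be `("qmgr running", lambda: parse_dspmq(output).get("QM1") == "Running")`, followed by entries for the ping, the listener and the command server; `QM1` and `output` stand for your queue manager name and the captured dspmq text.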

check memory, disk space, and space in the /tmp directory (in some cases this could be the problem)
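
The disk-space part of this check could look roughly like the Python sketch below. The 10% threshold and the name `low_space` are arbitrary choices for illustration, and the memory check is not covered here.

```python
import shutil
import tempfile


def low_space(paths, min_free_fraction=0.10):
    """Return {path: free fraction} for paths below the free-space threshold."""
    low = {}
    for path in paths:
        usage = shutil.disk_usage(path)
        if usage.total == 0:
            continue  # some pseudo file systems report no size at all
        free = usage.free / usage.total
        if free < min_free_fraction:
            low[path] = round(free, 3)
    return low


if __name__ == "__main__":
    # The temporary directory stands in for /tmp; add the MQ data and log
    # paths of your own installation to this list.
    print(low_space([tempfile.gettempdir()]))
```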

scan the qmgr log for any uncommon AMQ errors
check whether any FDCs have been generated

In both of the above cases, copy them to a temporary location (not just anywhere; create a fixed place so we do not forget it later)
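
One way to approach the log scan, sketched in Python for illustration: count the message identifiers that start a line in the error log and drop the ones you consider routine. The pattern assumes identifiers of the form `AMQ` plus four digits, optionally followed by a letter, then a colon. Which codes count as "common" is left to the caller.

```python
import re
from collections import Counter

# An identifier at the start of a line, e.g. "AMQ9209:" (format assumed).
AMQ_ID = re.compile(r"^(AMQ\d{4}[A-Z]?):", re.MULTILINE)


def count_amq_ids(log_text, ignore=()):
    """Count AMQ message identifiers in error-log text, skipping `ignore`."""
    counts = Counter(AMQ_ID.findall(log_text))
    for code in ignore:
        counts.pop(code, None)
    return counts
```

Sorting the result with `count_amq_ids(text, ignore=routine).most_common()` gives a quick view of what dominated the log; `routine` is a hypothetical list that each site would have to build from its own experience.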

Again, I just want to be more systematic (not missing anything) by following a checklist (best practice):

Save time in critical situations
Help junior administrators/engineers do the basic things without asking senior people
Create knowledge (if I put this into a script and write the documentation or explain each step, knowledge is created (DIKW))

Also, my way is to develop a script whenever I have to do the same or a similar thing again and again. That is what automation really means for an administrator or a support person.

In fact, I am still coming up with the requirements document and the flow. For now both are in my head.

jeevan

Posted: Fri Jan 28, 2011 6:20 am Post subject:

Grand Master

Joined: 12 Nov 2005
Posts: 1432

Vitor wrote:

jeevan wrote:

We have the ITM and Nagios monitoring tools in place. What I want, and am developing, is a troubleshooting script.

Vitor,
I see an alert as a warning that a threshold has been reached, and troubleshooting as the art of figuring out why it happened.

The first is done by the tools you mentioned and is already automated; I am not trying to reinvent the wheel. I want to achieve the second in a more systematic way.

Even if it is a response out of SLA, we do check certain things. I want to follow a checklist in a scripted way to do these checks and come out with a consistent result.

If it is a trouble situation, we definitely check certain things. I would like to check these and, based on our knowledge, recommend the next step.

I would say it is about codifying knowledge and putting that knowledge into practice.

It helps save crucial information for later use (e.g. qmgr logs, FDCs etc.) in a systematic way.

It helps figure out the problem faster and in the correct manner (we may miss a check when working from memory).

I hope I have answered your question. I appreciate more questions, so that I can come up with a better script, or give up if it turns out not to be required or useful.

Vitor

Posted: Fri Jan 28, 2011 6:53 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

jeevan wrote:

I see an alert as a warning that a threshold has been reached, and troubleshooting as the art of figuring out why it happened.

The first is done by the tools you mentioned and is already automated; I am not trying to reinvent the wheel. I want to achieve the second in a more systematic way.

That's a fair point. I remain unconvinced that a script could determine anything that the existing monitoring tool has not already recorded, but do see that the checklist concept has value & maybe could be scripted as you indicate.

mqjeff

Posted: Fri Jan 28, 2011 6:54 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

The part where you automatically gather data is, as I said, not a bad idea. In fact, it's a really good idea.

The challenge is to build enough of an expert system on top of that that you can actually make any recommendations about what to do next. Given, for example, a channel ending with an MQRC2009 - is it the fault of the network? Is it the fault of the qmgr? Is it the fault of the network hardware in the server? How do you determine what is the "next" step in this case? Using what information?

This gets to be a complicated, hard problem.

It's worth examining at least, but it gets complicated and hard very quickly.

bruce2359

Posted: Fri Jan 28, 2011 6:56 am Post subject:

Poobah

Joined: 05 Jan 2008
Posts: 9475
Location: US: west coast, almost. Otherwise, enroute.

Before you begin coding a troubleshooting script, document in flowchart fashion exactly what event/situation (symptom) will lead to which steps in the troubleshooting process. Do this as if you were training your replacement to diagnose suspected problems.

An example symptom: a queue that has no messages (current depth) - but should. There are a variety of possible causes for this. The underlying cause (problem) may be very different from another queue with the same symptom.
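
Once such a flowchart exists on paper, one way to carry it into a script is as plain data, so the procedure can be reviewed without reading code. The Python sketch below is illustrative only: the symptom text and the steps are placeholders, not a vetted diagnostic procedure, and `steps_for` is a made-up helper.

```python
# Hypothetical playbook: the symptom and steps below are placeholders.
# Replace them with the flowchart you have documented for your own site.
PLAYBOOK = {
    "queue empty but should not be": [
        "Confirm the putting application is running and connected",
        "Check whether the messages were put to a different queue",
        "Check whether another application is consuming from the queue",
        "Check whether the messages carry an expiry that has elapsed",
    ],
}

FALLBACK = ["No documented procedure: record the symptom and escalate"]


def steps_for(symptom, playbook=PLAYBOOK):
    """Return the ordered steps for a symptom, or the fallback if none exists."""
    return playbook.get(symptom, FALLBACK)
```

Keeping the steps as data also makes the gaps visible: any symptom that falls through to the fallback is one the flowchart has not covered yet.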

jeevan

Posted: Fri Jan 28, 2011 7:02 am Post subject:

Grand Master

Joined: 12 Nov 2005
Posts: 1432

mqjeff wrote:

I agree. But I want to be able to conclude, for example, that it is not an MQ problem but a network problem, so that I can reach out to the network folks and/or create a trouble ticket.

I am not trying to go beyond my own area.

Thank you very much for the insight. I am pretty sure I will face a lot of hurdles on this journey.
