Author |
Message
|
vennela |
Posted: Thu Jan 16, 2003 10:50 am Post subject: Workflow monitoring |
|
|
 Jedi Knight
Joined: 11 Aug 2002 Posts: 4055 Location: Hyderabad, India
|
What is the best way to monitor Workflow?
This is my plan:
1. Check whether the fmcamain process is running.
2. Use fmcautil to connect to the Workflow server
--- make sure the server is active and responding.
3. Sleep for 2 minutes and go to step 1.
What else can I do beyond this?
What happens when the admin server is not shut down by the administrator but crashes for some reason? Is the fmcamain process still out there, or is it cleaned up?
Are there any tools out there that do similar monitoring?
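A minimal sketch of steps 1 and 3 of the plan above, in Python. It assumes a UNIX host where the admin server shows up in the process table under the name fmcamain, and it uses the POSIX `ps -e -o comm=` form to list command names. The function names, the `REQUIRED` tuple and the `alert` callback are made up for illustration. Step 2 (the fmcautil responsiveness check) is left out because the thread does not show how fmcautil is invoked.

```python
import os
import subprocess
import time

REQUIRED = ("fmcamain",)   # assumption: add further process names as needed
POLL_SECONDS = 120         # the 2-minute sleep from the plan


def running_commands(ps_output):
    """Return the set of command basenames found in `ps -e -o comm=` output."""
    return {
        os.path.basename(line.strip())
        for line in ps_output.splitlines()
        if line.strip()
    }


def missing_processes(ps_output, required=REQUIRED):
    """List the required process names that do not appear in the ps output."""
    running = running_commands(ps_output)
    return [name for name in required if name not in running]


def snapshot():
    """Capture the current process list, one command name per line."""
    result = subprocess.run(
        ["ps", "-e", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def monitor(alert=print):
    """Poll forever; call `alert` whenever a required process is missing."""
    while True:
        missing = missing_processes(snapshot())
        if missing:
            alert("missing processes: " + ", ".join(missing))
        time.sleep(POLL_SECONDS)
```

Calling `monitor()` starts the loop. Note that a process being present in the table says nothing about whether it is hung, so this only covers the "is it there" half of the question.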
---
Venny |
|
manoj |
Posted: Thu Jan 16, 2003 11:06 am Post subject: |
|
|
 Master
Joined: 30 Jan 2002 Posts: 237 Location: Virginia
|
You should look at tools like BMC. It can monitor processes (UNIX) and logs (it can look for specific strings in a log) and notify people (pagers, cell phones, etc.). _________________ -manoj |
|
Ratan |
Posted: Thu Jan 16, 2003 11:26 am Post subject: |
|
|
 Grand Master
Joined: 18 Jul 2002 Posts: 1245
|
fmcamain is the admin server, so if the admin server crashes, fmcamain will not be responding. I have no idea how to tell whether a process is hung; maybe the OS people can help you.
If your admin server crashes, your processes continue to run as long as 'fmcemain' (the execution server) is alive. The admin server maintains the basic health and welfare of your Workflow system. When 'fmcemain' crashes, the admin server will try to restart it. I am not sure what happens after multiple failed attempts by the admin server to restart an execution server.
So the processes you should monitor are fmcamain.exe and fmcemain.exe.
-Laze. |
|
yaakovd |
Posted: Mon Jan 20, 2003 9:04 am Post subject: |
|
|
Partisan
Joined: 20 Jan 2003 Posts: 319 Location: Israel
|
fmcamain is the ultimate indicator of any problem (DB, cluster, etc.), because we have had cases where the server crashed without leaving any information in the log files.
But you also have to monitor the execution server and the DB2 (or other DB) processes to get the full picture of a problem. _________________ Best regards.
Yaakov
SWG, IBM Commerce, Israel |
|
vennela |
Posted: Mon Jan 20, 2003 9:39 am Post subject: |
|
|
 Jedi Knight
Joined: 11 Aug 2002 Posts: 4055 Location: Hyderabad, India
|
Is it possible that an fmcemain process is running (or hanging) while the fmcamain process is not?
---
Venny |
|
Ratan |
Posted: Mon Jan 20, 2003 9:58 am Post subject: |
|
|
 Grand Master
Joined: 18 Jul 2002 Posts: 1245
|
Yes, fmcemain continues to run after your fmcamain crashes. This behavior is also documented in one of the manuals (I don't remember which).
-Laze |
|
yaakovd |
Posted: Mon Jan 20, 2003 9:59 am Post subject: |
|
|
Partisan
Joined: 20 Jan 2003 Posts: 319 Location: Israel
|
Yes. I just killed my fmcamain (on UNIX) and fmcemain is still running. This also happens when you have cluster problems. _________________ Best regards.
Yaakov
SWG, IBM Commerce, Israel |
|
vennela |
Posted: Mon Jan 20, 2003 10:46 am Post subject: |
|
|
 Jedi Knight
Joined: 11 Aug 2002 Posts: 4055 Location: Hyderabad, India
|
Oh, OK.
Then I should check for both fmcamain and fmcemain, I guess.
Were you able to restart fmcamain after killing just fmcamain, and was everything fine afterwards?
I too am interested in tests on UNIX.
Thanks
Venny |
|
vennela |
Posted: Mon Jan 20, 2003 12:23 pm Post subject: |
|
|
 Jedi Knight
Joined: 11 Aug 2002 Posts: 4055 Location: Hyderabad, India
|
Looks like the sooner we bring back the admin server the better. I found the following from the paper by Wolfgang Kulhanek at
http://www-3.ibm.com/software/ts/mqseries/txppacs/wd02.html
(the same paper that also talked about the runtime database maintenance).
Quote: |
The Administration Server is the one MQSeries Workflow operating system process that needs to be monitored. If this process should go away then a restart of the MQSeries Workflow system (not the operating system!) is inevitable. This restart could be deferred if the workflow system is running completely automatic business processes – or if all users that participate in a business process are already logged in. The reason for this is that once a user is logged in to the workflow system, all communication happens directly with the Execution servers. If however a new user would want to log in and the Administration server is not running the logon would fail. |
I found this in the "Processes to monitor" chapter.
---
Venny |
|
yaakovd |
Posted: Mon Jan 20, 2003 2:18 pm Post subject: |
|
|
Partisan
Joined: 20 Jan 2003 Posts: 319 Location: Israel
|
The fmcamain process was "reincarnated" without problems and without restarting any other processes. _________________ Best regards.
Yaakov
SWG, IBM Commerce, Israel |
|
jmac |
Posted: Mon Jan 20, 2003 3:58 pm Post subject: |
|
|
 Jedi Knight
Joined: 27 Jun 2001 Posts: 3081 Location: EmeriCon, LLC
|
fmcamain's main purpose in life is to
1) Make sure that all of the other servers are running. It does this by checking for the fmcemains, fmcsmain (if any) and fmccmain (if any) at regular intervals, the so-called "heartbeat" interval. By default this is 5 minutes.
2) Authenticate users when they log on.
So if fmcamain dies (note that I've rarely seen this happen), you would most likely become aware of it when users complain about not being able to log on. The telltale sign that fmcamain is down is that some users are merrily working away while others complain that they can't log on.
The other exposure is that when fmcamain is dead, it cannot detect whether any other server has died. So if your fmcemains died as well, you would start to hear everyone complaining. _________________ John McDonald
RETIRED |
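The reasoning in the post above can be written down as a small decision table. This is only a sketch of that reasoning in Python, not anything shipped with MQ Workflow; the function name, the state labels and the message wording are invented here, and the 5-minute figure is the default heartbeat interval quoted in the post.

```python
HEARTBEAT_MINUTES = 5  # default heartbeat interval quoted in the post above


def diagnose(admin_up, exec_up):
    """Map process liveness to (state, expected symptom).

    admin_up: True if an fmcamain process is present.
    exec_up:  True if at least one fmcemain process is present.
    """
    if admin_up and exec_up:
        return ("ok", "all monitored servers are running")
    if exec_up:
        # Admin server gone, execution servers still there.
        return (
            "admin_down",
            "logged-in users keep working, new logons fail, and a later "
            "execution server failure would go undetected",
        )
    if admin_up:
        # Execution servers gone, admin server still there.
        return (
            "exec_down",
            "the admin server should notice at its next heartbeat check "
            f"(by default within about {HEARTBEAT_MINUTES} minutes) and try a restart",
        )
    return ("all_down", "everyone is affected; nothing will restart by itself")
```

For example, `diagnose(False, True)[0]` gives `"admin_down"`, the case where some users work happily and others cannot log on.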
|
nwhi |
Posted: Tue Jan 21, 2003 12:22 am Post subject: |
|
|
Apprentice
Joined: 19 Dec 2002 Posts: 25 Location: UK
|
We've seen fmcamain die several times on AIX. Our workflows are completely automatic but, despite Wolfgang's statement, the execution servers do stop processing requests (although they do not disappear from the 'ps -ef' output).
My theory is that, sooner or later, the execution servers try to write something to the logs (also one of fmcamain's responsibilities) and then get stuck. However, it's all MQ messaging, so I'm aware the theory doesn't quite hang together.
PS: this is mainly in test environments, where there are occasional writes to the WF logs. We have actually had Workflow running in production for months without a restart (though that is not recommended), so it can be robust. _________________ Nick Whittle
IBM Certified Solutions Designer -
WebSphere MQ Workflow V3.4
MQSolutions (UK) Ltd |
|
jmac |
Posted: Tue Jan 21, 2003 3:29 am Post subject: |
|
|
 Jedi Knight
Joined: 27 Jun 2001 Posts: 3081 Location: EmeriCon, LLC
|
Nick:
I just read something about the RTRecycle_Threshold parameter. The recommendation was to set it to a nonzero value so that the Exec Servers recycle themselves in a 24x7 environment. You may already be using this, but it was new to me. _________________ John McDonald
RETIRED |
|
manoj |
Posted: Tue Jan 21, 2003 10:05 am Post subject: |
|
|
 Master
Joined: 30 Jan 2002 Posts: 237 Location: Virginia
|
It's quite possible that an execution server instance stopped processing, but MQ Workflow restarts such an instance quickly. You can observe what is going on behind the scenes by using the "fmcautil" admin utility. Run the utility and do not close the console it opens. MQ Workflow will push all server-related messages to this console. Monitor these messages for a length of time (say 3 hours) on a busy system. You will find many useful things in these messages. _________________ -manoj |
|
mkhadse |
Posted: Thu Mar 06, 2003 8:49 am Post subject: |
|
|
Acolyte
Joined: 31 Dec 1969 Posts: 73
|
We had the workflow processes running fine while fmcamain was down for 2 days. The fmcamain process would die as soon as anybody tried to log in.
BTW, we are monitoring the fmcamain, fmcemain and db2 processes, and we also do queue monitoring on the Workflow queue manager.
This should detect all the possible problems. Does anybody have any different ideas? |
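For the queue monitoring part, one possible approach is to ask the queue manager for current queue depths and flag queues that keep growing, which might also reveal an execution server that is still in the process table but no longer processing. The sketch below, in Python, pipes the MQSC command `DISPLAY QLOCAL(...) CURDEPTH` into `runmqsc` and parses the reply. The queue manager name, the queue name pattern and the threshold are placeholders to be supplied by the caller; none of them come from this thread.

```python
import re
import subprocess

# Matches QUEUE(name) and CURDEPTH(n) attributes in runmqsc output.
_ATTR = re.compile(r"\b(QUEUE|CURDEPTH)\(([^)]*)\)")


def parse_depths(runmqsc_output):
    """Return {queue name: current depth} parsed from runmqsc output."""
    depths = {}
    current = None
    for key, value in _ATTR.findall(runmqsc_output):
        if key == "QUEUE":
            current = value.strip()
        elif current is not None:
            depths[current] = int(value)
    return depths


def queues_over(depths, threshold):
    """Return the sorted names of queues whose depth exceeds the threshold."""
    return sorted(name for name, depth in depths.items() if depth > threshold)


def query_depths(qmgr, pattern):
    """Run DISPLAY QLOCAL(pattern) CURDEPTH against queue manager `qmgr`.

    `qmgr` and `pattern` are supplied by the caller (hypothetical values).
    """
    command = f"DISPLAY QLOCAL({pattern}) CURDEPTH\n"
    result = subprocess.run(
        ["runmqsc", qmgr],
        input=command, capture_output=True, text=True,
    )
    return parse_depths(result.stdout)
```

A monitor could call `queues_over(query_depths(qmgr, pattern), threshold)` on each polling cycle and raise an alert when the list is non-empty for several cycles in a row.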
|