vinodsasidharan
Posted: Thu Feb 19, 2004 12:09 am  Post subject: Execution server dies continuously
Apprentice
Joined: 25 Apr 2003 Posts: 47 Location: Norwich
Hi,
I have Workflow 3.3.0 running on AIX 4.3.3 with MQSeries 5.2.
My issue is that I have 4 execution servers running, and all of them stop and are restarted automatically every 5 minutes, continuously.
I get the following errors in fmcsys.log and fmcerr.log every time they go down and restart:
FMCSYS.LOG
***********************************************************
***********************************************************
02/19/04 13:18:16 FmcIntExcInvalidBranch
02/19/04 13:18:16 FMC31050E An error has occurred which has terminated processing.
02/19/04 13:18:16 FmcIntExcInvalidBranch
02/19/04 13:18:16 FMC31050E An error has occurred which has terminated processing.
02/19/04 13:18:16 FmcIntExcInvalidBranch
02/19/04 13:18:16 FMC31050E An error has occurred which has terminated processing.
02/19/04 13:18:16 FmcIntExcInvalidBranch
02/19/04 13:18:16 FMC31050E An error has occurred which has terminated processing.
02/19/04 13:18:16 FmcIntExcInvalidBranch
02/19/04 13:18:16 FMC12240E Execution server instance(s) stopped with an error.
02/19/04 13:18:16 FMC12240E Execution server instance(s) stopped with an error.
02/19/04 13:18:16 FMC12240E Execution server instance(s) stopped with an error.
02/19/04 13:18:16 FMC12240E Execution server instance(s) stopped with an error.
02/19/04 13:18:16 FMC12240E Execution server instance(s) stopped with an error.
02/19/04 13:18:17 FMC10500I Execution server instance started.
02/19/04 13:18:17 FMC10500I Execution server instance started.
02/19/04 13:18:17 FMC10500I Execution server instance started.
02/19/04 13:18:18 FMC10500I Execution server instance started.
02/19/04 13:18:19 FMC10500I Execution server instance started.
FMCERR.LOG
************************************************************
************************************************************
MQSeries Workflow 3.3 Error Report
Report creation = 02/19/04 13:10:41
Related message = FMC31050E An error has occurred which has terminated processing.
Error location = File=/projects/fmc/drvs/lbld/v330/src/contact admin.cxx, Line=847, Function=FmcActivityState::Checkout(FmcCoreContext&,FmcProgramActivity&,const FmcUser&) const
Error data = FmcIntExcInvalidBranch
************************************************************
************************************************************
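A quick way to confirm that every terminating report names the same internal error is to tally the `Error data` field across fmcerr.log. The sketch below is a hypothetical helper, not an MQ Workflow utility. It assumes each report consists of `Key = value` lines like the ones quoted above, that reports are bracketed by asterisk separator lines, and that a line without a recognized key continues the previous value (as happens when a terminal wraps long lines).

```python
from collections import Counter

# Field names as they appear in the fmcerr.log report quoted above.
FIELDS = ("Report creation", "Related message", "Error location", "Error data")

# Illustrative sample, wrapped the way a terminal paste wraps it.
SAMPLE = """\
************************************************************
MQSeries Workflow 3.3 Error Report
Report creation = 02/19/04 13:10:41
Related message = FMC31050E An error has occurred which has terminated processin
g.
Error data = FmcIntExcInvalidBranch
************************************************************
"""


def parse_error_reports(text):
    """Return one dict per error report found in fmcerr.log text (sketch)."""
    reports = []
    current = None
    last_key = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        if line.startswith("*"):
            last_key = None  # a separator line ends any wrapped value
            continue
        key, sep, value = line.partition(" = ")
        if sep and key in FIELDS:
            if key == "Report creation":
                current = {}  # assumption: this field opens every report
                reports.append(current)
            if current is not None:
                current[key] = value.strip()
                last_key = key
        elif current is not None and last_key is not None:
            # Wrapped continuation: glue it back onto the previous value, no space.
            current[last_key] += line
    return reports


def tally_error_data(reports):
    """Count reports by the leading token of 'Error data' (text before any comma)."""
    return Counter(r.get("Error data", "").split(",")[0].strip() for r in reports)


if __name__ == "__main__":
    print(tally_error_data(parse_error_reports(SAMPLE * 3)))
```

Because continuations are joined without a space, a wrap that happened exactly at a word boundary will run two words together; that is harmless for counting error codes.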
We had a new enhancement in Workflow, so on Feb 7 the new FDL and code went live.
1) In the new FDL we had some extra control paths, new decisions, and new users.
2) The old process instances (from before the enhancement) do not have these flows and new decisions.
3) So I think this is why around 400 cases got stuck (in the running state), but when I tried to transfer them so that I could restart them, it showed "process instance or activity is in wrong state".
4) On the night of Feb 13, the PEA which runs this activity went down and around 900 work items went into the wrong state.
5) By around 11 am on Feb 14, the execution servers started stopping and restarting automatically.
We cleared many stuck items, and only around 10 remain.
Even after clearing them, the execution servers keep stopping and starting.
Any clue? FmcIntExcInvalidBranch seems to be the key in the error report, but I found no information on it on any IBM sites or forums.
Hoping for a clue from your side.
regards
vinod
_________________
Vinod Sasidharan
IBM Certified MQ Admin 5.3
IBM Certified MQ Admin 6.0
IBM Certified WAS Admin 6.0
IBM Certified WMB Admin 5.0
IBM Certified DB2 Specialist
Sun Certified Java Programmer
"Ai carte, ai parte" (Romanian proverb: "Learning brings good fortune")

jmac
Posted: Thu Feb 19, 2004 5:42 am
Jedi Knight
Joined: 27 Jun 2001 Posts: 3081 Location: EmeriCon, LLC
1) You are running an old version of MQWF, which I believe is no longer supported.
2) Did any of your new activities use UPES? If so, are the queues defined? If I remember correctly, this caused the issue you are seeing in V330.
GOOD LUCK
_________________
John McDonald
RETIRED

vinodsasidharan
Posted: Thu Feb 19, 2004 5:49 am
Apprentice
Joined: 25 Apr 2003 Posts: 47 Location: Norwich
Hi John,
No, we are not using UPES at all.
vinod

mike_mq
Posted: Wed Oct 20, 2004 7:24 am
Centurion
Joined: 17 Oct 2003 Posts: 123
Vinod,
We are having the same problem with v3.4. Can you tell me how you resolved your problem?
Error data = FmcIntExcInvalidBranch
Thanks,

jmac
Posted: Wed Oct 20, 2004 7:46 am
Jedi Knight
Joined: 27 Jun 2001 Posts: 3081 Location: EmeriCon, LLC
Mike:
Can you supply a little more information?
Are you using NOOPs?
Are you getting database deadlocks?
Have you looked at all your logs? Is there anything else suspicious in fmcsys.log, fmcerr.log, the error table, the system table, the MQ error logs, or the Db2Diag log?

mike_mq
Posted: Wed Oct 20, 2004 9:46 am
Centurion
Joined: 17 Oct 2003 Posts: 123
Jmac,
Are you using NOOPs?
Yes, a lot. All of them were defined correctly according to the NOOP properties.
Are you getting Database Deadlocks?
We used to, but we have not seen any since applying DB2 8.1 FixPak 6, MQ 5.3 CSD07, and MQWF 3.4 SP5.
Have you looked at all your logs? Anything else suspicious in fmcsys.log fmcerr.log, the error table, the system table, the MQ error logs, the Db2Diag log?
fmcsys:
The execution server instance was recycling every 4 to 10 seconds. Once in a while (not on a fixed schedule: roughly once every two days, sometimes twice a day) it throws "FMC12240E Execution server instance(s) stopped with an error", but nothing is written to the error log file.
fmcerr:
This error report was generated just once, a week ago, but according to the system log the execution server instance was stopping frequently.
Related message = FMC31050E An error has occurred which has terminated processing.
Error location = File=/projects/fmc/drvp/lbld/v340/aix/src/contact admin.cxx, Line=221, Function=FmcActivityState::State(FmcAS::Type, FmcBS::Type)
Error data = FmcIntExcInvalidBranch, Description=AIState=4, PIState=4.
MQ error logs: Nothing was written.
Db2Diag: No errors found.
Also, we didn't push any code to this environment recently.
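The recycle frequency described above, and how often FMC12240E accompanies it, can be measured directly from fmcsys.log. The sketch below is a hypothetical helper that assumes the `MM/DD/YY HH:MM:SS FMCnnnnnX` layout visible in the log excerpts in this thread. It searches for that pattern anywhere in the text, so entries fused onto one line (as in the first post's paste) are still found. With several instances running, the gaps mix the restarts of all of them.

```python
import re
from collections import Counter
from datetime import datetime

# "02/19/04 13:18:17 FMC10500I": timestamp followed by a message id.
ENTRY = re.compile(r"(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (FMC\d{5}[A-Z])")

STARTED = "FMC10500I"  # Execution server instance started.
STOPPED_WITH_ERROR = "FMC12240E"  # Execution server instance(s) stopped with an error.


def parse_entries(text):
    """Return (datetime, message_id) pairs in the order they appear."""
    return [
        (datetime.strptime(stamp, "%m/%d/%y %H:%M:%S"), msg_id)
        for stamp, msg_id in ENTRY.findall(text)
    ]


def gaps_between(entries, msg_id=STARTED):
    """Seconds between consecutive occurrences of msg_id."""
    times = [stamp for stamp, mid in entries if mid == msg_id]
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


# Illustrative excerpt in the format shown in the first post.
SAMPLE = """\
02/19/04 13:18:16 FMC12240E Execution server instance(s) stopped with an error.
02/19/04 13:18:17 FMC10500I Execution server instance started.
02/19/04 13:18:17 FMC10500I Execution server instance started.
02/19/04 13:18:18 FMC10500I Execution server instance started.
"""

if __name__ == "__main__":
    entries = parse_entries(SAMPLE)
    print(Counter(mid for _, mid in entries))
    print(gaps_between(entries))
```

Comparing the number of FMC10500I entries with FMC12240E entries over the same period gives a rough idea of how many restarts were logged without an accompanying error.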

jmac
Posted: Wed Oct 20, 2004 11:03 am
Jedi Knight
Joined: 27 Jun 2001 Posts: 3081 Location: EmeriCon, LLC
mike_mq wrote:
Are you using NOOPs?
Yes. A lot. All were defined correctly according to NOOP properties.
Mike:
There are only two things I can think of to check: 1) be sure you are following the latest NOOP rules (there was a rule change with 3.4.0.3, as I recall); 2) have you set RTRecycleThreshold?
It sounds like you might need to open a PMR.

mike_mq
Posted: Thu Oct 21, 2004 4:35 am
Centurion
Joined: 17 Oct 2003 Posts: 123
Yes, we did observe the NOOP rule changes. In fact, our code ran fine in production for several months; we have only been getting this error for the past few weeks.
Yes, we are using RTRecycleThreshold = 10,000, and I was wondering how this threshold relates to this issue. The execution server instance will recycle every 10K transactions. Would changing the threshold make any difference to the Workflow servers or to this issue?
As a matter of fact, we opened a PMR, but since this setup is in production, we cannot turn on level 99 tracing. We are now trying to replicate the problem in other environments, without success so far.
Thanks,

jmac
Posted: Thu Oct 21, 2004 5:24 am
Jedi Knight
Joined: 27 Jun 2001 Posts: 3081 Location: EmeriCon, LLC
RTRecycleThreshold = 10000 means that after processing 10000 transactions, the execution server will end and a new one will be started. But I think 10000 is too high for that to explain the ES dying continuously. Do you have any messages on the hold queue?
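The reasoning about the threshold can be made concrete with a little arithmetic: if the threshold were the only trigger, an instance would recycle after threshold / (its share of the transaction rate) seconds. The helpers below are illustrative and assume load is spread evenly across instances. The 100 transactions/s figure is hypothetical; the threshold of 5000 (Mike's corrected value), the 4 instances, and the 4 to 10 second recycle interval come from this thread.

```python
def recycle_interval_seconds(threshold, total_tx_per_sec, instances):
    """Seconds until one instance reaches RTRecycleThreshold.

    Assumes transactions are spread evenly across instances and that the
    threshold is the only reason the instance recycles.
    """
    per_instance_rate = total_tx_per_sec / instances
    return threshold / per_instance_rate


def implied_rate_per_instance(threshold, observed_interval_seconds):
    """Transactions per second one instance would need to recycle this often."""
    return threshold / observed_interval_seconds


if __name__ == "__main__":
    # Figures from this thread: threshold 5000, recycles every 4 to 10 seconds.
    for seconds in (4, 10):
        print(seconds, implied_rate_per_instance(5000, seconds))
    # Hypothetical total load of 100 transactions/s shared by 4 instances.
    print(recycle_interval_seconds(5000, 100, 4))
```

Recycling every 4 to 10 seconds at a threshold of 5000 would require 500 to 1250 transactions per second per instance, which is in line with jmac's doubt that the threshold alone explains such frequent restarts.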

mike_mq
Posted: Thu Oct 21, 2004 7:54 am
Centurion
Joined: 17 Oct 2003 Posts: 123
My bad, it was 5000. And I see a couple of messages in the hold queue.

jmac
Posted: Thu Oct 21, 2004 8:09 am
Jedi Knight
Joined: 27 Jun 2001 Posts: 3081 Location: EmeriCon, LLC
How many execution servers are you running?
I think 5000 might be a little low for that number, but I still don't see how you get the ES continually dying.
Also, a couple of messages on the hold queue are not going to explain this behaviour.
I would say it is best to pursue your PMR... please keep us posted.

mike_mq
Posted: Thu Oct 21, 2004 8:54 am
Centurion
Joined: 17 Oct 2003 Posts: 123
Currently 4 instances are running. I will update this thread if I hear anything from IBM.
Thanks,