jwells34 |
Posted: Mon Feb 11, 2008 8:45 am Post subject: Execution Group Crashes - How to determine Root Cause? |
|
|
Newbie
Joined: 01 Nov 2007 Posts: 6
|
Hi,
Here's the Environment setup:
AIX 5.3.4
2 x CPU 600 MHz
12 GB RAM
Software
WBIMB v5.0 FP09
WMQ Server v6.0.2.2
DB2 v8.1.1.136
We have an Execution Group that crashes almost daily. The first time a message comes in, it fails. When we reprocess the message, it works successfully.
So, once a message arrives at the MQInput node, the Execution Group crashes and restarts itself (at least that is what appears to have happened). The message it was trying to process immediately goes to the failure terminal, and we get this in the trace file from the Message Flow that is in the Execution Group:
__________________________________________________
2008-02-10 06:37:46.214421
---------------------------------------------------------------------------
EXCEPTION LIST (
(0x01000000):RecoverableException = (
(0x03000000):File = '/build/S500_P/src/DataFlowEngine/ImbMqInputNode.cpp'
(0x03000000):Line = 3328
(0x03000000):Function = 'ImbMqInputNode::eligibleForBackout'
(0x03000000):Type = 'ComIbmMQInputNode'
(0x03000000):Name = 'PHD_PROCESS_TANKS_IN#FCMComposite_1_1'
(0x03000000):Label = 'PHD_PROCESS_TANKS_IN.PHD.PROCESS.TANKS.IN'
(0x03000000):Text = 'Dequeued failed message. Propagating a message to the failure terminal'
(0x03000000):Catalog = 'BIPv500'
(0x03000000):Severity = 3
(0x03000000):Number = 2652
)
I also found this in the broker log file:
abend record for pid 44126 tid 2075 time in seconds since 01/01/1970: 1202553444
File: /build/S500_P/src/CommonServices/Unix/ImbAbend.cpp
Line: 515
Function: signal received
---- Inserts ----
4
@(#) 1.33.2.5 CommonServices/Unix/ImbAbend.cpp, CommonServices, S500, S500-CSD08D1 06/04/06 16:54:05 [5/2/06 20:00:12]
999383040
-----------------
----------------------------- Stack dump for current thread ( 2075)
(0xdf164918+0x000007a0) ttcdrv [/var/mqsi/lib/libclntsh.a(shr.o)]
(0xdf0eb930+0x00000050) nioqwa [/var/mqsi/lib/libclntsh.a(shr.o)]
(0xded4e474+0x00000640) upirtrc [/var/mqsi/lib/libclntsh.a(shr.o)]
(0xdee54a60+0x0000006c) kpurcsc [/var/mqsi/lib/libclntsh.a(shr.o)]
(0xdee6f4cc+0x0000053c) kpuexecv8 [/var/mqsi/lib/libclntsh.a(shr.o)]
(0xdee705cc+0x00000ff8) kpuexec [/var/mqsi/lib/libclntsh.a(shr.o)]
(0xdee972e8+0x0000001c) OCIStmtExecute [/var/mqsi/lib/libclntsh.a(shr.o)]
(0xde861c3c+0x00000000) <no name available> [/usr/opt/wmqi/merant/lib/UKor818.so]
(0xde9d8068+0x00000000) <no name available> [/usr/lib/libUKbas18.a(UKbas18.so)]
(0xde9d6cf0+0x00000000) <no name available> [/usr/lib/libUKbas18.a(UKbas18.so)]
(0xde84299c+0x00000000) <no name available> [/usr/opt/wmqi/merant/lib/UKor818.so]
(0xdb69c7ac+0x00000000) <no name available> [/usr/opt/mqsi/merant/lib/libodbc.a(odbc.so)]
(0xda3281b8+0x00000460) execute__16ImbOdbcStatementFv [/usr/opt/mqsi/lib/libMessageServices.a(libMessageServices.a.so)]
(0xdc02c284+0x0000104c) executeStmt__17SqlExternalDbStmtCFR14SqlEvalEnvironRC13ImbStringBaseXTwTQ2_3std11char_traitsXTw_TcSP37_Q2_3std6_PtritXTP17SqlExpressionNodeTlTPCP17SqlExpressionNodeTRCP17SqlExpressionNodeTPP17SqlExpressionNodeTRP17SqlExpressionNode_T3RC18SqlGeneralLocationi [/usr/opt/mqsi/lib/libImbRdl.a(libImbRdl.a.so)]
(0xdc07dec8+0x0000036c) evaluate__17SqlPassthruFnCallCFR9SqlResult [/usr/opt/mqsi/lib/libImbRdl.a(libImbRdl.a.so)]
(0xdc26d224+0x00000084) assignToMessage__13SqlAssignmentCFR14SqlEvalEnviron [/usr/opt/mqsi/lib/libImbRdl.a(libImbRdl.a.so)]
(0xdc2697b8+0x00000b54) execute__13SqlAssignmentCFR18SqlStatementResult [/usr/opt/mqsi/lib/libImbRdl.a(libImbRdl.a.so)]
(0xdc051b1c+0x000000d8) execute__17SqlStatementGroupCFR18SqlStatementResult [/usr/opt/mqsi/lib/libImbRdl.a(libImbRdl.a.so)]
(0xdc274c70+0x000001b0) execute__15SqlCompoundStmtCFR18SqlStatementResult [/usr/opt/mqsi/lib/libImbRdl.a(libImbRdl.a.so)]
(0xdbf01164+0x00000168) execute__10SqlRoutineCFR18SqlStatementResult [/usr/opt/mqsi/lib/libImbRdl.a(libImbRdl.a.so)]
(0xdc3cb278+0x000000ec) execute__9SqlModuleCFR18SqlStatementResult [/usr/opt/mqsi/lib/libImbRdl.a(libImbRdl.a.so)]
(0xdc2860cc+0x0000002c) execute__9SqlSchemaCFR18SqlStatementResult [/usr/opt/mqsi/lib/libImbRdl.a(libImbRdl.a.so)]
(0xdc3d2e8c+0x0000054c) evaluate__19SqlComputeInterfaceFRC18ImbMessageAssemblyR18ImbMessageAssembly [/usr/opt/mqsi/lib/libImbRdl.a(libImbRdl.a.so)]
(0xdcdfe080+0x0000040c) evaluate__14ImbComputeNodeFRC18ImbMessageAssemblyPC19ImbDataFlowTerminal [/usr/opt/mqsi/lil/imbdfsql.lil]
(0xdb88dba0+0x000001d8) evaluate__19ImbDataFlowTerminalFRC18ImbMessageAssembly [/usr/opt/mqsi/lib/libDataFlowDLL.a(libDataFlowDLL.a.so)]
(0xdba48afc+0x00000354) propagate__19ImbDataFlowTerminalFRC18ImbMessageAssembly [/usr/opt/mqsi/lib/libDataFlowDLL.a(libDataFlowDLL.a.so)]
(0xdce57c98+0x00000284) propagate__14ImbComputeNodeFRC18ImbMessageAssemblyR18ImbMessageAssembly [/usr/opt/mqsi/lil/imbdfsql.lil]
(0xdcdfe080+0x00000c4c) evaluate__14ImbComputeNodeFRC18ImbMessageAssemblyPC19ImbDataFlowTerminal [/usr/opt/mqsi/lil/imbdfsql.lil]
(0xdb88dba0+0x000001d8) evaluate__19ImbDataFlowTerminalFRC18ImbMessageAssembly [/usr/opt/mqsi/lib/libDataFlowDLL.a(libDataFlowDLL.a.so)]
(0xdba48afc+0x00000354) propagate__19ImbDataFlowTerminalFRC18ImbMessageAssembly [/usr/opt/mqsi/lib/libDataFlowDLL.a(libDataFlowDLL.a.so)]
(0xdce57c98+0x00000284) propagate__14ImbComputeNodeFRC18ImbMessageAssemblyR18ImbMessageAssembly [/usr/opt/mqsi/lil/imbdfsql.lil]
(0xdcdfe080+0x00000c4c) evaluate__14ImbComputeNodeFRC18ImbMessageAssemblyPC19ImbDataFlowTerminal [/usr/opt/mqsi/lil/imbdfsql.lil]
(0xdb88dba0+0x000001d8) evaluate__19ImbDataFlowTerminalFRC18ImbMessageAssembly [/usr/opt/mqsi/lib/libDataFlowDLL.a(libDataFlowDLL.a.so)]
(0xdba48afc+0x00000354) propagate__19ImbDataFlowTerminalFRC18ImbMessageAssembly [/usr/opt/mqsi/lib/libDataFlowDLL.a(libDataFlowDLL.a.so)]
(0xdc691e48+0x00000108) evaluate__15ImbTryCatchNodeFRC18ImbMessageAssemblyPC19ImbDataFlowTerminal [/usr/opt/mqsi/lil/imbdfbas.lil]
(0xdb88dba0+0x000001d8) evaluate__19ImbDataFlowTerminalFRC18ImbMessageAssembly [/usr/opt/mqsi/lib/libDataFlowDLL.a(libDataFlowDLL.a.so)]
(0xdba48afc+0x00000354) propagate__19ImbDataFlowTerminalFRC18ImbMessageAssembly [/usr/opt/mqsi/lib/libDataFlowDLL.a(libDataFlowDLL.a.so)]
(0xdc749f08+0x000055e0) readQueue__14ImbMqInputNodeFP11ImbOsThread [/usr/opt/mqsi/lib/libMQLibrary.a(libMQLibrary.a.so)]
(0xdc756180+0x00000048) run__Q2_14ImbMqInputNode10ParametersFP11ImbOsThread [/usr/opt/mqsi/lib/libMQLibrary.a(libMQLibrary.a.so)]
(0xd6fa3108+0x00000070) run__27ImbThreadPoolThreadFunctionFP11ImbOsThread [/usr/opt/mqsi/lib/libCommonServices.a(libCommonServices.a.so)]
(0xd6f942e8+0x00000054) threadRun__11ImbOsThreadFv [/usr/opt/mqsi/lib/libCommonServices.a(libCommonServices.a.so)]
(0xd6f93e8c+0x00000064) threadBootStrap__11ImbOsThreadFPv [/usr/opt/mqsi/lib/libCommonServices.a(libCommonServices.a.so)]
(0xd004c528+0x0000011c) _pthread_body [/usr/lib/libpthreads.a(shr_xpg5.o)]
(0x00000000) <invalid code address>
----------------------------------------------------------------------
I was wondering if anyone else has run into this issue or has any ideas to help debug the situation.
I appreciate your time.
Thanks
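The abend record above gives its time only as seconds since 01/01/1970, which is awkward to line up against the trace timestamps. A minimal Python sketch for converting it, assuming the value is ordinary Unix epoch seconds; the helper name abend_time_utc is made up for illustration and is not part of any broker tooling:

```python
from datetime import datetime, timezone


def abend_time_utc(epoch_seconds: int) -> str:
    """Format an abend record's epoch-seconds value as a UTC timestamp."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


# Value copied from the abend record in this post.
print(abend_time_utc(1202553444))  # prints "2008-02-09 10:37:24 UTC"
```

The result is in UTC. The trace file timestamps are presumably in the server's local time, so allow for the offset when comparing the two.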
|
jefflowrey |
Posted: Mon Feb 11, 2008 9:09 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
Either your SQL statement has a problem, or your database connection was dropped overnight.
_________________
I am *not* the model of the modern major general.
|
Ian |
Posted: Mon Feb 11, 2008 9:38 am Post subject: |
|
|
Disciple
Joined: 22 Nov 2002 Posts: 152 Location: London, UK
|
You have indicated that your (broker) environment uses DB2.
Quote: |
WBIMB v5.0 FP09
WMQ Server v6.0.2.2
DB2 v8.1.1.136
|
However, the call stack shows that the failure is in an Oracle client library.
Oracle Client ---> /var/mqsi/lib/libclntsh.a
DataDirect (Merant) Oracle Driver used by Message Broker ---> /usr/opt/wmqi/merant/lib/UKor818.so
Quote: |
(0xdf164918+0x000007a0) ttcdrv [/var/mqsi/lib/libclntsh.a(shr.o)]
(0xdf0eb930+0x00000050) nioqwa [/var/mqsi/lib/libclntsh.a(shr.o)]
(0xded4e474+0x00000640) upirtrc [/var/mqsi/lib/libclntsh.a(shr.o)]
(0xdee54a60+0x0000006c) kpurcsc [/var/mqsi/lib/libclntsh.a(shr.o)]
(0xdee6f4cc+0x0000053c) kpuexecv8 [/var/mqsi/lib/libclntsh.a(shr.o)]
(0xdee705cc+0x00000ff8) kpuexec [/var/mqsi/lib/libclntsh.a(shr.o)]
(0xdee972e8+0x0000001c) OCIStmtExecute [/var/mqsi/lib/libclntsh.a(shr.o)]
(0xde861c3c+0x00000000) <no name available> [/usr/opt/wmqi/merant/lib/UKor818.so]
(0xde9d8068+0x00000000) <no name available> [/usr/lib/libUKbas18.a(UKbas18.so)]
(0xde9d6cf0+0x00000000) <no name available> [/usr/lib/libUKbas18.a(UKbas18.so)]
(0xde84299c+0x00000000) <no name available> [/usr/opt/wmqi/merant/lib/UKor818.so]
(0xdb69c7ac+0x00000000) <no name available> [/usr/opt/mqsi/merant/lib/libodbc.a(odbc.so)]
(0xda3281b8+0x00000460) execute__16ImbOdbcStatementFv [/usr/opt/mqsi/lib/libMessageServices.a(libMessageServices.a.so)]
|
I would suggest you first establish whether you are expecting access to DB2 or Oracle. If it is the latter, then you should search for known Oracle problems relating to the 'ttcdrv' function in the Oracle Client library.
_________________
Regards, Ian
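The reasoning above (read the dump from the top and note which library each frame belongs to) can be sketched in Python. This assumes every frame line has the shape "(address+offset) symbol [library]" seen in the dump in this thread; parse_frames and libraries_in_order are hypothetical helper names, not broker utilities:

```python
import re

# One frame line, e.g. "(0xdf164918+0x000007a0) ttcdrv [/var/mqsi/lib/libclntsh.a(shr.o)]"
FRAME_RE = re.compile(
    r"^\(0x[0-9a-f]+\+0x[0-9a-f]+\)\s+(?P<symbol>.+?)\s+\[(?P<library>[^\]]+)\]$"
)


def parse_frames(dump: str):
    """Return (symbol, library) pairs for every line that looks like a frame."""
    frames = []
    for line in dump.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((match.group("symbol"), match.group("library")))
    return frames


def libraries_in_order(frames):
    """List each library once, in the order it first appears from the top of the stack."""
    seen = []
    for _symbol, library in frames:
        if library not in seen:
            seen.append(library)
    return seen


# A few frames copied from the dump in this thread.
sample = """
(0xdf164918+0x000007a0) ttcdrv [/var/mqsi/lib/libclntsh.a(shr.o)]
(0xdee972e8+0x0000001c) OCIStmtExecute [/var/mqsi/lib/libclntsh.a(shr.o)]
(0xde861c3c+0x00000000) <no name available> [/usr/opt/wmqi/merant/lib/UKor818.so]
(0xda3281b8+0x00000460) execute__16ImbOdbcStatementFv [/usr/opt/mqsi/lib/libMessageServices.a(libMessageServices.a.so)]
"""

for library in libraries_in_order(parse_frames(sample)):
    print(library)
```

The first library printed is the one that owns the frame where the signal was received, which is usually the place to start searching for known problems.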
|
jwells34 |
Posted: Mon Feb 11, 2008 10:09 am Post subject: |
|
|
Newbie
Joined: 01 Nov 2007 Posts: 6
|
Hi,
Thanks for the information so far. Greatly appreciated!
I should have mentioned that the Message Flow performs an Oracle database lookup, not a DB2 one.
So, if the Oracle connection is getting dropped and causing the Execution Group to crash, is there any way around this situation?
Josh |
|
PeterPotkay |
Posted: Mon Feb 11, 2008 11:13 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
FP10 for WBIMB 5.0 has some fixes for this type of problem. We had the same issue with an EG crashing when an application DB2 database on the mainframe went down and came back up. FP10 fixed that, although we didn't upgrade to FP10 for other reasons (a lot of our flows needed code changes).
We put the problem flow in its own EG to isolate the effect of the EG restarting. It's rare enough that we decided to live with it until we upgrade to WMB 6.0. Plus the EG "only" crashes once. When the message is retried, it works.
_________________
Peter Potkay
Keep Calm and MQ On
|
jwells34 |
Posted: Tue Feb 19, 2008 6:27 am Post subject: |
|
|
Newbie
Joined: 01 Nov 2007 Posts: 6
|
I was curious whether you know which APAR fixed your issue.
|
PeterPotkay |
Posted: Tue Feb 19, 2008 9:56 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
I never dug into the Fix Pack readme to see which APAR it was. IBM Support said CSD10 had lots of connection fixes. We applied it and the problem went away.
_________________
Peter Potkay
Keep Calm and MQ On
|