MQSeries.net: Recurring QM Outage That Generates Probe Ids XC130003 and ZX005022

belchman · Posted: Fri Mar 31, 2006 1:49 pm

My questions are at the end. I have included some possibly useful data after the FDC text. Thanks in advance for any breadcrumbs you can throw my way.

Current Situation:

AIX Version = 5.2.00
MQ Server Version = 530.12 CSD12
MQ CMVC level = p530-12-L051208

For the last few months we have experienced recurring incidents, always on a Thursday or Friday, in which the queue manager forces down all ServerConn channels, forcing all MQ clients to reconnect.

The following FDCs are created, in this order:
***************************************
+-----------------------------------------------------------------------------+
| |
| WebSphere MQ First Failure Symptom Report |
| ========================================= |
| |
| Date/Time :- Thursday March 30 15:27:39 EST 2006 |
| Host Name :- xxxxx (AIX 5.2) |
| PIDS :- 5724B4101 |
| LVLS :- 530.12 CSD12 |
| Product Long Name :- WebSphere MQ for AIX |
| Vendor :- IBM |
| Probe Id :- XC130003 |
| Application Name :- MQM |
| Component :- xehExceptionHandler |
| Build Date :- Dec 8 2005 |
| CMVC level :- p530-12-L051208 |
| Build Type :- IKAP - (Production) |
| UserID :- 00000250 (mqm) |
| Program Name :- amqrmppa |
| Process :- 00618638 |
| Thread :- 00000001 |
| QueueManager :- OQPEGW01 |
| Major Errorcode :- STOP |
| Minor Errorcode :- OK |
| Probe Type :- HALT6109 |
| Probe Severity :- 1 |
| Probe Description :- AMQ6109: An internal WebSphere MQ error has occurred. |
| FDCSequenceNumber :- 0 |
| Arith1 :- 4 4 |
| Comment1 :- SIGILL |
| |
+-----------------------------------------------------------------------------+

MQM Function Stack
rppPoolMain
cccJobMonitor
xcsFFST

MQM Trace History
.
.
.

and

+-----------------------------------------------------------------------------+
| |
| WebSphere MQ First Failure Symptom Report |
| ========================================= |
| |
| Date/Time :- Thursday March 30 15:27:50 EST 2006 |
| Host Name :- mqentpr01 (AIX 5.2) |
| PIDS :- 5724B4101 |
| LVLS :- 530.12 CSD12 |
| Product Long Name :- WebSphere MQ for AIX |
| Vendor :- IBM |
| Probe Id :- ZX005022 |
| Application Name :- MQM |
| Component :- zxcProcessChildren |
| Build Date :- Dec 8 2005 |
| CMVC level :- p530-12-L051208 |
| Build Type :- IKAP - (Production) |
| UserID :- 00000250 (mqm) |
| Program Name :- amqzxma0_nd |
| Process :- 00454856 |
| Thread :- 00000001 |
| QueueManager :- OQPEGW01 |
| Major Errorcode :- lrcW_S_FAST_PATH_APP_DEAD |
| Minor Errorcode :- OK |
| Probe Type :- MSGAMQ7159 |
| Probe Severity :- 3 |
| Probe Description :- AMQ7159: A FASTPATH application has ended |
| unexpectedly. |
| FDCSequenceNumber :- 0 |
| Arith1 :- 618638 9708e |
| Arith2 :- 26221 666d |
| |
+-----------------------------------------------------------------------------+

MQM Function Stack
zxcProcessChildren
xcsFFST

MQM Trace History
.
.
.
**************************
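One detail in the two FDCs above may help correlate them: the ZX005022 FDC's Arith1 appears to hold one number twice, in decimal and in hex (618638 = 0x9708e), and that value matches the Process field (00618638) of the amqrmppa FDC written 11 seconds earlier. Similarly, the XC130003 FDC's Arith1 of "4 4" is consistent with signal 4, which is SIGILL on AIX, matching Comment1. Below is a minimal Python sketch for pulling the header fields out of FDC text and checking that correlation. It assumes the "| Key :- Value |" layout shown above, and it assumes (not confirmed here by IBM) that ZX005022's Arith1 is the PID of the FASTPATH application that ended. The helper names are hypothetical.

```python
import re

# Hypothetical helper pattern: matches FDC header rows such as
# "| Probe Id :- XC130003 |", assuming the layout shown in the post.
FIELD_RE = re.compile(r"^\|\s*(?P<key>[^|]+?)\s*:-\s*(?P<value>.*?)\s*\|?\s*$")


def parse_fdc_header(text):
    """Return a dict of the 'Key :- Value' fields found in FDC header text."""
    fields = {}
    for line in text.splitlines():
        m = FIELD_RE.match(line.strip())
        if m:
            fields[m.group("key")] = m.group("value")
    return fields


def dead_fastpath_pid(zx_fields):
    """Decode ZX005022 Arith1 as '<decimal> <hex>' and check both halves agree.

    Assumption: the value is the PID of the FASTPATH application that ended.
    """
    dec_str, hex_str = zx_fields["Arith1"].split()
    pid = int(dec_str)
    if pid != int(hex_str, 16):
        raise ValueError("Arith1 decimal and hex values disagree")
    return pid


def same_process(xc_fields, zx_fields):
    """True if the ZX005022 FDC points at the process that wrote the XC130003 FDC."""
    return int(xc_fields["Process"]) == dead_fastpath_pid(zx_fields)
```

Fed the two FDCs above, same_process() returns True, since int("00618638") and int("9708e", 16) are both 618638. Running it over older FDCs from previous incidents would show whether every ZX005022 points back at an amqrmppa process.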

I have opened a PMR with IBM. Here is the information I provided to IBM:

**************************
Other relevant information:

We have FASTPATH binding enabled on the queue manager.

We have a client node that continuously floods the queue manager error log with AMQ9208 messages. However, the apps hosted on that node do not report any loss of MQ functionality.

While diagnosing the spamming node, we found that it hosts four MQ client apps: one Java, two VB.NET and one VB6.

In production, the MQ_CONNECT_TYPE environment variable is not set.

At least one of the apps is compiled with the MQ V5.2 versions of cmqb.bas, cmqbb.bas, ..., cmqxb.bas.

This host supports a high-TPS, customer-facing application, so I assume the apps are using MQCONNX. Other MQ client nodes write similar errors to the QM log, but not to the same degree; I assume the difference reflects their lower TPS.
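To check whether the AMQ9208 volume per node really tracks TPS, the entries can be counted per remote host. Below is a minimal Python sketch. It assumes each entry in the queue manager error log contains a line beginning "AMQ9208: Error on receive from host <remote>."; the exact text may differ between releases, so the pattern may need adjusting. The function name is hypothetical.

```python
import re
import sys
from collections import Counter

# Assumption: each AMQ9208 entry contains a line of the form
# "AMQ9208: Error on receive from host <remote>." Adjust if your log differs.
AMQ9208_RE = re.compile(r"AMQ9208: Error on receive from host (?P<host>.+?)\.?\s*$")


def count_amq9208_by_host(log_lines):
    """Count AMQ9208 entries per remote host, most frequent first."""
    counts = Counter()
    for line in log_lines:
        m = AMQ9208_RE.search(line)
        if m:
            counts[m.group("host")] += 1
    return counts.most_common()


if __name__ == "__main__":
    # Usage (path is a placeholder; on UNIX the error logs typically live
    # under /var/mqm/qmgrs/<QM>/errors/):
    #   python count_amq9208.py /var/mqm/qmgrs/<QM>/errors/AMQERR01.LOG
    with open(sys.argv[1], errors="replace") as log:
        for host, n in count_amq9208_by_host(log):
            print(f"{n:8d}  {host}")
```

Comparing these counts against each node's known TPS would confirm or rule out the assumption that the error rate is simply proportional to traffic.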

I also assume that all the spamming MQ clients share one or more of these compiled objects without MQ_CONNECT_TYPE set, and are therefore binding FASTPATH.

We have had 5 or 6 similar incidents since October 2005, and all of them have occurred on a Thursday or Friday.

We bounce the MQ host every Sunday morning.
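Because the host is bounced every Sunday, a Thursday or Friday failure means the queue manager had been up for roughly four to five days, which may point at something that accumulates over uptime rather than at the weekday itself. Below is a minimal Python sketch for putting a number on each incident. It assumes the bounce happens at a fixed time each Sunday; the 06:00 value and the function name are hypothetical placeholders.

```python
from datetime import datetime, time, timedelta

# Hypothetical placeholder: replace with the actual Sunday restart time.
BOUNCE_TIME = time(6, 0)


def uptime_since_sunday_bounce(incident):
    """Time elapsed between the most recent Sunday bounce and an incident."""
    # datetime.weekday(): Monday == 0 ... Sunday == 6
    days_back = (incident.weekday() - 6) % 7
    bounce = datetime.combine(incident.date() - timedelta(days=days_back), BOUNCE_TIME)
    if bounce > incident:
        # Incident fell on a Sunday before that day's bounce.
        bounce -= timedelta(days=7)
    return incident - bounce


if __name__ == "__main__":
    # Timestamp of the first FDC above (Thursday March 30 15:27:39 2006).
    print(uptime_since_sunday_bounce(datetime(2006, 3, 30, 15, 27, 39)))  # prints 4 days, 9:27:39
```

If the uptimes for all 5 or 6 incidents cluster tightly, that is worth mentioning in the PMR.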

Corrective actions already taken:

I am attempting to reproduce the problem in QA. We were able to get the QA MQ client node to write the same error to the QA QM log.

What I have done so far in QA:
-> Created the MQ_CONNECT_TYPE environment variable and set it to FASTPATH
-> Bounced one suspected MQ client app
-> Put messages; these generated the log entries
-> Set MQ_CONNECT_TYPE to STANDARD
-> Bounced the MQ client app
-> Put messages; these did not generate log entries
-> Set MQ_CONNECT_TYPE back to FASTPATH
-> Bounced the MQ client app
-> Put messages; these did not generate log entries

I am confused as to why the last FASTPATH test didn't generate errors.
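One thing worth ruling out in a sequence like this is whether each relaunched app actually saw the value set for that step: a process only inherits environment changes made before it was started, so how the app is relaunched matters. Below is a minimal Python sketch that starts a program with MQ_CONNECT_TYPE set explicitly for that one child process, so each run's setting is unambiguous. The function name is hypothetical, and the probe command is a stand-in for the real client app.

```python
import os
import subprocess
import sys


def run_with_connect_type(cmd, connect_type):
    """Launch cmd with MQ_CONNECT_TYPE set only for that child process.

    connect_type is "FASTPATH" or "STANDARD"; None removes the variable,
    mirroring production, where MQ_CONNECT_TYPE is not set.
    """
    env = dict(os.environ)
    if connect_type is None:
        env.pop("MQ_CONNECT_TYPE", None)
    else:
        env["MQ_CONNECT_TYPE"] = connect_type
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Stand-in for the real client app: a child that reports what it sees.
    probe = [sys.executable, "-c", "import os; print(os.environ.get('MQ_CONNECT_TYPE'))"]
    for setting in ("FASTPATH", "STANDARD", "FASTPATH", None):
        print(setting, "->", run_with_connect_type(probe, setting).stdout.strip())
```

Logging the value the app actually received at startup, alongside the AMQ9208 count for that step, would show whether the third run really ran with FASTPATH.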

Question: Can anyone shed some light on what is causing these incidents?
Question: Can anyone shed some light on how to replicate the problem?
Question: Can anyone share their experiences with using Windows clients in a large shop?