Author |
Message
|
shashivarungupta |
Posted: Wed Mar 03, 2010 4:33 am Post subject: an FDC " Probe Description :- AMQ6150: WebSphere MQ sem |
|
|
 Grand Master
Joined: 24 Feb 2009 Posts: 1343 Location: Floating in space on a round rock.
|
+-----------------------------------------------------------------------------+
| |
| WebSphere MQ First Failure Symptom Report |
| ========================================= |
| |
| Date/Time :- Wednesday March 03 03:38:00 PST 2010 |
| Host Name :- czapie3 (AIX 5.3) |
| PIDS :- 5724H7201 |
| LVLS :- 6.0.2.1 |
| Product Long Name :- WebSphere MQ for AIX |
| Vendor :- IBM |
| Probe Id :- XC307040 |
| Application Name :- MQM |
| Component :- xlsRequestMutex |
| SCCS Info :- lib/cs/unix/rs_aix32/amqxlfsx.c, 1.75.1.4 |
| Line Number :- 1889 |
| Build Date :- Mar 7 2007 |
| CMVC level :- p600-201-070307 |
| Build Type :- IKAP - (Production) |
| UserID :- 00000270 (mqm) |
| Program Name :- amqzlaa0_nd |
| Addressing mode :- 64-bit |
| Process :- 999568 |
| Thread :- 285212 |
| QueueManager :- PACSRB00 |
| ConnId(1) IPCC :- 107442079 |
| ConnId(2) QM :- 1760937 |
| Last HQC :- 2.0.0-588304 |
| Last HSHMEMB :- 2.8.39-6555256 |
| Major Errorcode :- xecL_W_LONG_LOCK_WAIT |
| Minor Errorcode :- OK |
| Probe Type :- MSGAMQ6150 |
| Probe Severity :- 3 |
| Probe Description :- AMQ6150: WebSphere MQ semaphore is busy. |
| FDCSequenceNumber :- 3 |
| |
+-----------------------------------------------------------------------------+
Any idea?
Information:
Name: WebSphere MQ
Version: 6.0.2.1
_________________ *Life will beat you down, you need to decide to fight back or leave it. |
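For readers who want to pull fields such as Probe Id, Component and Major Errorcode out of a header like the one above, here is a minimal Python sketch. It assumes every field line has the `Name :- value` shape shown in the report; `parse_fdc_header` and `FIELD_RE` are illustrative names, not part of any MQ tooling.

```python
import re

# Matches lines such as "| Probe Id :- XC307040 |". The box characters at
# either end are optional, in case a copy of the header has lost them.
FIELD_RE = re.compile(r"^\|?\s*(?P<name>[^|]+?)\s*:-\s*(?P<value>.*?)\s*\|?\s*$")


def parse_fdc_header(text):
    """Return the 'Name :- value' pairs of one FDC header as a dict."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:  # border and title lines have no ':-' and are skipped
            fields[match.group("name")] = match.group("value")
    return fields


# A few lines copied from the header posted above.
sample = """\
| Probe Id :- XC307040 |
| Component :- xlsRequestMutex |
| Major Errorcode :- xecL_W_LONG_LOCK_WAIT |
| Probe Description :- AMQ6150: WebSphere MQ semaphore is busy. |
"""

header = parse_fdc_header(sample)
print(header["Probe Id"], header["Component"], header["Major Errorcode"])
```

The probe ID and component are the values most useful as search terms, so having them in a dict makes it easy to summarise a directory of FDC files.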
|
|
 |
shashivarungupta |
Posted: Wed Mar 03, 2010 4:38 am Post subject: |
|
|
 Grand Master
Joined: 24 Feb 2009 Posts: 1343 Location: Floating in space on a round rock.
|
----- amqxfdcx.c : 768 --------------------------------------------------------
03/03/10 03:55:23 - Process(999568.285205) User(mqm) Program(amqzlaa0_nd)
AMQ6184: An internal WebSphere MQ error has occurred on queue manager PACSRB00.
EXPLANATION:
An error has been detected, and the WebSphere MQ error recording routine has
been called. The failing process is process 999568.
ACTION:
Use the standard facilities supplied with your system to record the problem
identifier, and to save the generated output files. Contact your IBM support
center. Do not discard these files until the problem has been resolved.
$ ps -ef | grep 999568
mqm 999568 1404988 0 Aug 15 - 770:02 amqzlaa0 -mPACSRB00 -fip2
This is what I can see. The entries above are in /var/mqm/errors/, in the error log AMQERR01.LOG and the FDC file AMQ999568.0.FDC. |
|
|
 |
exerk |
Posted: Wed Mar 03, 2010 4:46 am Post subject: |
|
|
 Jedi Council
Joined: 02 Nov 2006 Posts: 6339
|
And Mr. Google says what in regard to the probe ID?
_________________ It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys. |
|
|
 |
shashivarungupta |
Posted: Wed Mar 03, 2010 4:53 am Post subject: |
|
|
 Grand Master
Joined: 24 Feb 2009 Posts: 1343 Location: Floating in space on a round rock.
|
exerk wrote: |
And Mr. Google says what in regard to the probe ID? |
Yes, I checked that first, and it gives:
http://www-01.ibm.com/support/docview.wss?rs=171&uid=swg1SA95786
I also checked the error log that I mentioned above.
mqm 860172 1404988 0 Aug 15 - 1842:02 amqzmuc0 -m PACSRB00
mqm 4821100 1404988 0 Nov 25 - 1345:20 amqzlaa0 -mPACSRB00 -fip822
mqm 5881922 1404988 0 Feb 02 - 1200:03 amqzlaa0 -mPACSRB00 -fip827
And I can see that the amqzlaa0 PID changes over time.
The client said they were not able to see the queues, but I checked and I can see them; all is fine as far as the queues are concerned.
It seems to be a semaphore issue! Another team, 'Storage', is looking at the issue. Middleware is taking a deep breath for a while and digging into the cause!
'It says that when resources are occupied by some application and there is another request for the same resource, there is a deadlock for a while as the MQ processes are busy resolving it.' |
|
|
 |
mvic |
Posted: Wed Mar 03, 2010 5:56 am Post subject: Re: an FDC " Probe Description :- AMQ6150: WebSphere MQ |
|
|
 Jedi
Joined: 09 Mar 2004 Posts: 2080
|
What is the MQM Function Stack under this FDC header? |
|
|
 |
shashivarungupta |
Posted: Wed Mar 03, 2010 7:17 am Post subject: Re: an FDC " Probe Description :- AMQ6150: WebSphere MQ |
|
|
 Grand Master
Joined: 24 Feb 2009 Posts: 1343 Location: Floating in space on a round rock.
|
mvic wrote: |
What is the MQM Function Stack under this FDC header? |
MQM Function Stack
zlaMainThread
zlaProcessMessage
zlaProcessMQIRequest
zlaMQGET
zsqMQGET
kpiMQGET
kqiWaitForMessage
atmCompleteOp
atmUnlockDataMutex
xcsReleaseMutexSem
xlsReleaseMutex
xcsFFST |
|
|
 |
mvic |
Posted: Wed Mar 03, 2010 7:38 am Post subject: Re: an FDC " Probe Description :- AMQ6150: WebSphere MQ |
|
|
 Jedi
Joined: 09 Mar 2004 Posts: 2080
|
shashivarungupta wrote: |
mvic wrote: |
What is the MQM Function Stack under this FDC header? |
MQM Function Stack
zlaMainThread
zlaProcessMessage
zlaProcessMQIRequest
zlaMQGET
zsqMQGET
kpiMQGET
kqiWaitForMessage
atmCompleteOp
atmUnlockDataMutex
xcsReleaseMutexSem
xlsReleaseMutex
xcsFFST |
This is from a different file (or perhaps a different portion of the same file).
The portion of the file you posted at first would have contained xlsRequestMutex in the MQM Function Stack.
The reason for these "long lock wait" FDC records is usually a short-term resource shortage, causing a long wait for a mutex. CPU or memory in particular are the resources that might have been short, but in rare cases a slow disk can also cause such a situation.
I suggest raising a PMR and giving IBM Support a chance to comment on the FDCs. |
|
|
 |
shashivarungupta |
Posted: Wed Mar 03, 2010 7:53 am Post subject: |
|
|
 Grand Master
Joined: 24 Feb 2009 Posts: 1343 Location: Floating in space on a round rock.
|
You are right!
The FDC file contains many 'WebSphere MQ First Failure Symptom Report' records.
This was listed below the FDC header:
MQM Function Stack
zlaMainThread
zlaProcessMessage
zlaProcessMQIRequest
zlaMQPUT
zsqMQPUT
kpiMQPUT
kqiPutIt
kqiPutMsgSegments
apiLockExclusive
xcsRequestMutexSem
xlsRequestMutex
xcsFFST
I am not aware about the "xlsRequestMutex" could you please guide me on that with other fields.
By the way I have been discussing this with my senior team members and suggested to raise the PMR for the same. Though it was the issue with 'Storage' and was being auto resolved by MQ, I believe. |
|
|
 |
mvic |
Posted: Wed Mar 03, 2010 8:06 am Post subject: |
|
|
 Jedi
Joined: 09 Mar 2004 Posts: 2080
|
shashivarungupta wrote: |
I am not aware about the "xlsRequestMutex" could you please guide me on that with other fields. |
Sorry I don't understand the question.
Quote: |
By the way I have been discussing this with my senior team members and suggested to raise the PMR for the same. Though it was the issue with 'Storage' and was being auto resolved by MQ, I believe. |
I agree raising a PMR would be a wise thing, to give IBM the chance to comment. People often use the word "storage" to mean RAM/memory. Some people use it to mean "disk space". In the case of your FDCs I guess it is CPU/memory that was short. But I am guessing, to some extent. |
|
|
 |
shashivarungupta |
Posted: Wed Mar 03, 2010 8:19 am Post subject: |
|
|
 Grand Master
Joined: 24 Feb 2009 Posts: 1343 Location: Floating in space on a round rock.
|
mvic wrote: |
shashivarungupta wrote: |
I am not aware about the "xlsRequestMutex" could you please guide me on that with other fields. |
Sorry I don't understand the question. |
I was curious about the fields under "MQM Function Stack" and how did you/anyone would come to know that a particular error has occurred and he/she should be looking directly into that field. As in this case you pointed out 'xlsRequestMutex'.
mvic wrote: |
Quote: |
By the way I have been discussing this with my senior team members and suggested to raise the PMR for the same. Though it was the issue with 'Storage' and was being auto resolved by MQ, I believe. |
I agree raising a PMR would be a wise thing, to give IBM the chance to comment. People often use the word "storage" to mean RAM/memory. Some people use it to mean "disk space". In the case of your FDCs I guess it is CPU/memory that was short. But I am guessing, to some extent. |
Yes, you are perfectly right!
The 'Storage' team came back with the comment that 'there is something wrong with HDD/Adapter' and that one of them is in 'DEAD' state. |
|
|
 |
mvic |
Posted: Wed Mar 03, 2010 9:10 am Post subject: |
|
|
 Jedi
Joined: 09 Mar 2004 Posts: 2080
|
shashivarungupta wrote: |
I was curious about the fields under "MQM Function Stack" and how did you/anyone would come to know that a particular error has occurred and he/she should be looking directly into that field. As in this case you pointed out 'xlsRequestMutex'. |
FDC files mainly contain information about the internals of MQ, so I am not advising folks outside of the IBM support teams to go reading it.
I only asked so that I could then search for a known APAR etc. mentioning the names in that stack. This FDC is quite generic, and so the context in the MQM Function Stack is important if you want to search for known problems. |
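Since one FDC file can hold many reports, each with its own stack, a small script can help collect the names to search on. The following is only a rough Python sketch: it assumes each stack starts at a line reading exactly `MQM Function Stack` and ends at the first line that is not a bare function name. `function_stacks` is a hypothetical helper, and the sample stacks are shortened from the ones posted in this thread.

```python
def function_stacks(text):
    """Yield one list of function names per 'MQM Function Stack' section."""
    stack = None  # None means we are not inside a stack section
    for raw in text.splitlines():
        line = raw.strip()
        if line == "MQM Function Stack":
            if stack:
                yield stack
            stack = []
        elif stack is not None:
            if line.isidentifier():
                stack.append(line)
            else:
                # Blank line or any other text: assume the stack has ended.
                if stack:
                    yield stack
                stack = None
    if stack:
        yield stack


sample = """\
MQM Function Stack
zlaMainThread
zlaMQPUT
xcsRequestMutexSem
xlsRequestMutex
xcsFFST

MQM Function Stack
zlaMainThread
zlaMQGET
xcsReleaseMutexSem
xlsReleaseMutex
xcsFFST
"""

stacks = list(function_stacks(sample))
# Keep only the reports whose stack passes through xlsRequestMutex.
matching = [s for s in stacks if "xlsRequestMutex" in s]
print(len(stacks), len(matching))
```

Filtering by a function name is one way to pair each stack with the header that reported the same component before searching for a known APAR.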
|
|
 |
shashivarungupta |
Posted: Fri Mar 05, 2010 6:31 am Post subject: |
|
|
 Grand Master
Joined: 24 Feb 2009 Posts: 1343 Location: Floating in space on a round rock.
|
The following also supports what we concluded:
03/03/10 03:30:38 - Process(471192.1) User(mqm) Program(runmqchl_nd)
AMQ9558: Remote Channel is not currently available.
EXPLANATION:
The channel program ended because the channel 'QM1_QM2' is not
currently available on the remote system. This could be because the channel is disabled or that the remote system does not have sufficient resources to run a further channel.
ACTION:
Check the remote system to ensure that the channel is available to run, and retry the operation.
----- amqrfpta.c : 340 --------------------------------------------------------
03/03/10 03:30:38 - Process(471192.1) User(mqm) Program(runmqchl_nd)
AMQ9999: Channel program ended abnormally.
EXPLANATION:
Channel program 'QM1_QM2' ended abnormally.
ACTION:
Look at previous error messages for channel program 'QM1_QM2' in the
error files to determine the cause of the failure.
Thanks. |
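To line up error log entries like these against the FDC timestamps, something along the following lines might help. It is a sketch under assumptions: the entry header is taken to look exactly like the lines quoted in this thread, and the date is read as MM/DD/YY, which the sample (03/03/10) cannot confirm. `ENTRY_RE` and `log_entries` are illustrative names only.

```python
import re
from datetime import datetime

# One entry header followed by its message ID line, as quoted in the thread.
ENTRY_RE = re.compile(
    r"^(?P<stamp>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) - "
    r"Process\((?P<pid>\d+)\.(?P<tid>\d+)\) User\((?P<user>[^)]*)\) "
    r"Program\((?P<program>[^)]*)\)\s*\n(?P<msgid>AMQ\d{4}):",
    re.MULTILINE,
)


def log_entries(text, date_format="%m/%d/%y %H:%M:%S"):
    """Yield (timestamp, pid, program, message id) for each log entry.

    date_format is an assumption; adjust it if the log uses DD/MM/YY.
    """
    for m in ENTRY_RE.finditer(text):
        stamp = datetime.strptime(m.group("stamp"), date_format)
        yield stamp, int(m.group("pid")), m.group("program"), m.group("msgid")


sample = """\
03/03/10 03:55:23 - Process(999568.285205) User(mqm) Program(amqzlaa0_nd)
AMQ6184: An internal WebSphere MQ error has occurred on queue manager PACSRB00.
03/03/10 03:30:38 - Process(471192.1) User(mqm) Program(runmqchl_nd)
AMQ9558: Remote Channel is not currently available.
"""

# Sort by time so channel errors and FDC-related errors can be read in order.
entries = sorted(log_entries(sample))
for stamp, pid, program, msgid in entries:
    print(stamp.strftime("%H:%M:%S"), pid, program, msgid)
```

Sorting the entries by time makes it easier to see whether the channel failures and the long lock wait reports fall in the same window.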
|
|
 |
|