GFORCE
Posted: Fri Feb 15, 2008 9:34 am    Post subject: AMQRMPPA in AIX
Voyager
Joined: 16 Jun 2003    Posts: 78    Location: WISCONSIN
We recycle our test AIX box weekly, and every time I have to log on and kill the AMQRMPPA process to recycle MQ. Is there any way around this besides killing the process?
_________________
THANKS

PeterPotkay
Posted: Fri Feb 15, 2008 10:13 am
Poobah
Joined: 15 May 2001    Posts: 7722
I have the same problem on my Linux x86 32-bit QMs. It happened at MQ 6.0.1.0, 6.0.2.0 and 6.0.2.1. We're going to 6.0.2.3 soon, hoping the problem goes away. It's annoying. It doesn't happen every time. Sometimes if I wait 5-10 minutes they eventually stop, but usually when you are restarting a QM you don't have time to sit there and wait for who knows how long.
Turning trace on makes the problem go away.
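For anyone who wants to reproduce the workaround, here is a minimal sketch of toggling queue manager trace around a restart; the queue manager name is just a placeholder.

Code:
# hypothetical example: trace the QM while it is recycled
strmqtrc -m TEST.QM1 -t all    # start tracing everything for this QM
endmqm -i TEST.QM1             # immediate shutdown while trace is active
strmqm TEST.QM1                # restart
endmqtrc -m TEST.QM1           # stop tracing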
_________________
Peter Potkay
Keep Calm and MQ On

PeterPotkay
Posted: Mon Feb 25, 2008 6:05 pm
Poobah
Joined: 15 May 2001    Posts: 7722
Has anyone else got this problem? Our MQ shutdown scripts run endmqlsr first, then endmqm -i. Yet if the QM has more than a few running client channels (i.e. there is more than one amqrmppa process running), more often than not we have to kill those amqrmppa processes. Even when they do go down on their own it takes 10-15 minutes. As I said before, running trace seems to make the problem go away. I just upgraded to 6.0.2.3 and the problem is still there.
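For reference, a rough sketch of that shutdown order; the queue manager name is a placeholder and the lingering-process check is only illustrative.

Code:
QM=TEST.QM1                    # placeholder queue manager name

endmqlsr -m $QM                # stop the listener so no new channels start
endmqm -i $QM                  # immediate shutdown of the queue manager

# see whether any channel pooling (amqrmppa) processes are still running
ps -ef | grep "[a]mqrmppa"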
_________________
Peter Potkay
Keep Calm and MQ On

GFORCE
Posted: Mon Mar 03, 2008 8:36 am    Post subject: AMQRMPPA in AIX
Voyager
Joined: 16 Jun 2003    Posts: 78    Location: WISCONSIN
I set the TCP keepalive parameter in the qm.ini and it appears to work as well. I am still trying several options, and I will try the trace option as you suggested, but I have to go through our change control with every change.....
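For anyone looking for that setting, this is roughly what the stanza looks like in the queue manager's qm.ini (a sketch, not a verified fix; as far as I know the actual keepalive timing still comes from the operating system's TCP tunables):

Code:
TCP:
   KeepAlive=Yes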
_________________
THANKS

PeterPotkay
Posted: Tue Mar 11, 2008 9:28 am
Poobah
Joined: 15 May 2001    Posts: 7722
IBM identified a problem and will be creating an interim fix. Tracing fixed the problem each time. They gave us a command to run every few seconds while the QM was taking forever to come down; it produced an FDC each time, and that finally highlighted the problem.
Quote:
Based on the supplied data, we have narrowed down the cause of the
endmqm delay to two parts of the code which stops channels. We are doing
some further testing in order to better understand the delay and hope to
produce an interim fix later this week. I expect to report back with
more information tomorrow.
************
Update on 11th March:
Further to yesterday's update, the FDCs supplied again showed that it
was the ending of channels which delayed the endmqm process. In
particular, channel process 16634 kept running for a very long time
after the queue manager was asked to end. The endmqm process was waiting
for channel process 16634 to end before it could finish.
.
When we look at what the FDCs showed for 16634, it seems that there were
a number of channel threads (e.g. threads 82, 85, 92 and 94) still
active inside it. Since these threads had not finished processing, the
process had not ended.
.
The FFST shows what these last threads had been doing as the queue
manager ended. We see that most threads had noticed that the queue
manager had ended, but then carried on regardless. For example, here is
an excerpt from thread 94's history:
.
----} zstMQGET rc=lrcE_Q_MGR_STOPPING
---} MQGET rc=lrcE_Q_MGR_STOPPING
...
---{ MQCLOSE
----{ zstMQCLOSE
-----{ zstVerifyPCD
-----} zstVerifyPCD rc=OK
-----{ zutCallApiExitsBeforeClose
------{ APIExit
-------{ MQGET
--------{ zstMQGET
---------{ zstVerifyPCD
---------} zstVerifyPCD rc=OK
---------{ ziiBreakConnection
---------} ziiBreakConnection rc=OK
--------} zstMQGET rc=lrcE_CONNECTION_BROKEN
-------} MQGET rc=lrcE_CONNECTION_BROKEN
------} APIExit rc=OK
-----} zutCallApiExitsBeforeClose rc=OK
-----{ zutCallApiExitsAfterClose
------{ APIExit
------} APIExit rc=lrcE_CONNECTION_BROKEN
-----} zutCallApiExitsAfterClose rc=OK
-----{ ziiBreakConnection
-----} ziiBreakConnection rc=OK
----} zstMQCLOSE rc=lrcE_CONNECTION_BROKEN
---} MQCLOSE rc=lrcE_CONNECTION_BROKEN
...
---{ ccxReceive
----{ cciTcpReceive
-----{ ccxAllocMem
-----} ccxAllocMem rc=OK
-----{ recv
-----} recv rc=Unknown(FFFF)
-----{ xcsWaitFd
------{ poll
------} poll rc=Unknown(1)
-----} xcsWaitFd rc=Unknown(1)
-----{ recv
-----} recv rc=Unknown(FFFF)
-----{ xcsWaitFd
------{ poll
------} poll rc=Unknown(1)
.
Despite knowing that the queue manager is ending and that its own
connection to the queue manager has been broken, the thread continued to
run and poll its network socket for more MQI calls from the client.
However, even if such a call arrived there would be nothing useful that
the channel could do with it because its connection has gone. So the
thread should really have ended at that point. It is only after multiple
failed poll() calls that the channel threads finally time out and end,
which allows endmqm processing to complete.
.
We should point out that client applications should specify the
appropriate FAIL_IF_QUIESCING option on all of their MQI calls in order
to speed up endmqm processing. The trace supplied on 3rd March shows
some clients which are not using the "fail if quiescing" option.
However, I believe that endmqm -i should still end the queue manager
within a reasonable time regardless of the MQI options. For this reason,
I think the queue manager should try harder to end client channels than
it currently does.
.
Based on the sequence of events in the FFSTs, it is clear that all of
the threads which failed to end had received MQRC_Q_MGR_STOPPING and
MQRC_CONNECTION_BROKEN as early as 08:12:39. Had they detected this fact
they would have ended much sooner, instead of hanging around until
18:18:55 when endmqm finally finished.
.
We are building a test fix which adds extra checking to the server
(SVRCONN) end of the channel in order to better handle shutdown in cases
where MQI calls report that the queue manager is ending. I will also
include additional FFST diagnostics in the code so as to produce better
SIGUSR2 FDC files in cases of future delays.
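To make the "fail if quiescing" advice concrete, here is a minimal sketch of a client get loop that sets the option. The queue manager and queue names are placeholders, and this is illustrative rather than a tested program.

Code:
#include <string.h>
#include <cmqc.h>                               /* MQI definitions */

int main(void)
{
    MQHCONN hConn;
    MQHOBJ  hObj;
    MQOD    od  = {MQOD_DEFAULT};
    MQMD    md  = {MQMD_DEFAULT};
    MQGMO   gmo = {MQGMO_DEFAULT};
    MQLONG  cc, rc, msgLen;
    MQCHAR  qmName[MQ_Q_MGR_NAME_LENGTH] = "TEST.QM1";      /* placeholder */
    char    buffer[4096];

    MQCONN(qmName, &hConn, &cc, &rc);
    if (cc == MQCC_FAILED)
        return 1;

    strncpy(od.ObjectName, "TEST.QUEUE", MQ_Q_NAME_LENGTH); /* placeholder */
    MQOPEN(hConn, &od, MQOO_INPUT_AS_Q_DEF | MQOO_FAIL_IF_QUIESCING,
           &hObj, &cc, &rc);

    /* Fail the wait as soon as the queue manager starts quiescing,
       instead of sitting in MQGET and holding the channel open. */
    gmo.Options      = MQGMO_WAIT | MQGMO_FAIL_IF_QUIESCING;
    gmo.WaitInterval = 30000;                   /* 30 seconds */

    while (1)
    {
        memcpy(md.MsgId,    MQMI_NONE, sizeof(md.MsgId));
        memcpy(md.CorrelId, MQCI_NONE, sizeof(md.CorrelId));

        MQGET(hConn, hObj, &md, &gmo, sizeof(buffer), buffer,
              &msgLen, &cc, &rc);

        if (rc == MQRC_Q_MGR_QUIESCING || rc == MQRC_Q_MGR_STOPPING ||
            rc == MQRC_CONNECTION_BROKEN)
            break;                              /* get out; don't delay endmqm */

        if (rc == MQRC_NO_MSG_AVAILABLE)
            continue;                           /* wait expired, try again */

        /* ... process the message ... */
    }

    MQCLOSE(hConn, &hObj, MQCO_NONE, &cc, &rc);
    MQDISC(&hConn, &cc, &rc);
    return 0;
}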
_________________
Peter Potkay
Keep Calm and MQ On

jefflowrey
Posted: Tue Mar 11, 2008 9:34 am
Grand Poobah
Joined: 16 Oct 2002    Posts: 19981
That seems to point the finger in two places: a) client apps that don't use FAIL_IF_QUIESCING, and b) the channel, which should end itself after it has returned at least one quiescing failure to the client.
So I'd a) wait for the fix, and b) fatten your trout for those app teams that aren't using FAIL_IF_QUIESCING.
_________________
I am *not* the model of the modern major general.

PeterPotkay
Posted: Tue Mar 11, 2008 9:43 am
Poobah
Joined: 15 May 2001    Posts: 7722
Even with a Monty Python-sized trout you'll never be able to guarantee that every app uses FAIL_IF_QUIESCING. Even if they say they use it. Even if you see some code that uses it, that's not proof it's what's running in PROD. That's why we rely on endmqm -i. I'm glad IBM found the problem. Waiting 10 minutes for the QM to come down is an eternity in the middle of the night with the change window's end time approaching.
_________________
Peter Potkay
Keep Calm and MQ On

Toronto_MQ
Posted: Wed Mar 12, 2008 7:46 am
Master
Joined: 10 Jul 2002    Posts: 263    Location: read my name
I'm glad you've gotten somewhere with this. We have the same problem (on Solaris) and our PMRs got us nowhere. We have taken to issuing endmqm -i, waiting a minute, then endmqm -p, waiting another minute, and then we start killing the amqrmppa processes. Nice to see a fix may eventually come around.
I agree that in an ideal world we would have the apps code FAIL_IF_QUIESCING. And we always stress this as rule #1. But I think we all know we don't live in an ideal world. If I have to listen to "this is vendor code, we can't change that" one more time...
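For what it's worth, a rough sketch of that escalation; the queue manager name, the waits and the status check are only examples.

Code:
QM=TEST.QM1                      # placeholder queue manager name

endmqm -i $QM                    # immediate shutdown
sleep 60

# still showing as running? try a preemptive shutdown
if dspmq -m $QM | grep Running > /dev/null ; then
    endmqm -p $QM
    sleep 60
fi

# last resort: kill any channel pooling processes left over
for pid in $(ps -ef | grep "[a]mqrmppa" | awk '{print $2}'); do
    kill $pid
done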

GFORCE
Posted: Tue Mar 18, 2008 5:04 am
Voyager
Joined: 16 Jun 2003    Posts: 78    Location: WISCONSIN
I am glad this resulted in a fix from IBM. I hope the PTF is available soon.
Thanks for your help... this forum is great!!!!
_________________
THANKS

PeterPotkay
Posted: Thu Mar 20, 2008 11:56 am
Poobah
Joined: 15 May 2001    Posts: 7722
Contact IBM Support if you need the interim fix for this. It's called IZ18142. It's past the cutoff for being included in 6.0.2.4, so the earliest it would be in is 6.0.2.5.
I only tested the fix on Linux. I informed them that Solaris and AIX appear to have the same bug, based on this thread.
_________________
Peter Potkay
Keep Calm and MQ On