Author |
Message
|
MABeatty1978 |
Posted: Fri May 20, 2016 6:29 am Post subject: MQ process preventing shutdown |
|
|
Acolyte
Joined: 17 Jul 2014 Posts: 54
|
In our production environment a queue manager has been failing to shut down to permit a backup. The endmqm is done with -w. Typically, it takes 15-20 seconds for the shutdown to complete. The last 3 nights it went hours before manual intervention took it down via -p. No troubleshooting was done by our support staff. They came to me today.
I attempted a -w shutdown on the qmgr and was able to replicate the issue. I waited 5 minutes before resorting to -p. I then restarted the qmgr and tried another shutdown, and it shut down fine. I repeated this half a dozen times and was unable to replicate the issue again.
There are no applications that should ever be connected to the qmgr for longer than it takes to put a single message. And nothing in any of my logs shows that any applications fail to disconnect from the qmgr.
Is there any way to track down what exactly was holding onto the qmgr and preventing it from shutting down? ps -ef | grep mq showed the following while it was trying to shut down.
Code: |
[desk@pos2982 ~]$ dspmq
QMNAME(PROD_POS2982_QM) STATUS(Quiescing)
[desk@pos2982 ~]$ ps -ef | grep mq
mqm 13794 1 0 09:57 ? 00:00:00 amqzxma0 -m PROD_POS2982_QM
mqm 13799 13794 0 09:57 ? 00:00:00 /opt/mqm/bin/amqzfuma -m PROD_POS2982_QM
mqm 13805 13794 0 09:57 ? 00:00:00 amqzmuc0 -m PROD_POS2982_QM
mqm 13822 13794 0 09:57 ? 00:00:00 amqzmur0 -m PROD_POS2982_QM
mqm 13823 13794 0 09:57 ? 00:00:00 amqzmuf0 -m PROD_POS2982_QM
mqm 13832 13794 0 09:57 ? 00:00:00 /opt/mqm/bin/amqrrmfa -m PROD_POS2982_QM -t2332800 -s2592000 -p2592000 -g5184000 -c3600
mqm 13833 13794 0 09:57 ? 00:00:00 /opt/mqm/bin/amqzdmaa -m PROD_POS2982_QM
mqm 13849 13794 0 09:57 ? 00:00:00 /opt/mqm/bin/amqzmgr0 -m PROD_POS2982_QM
mqm 13855 13794 0 09:57 ? 00:00:00 amqzlaa0 -mPROD_POS2982_QM -fip0
root 16838 16827 0 10:04 pts/0 00:00:00 su - mqm
mqm 16839 16838 0 10:04 pts/0 00:00:00 -bash
mqm 16898 16839 0 10:04 pts/0 00:00:00 endmqm -w PROD_POS2982_QM
desk 16972 16911 0 10:05 pts/2 00:00:00 grep mq |
Is there anything in here that jumps out? Anything I can do in the future if it happens again to figure it out? |
|
|
 |
Vitor |
Posted: Fri May 20, 2016 6:56 am Post subject: Re: MQ process preventing shutdown |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
MABeatty1978 wrote: |
There are no applications that should ever be connected to the qmgr for longer than it takes to put a single message. |
That must be efficient for applications putting large numbers of messages as part of a batch process.
What do applications reading from the queues do? Connect, get a message and disconnect?
I ask because the behavior you describe (especially the intermittent ability to reproduce it) screams of an application either not using FAIL_IF_QUIESCING or taking its own sweet time to disconnect once it gets a quiesce reason code from a get/put operation. Or "disconnecting" by stopping in its tracks and not properly disconnecting.
(Consider an app that connects, tries to do a put, gets a quiesce but then decides it's going to roll back a ton of database stuff before disconnecting from the queue manager. Especially if the database then hangs up trying to roll back....)
MABeatty1978 wrote: |
Is there any way to track down what exactly was holding onto the qmgr and preventing it from shutting down? |
It would be informative to ask the queue manager what it thinks was connected to it. If it says there are active connections but all the applications say they've disconnected, that's a clue.
MABeatty1978 wrote: |
Is there anything in here that jumps out? |
Not to me
MABeatty1978 wrote: |
Anything I can do in the future if it happens again to figure it out? |
Check for active connections. Note that applications have multiple logic paths, so the fact that they connect & disconnect properly on most occasions doesn't mean (for example) that their error handling isn't cleaning up MQ resources incorrectly. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
|
 |
MABeatty1978 |
Posted: Fri May 20, 2016 8:03 am Post subject: Re: MQ process preventing shutdown |
|
|
Acolyte
Joined: 17 Jul 2014 Posts: 54
|
Vitor wrote: |
MABeatty1978 wrote: |
There are no applications that should ever be connected to the qmgr for longer than it takes to put a single message. |
That must be efficient for applications putting large numbers of messages as part of a batch process.
|
There are no batch processes.
Vitor wrote: |
What do applications reading from the queues do? Connect, get a message and disconnect? |
Nothing reads from the queues; the applications only put to a cluster queue. The reads/gets are done on the other end.
Vitor wrote: |
I ask because the behavior you describe (especially the intermittent ability to reproduce it) screams of an application either not using FAIL_IF_QUIESCING or taking its own sweet time to disconnect once it gets a quiesce reason code from a get/put operation. Or "disconnecting" by stopping in its tracks and not properly disconnecting.
(Consider an app that connects, tries to do a put, gets a quiesce but then decides it's going to roll back a ton of database stuff before disconnecting from the queue manager. Especially if the database then hangs up trying to roll back....) |
When the application needs to put a message, it runs a program, passing in the message data as a parameter. That program connects, opens, puts, closes, disconnects, and returns the reason code to the calling program. The calling program checks the reason code: 0 means success; anything else means it tries the put again in 10 seconds. It does use FAIL_IF_QUIESCING.
It's worth noting that this application is running on close to 5,000 qmgrs and has been for 5 years now. This is the first time I've seen this happen.
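For readers trying to picture the flow described above, here is a minimal sketch of the calling program's retry loop. It is not the poster's actual code: the program name `put_message`, the `PUT_CMD` variable, and the `MAX_ATTEMPTS` cap are hypothetical, and the put program is assumed to report success with exit status 0. Only the 10-second delay comes from the post.

```shell
#!/bin/sh
# Hypothetical sketch of the caller's retry loop; not the poster's code.
# PUT_CMD stands in for the real put program, which the thread does not name.
# Assumption: it exits with status 0 on success and non-zero otherwise.
PUT_CMD=${PUT_CMD:-./put_message}   # hypothetical program name
RETRY_DELAY=${RETRY_DELAY:-10}      # seconds between attempts, per the post
MAX_ATTEMPTS=${MAX_ATTEMPTS:-5}     # assumption: the post states no limit

put_with_retry() {
    payload=$1
    attempt=1
    while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
        # The put program does connect, open, put, close, disconnect itself.
        if "$PUT_CMD" "$payload"; then
            return 0
        fi
        echo "put attempt $attempt failed" >&2
        attempt=$((attempt + 1))
        # Wait before the next attempt, but not after the final one.
        if [ "$attempt" -le "$MAX_ATTEMPTS" ]; then
            sleep "$RETRY_DELAY"
        fi
    done
    return 1
}
```

Usage would be something like `put_with_retry "message data"`. Note that a loop of this shape only learns anything once the put program returns; if that program were ever to stall before disconnecting, the caller would simply wait, which is the kind of logic path Vitor is asking about.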
Vitor wrote: |
It would be informative to ask the queue manager what it thinks was connected to it. If it says there are active connections but all the applications say they've disconnected, that's a clue. |
Forgive my ignorance, how do I do that? |
|
|
 |
mqjeff |
Posted: Fri May 20, 2016 8:06 am Post subject: Re: MQ process preventing shutdown |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
MABeatty1978 wrote: |
That program connects, opens, puts, closes, disconnects, and returns the reason code to the calling program. The calling program checks the reason code: 0 means success; anything else means it tries the put again in 10 seconds. It does use FAIL_IF_QUIESCING. |
That can leave unclosed network connections active...
MABeatty1978 wrote: |
Forgive my ignorance, how do I do that? |
"DIS CONN" _________________ chmod -R ugo-wx / |
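To spell out the "DIS CONN" suggestion, a minimal sketch of issuing it through runmqsc follows. The function name `list_connections` is made up for illustration, and the default queue manager name is the one from the first post. Whether runmqsc can still attach once the queue manager is already quiescing is not confirmed anywhere in this thread, so treat this as something to run beforehand or to try during a hang, not as a guaranteed diagnostic.

```shell
#!/bin/sh
# Sketch: ask a queue manager which connections it currently knows about.
# list_connections is a hypothetical helper name. Run as a user allowed to
# use runmqsc (typically a member of the mqm group).
list_connections() {
    qmgr=${1:-PROD_POS2982_QM}   # name taken from the original post
    # TYPE(CONN) ALL returns per-connection details such as APPLTAG, PID
    # and USERID, plus CHANNEL and CONNAME for client connections.
    echo "DISPLAY CONN(*) TYPE(CONN) ALL" | runmqsc "$qmgr"
}
```

Calling `list_connections PROD_POS2982_QM` prints one stanza per connection; the PID and APPLTAG fields are the ones to compare against what the applications claim.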
|
|
 |
tczielke |
Posted: Fri May 20, 2016 9:17 am Post subject: |
|
|
Guardian
Joined: 08 Jul 2010 Posts: 941 Location: Illinois, USA
|
The following Unix command will tell you what processes are still connected to the queue manager when it is quiescing. I have run it on Solaris and Linux.
Code: |
for i in `dspmq -c | xargs -n1 | grep PID | cut -d'(' -f2 | cut -d')' -f1`; do ps -p $i -o user,pid,args; done |
There is basically an undocumented switch (-c) to dspmq that does this. _________________ Working with MQ since 2010. |
|
|
 |
MABeatty1978 |
Posted: Fri May 20, 2016 9:33 am Post subject: |
|
|
Acolyte
Joined: 17 Jul 2014 Posts: 54
|
Thank you for the help so far.
I have discovered that this location had been reIP'd. So the cluster receiver channel's conname was not configured correctly and would have been unable to connect to the full repositories or other partial repositories.
Would this result in a connection left open? |
|
|
 |
tczielke |
Posted: Fri May 20, 2016 1:20 pm Post subject: |
|
|
Guardian
Joined: 08 Jul 2010 Posts: 941 Location: Illinois, USA
|
I wouldn't think so. I think the key will be running the "dspmq -c" when your queue manager is quiescing and not coming down in a timely manner. Unless I am missing it, the "dspmq -c" is not documented in the MQ manual. For z/OS, it is documented how to display active connections when you have quiesced a queue manager. That seems to be an important operational piece that should be documented for both the mainframe and distributed administrator. _________________ Working with MQ since 2010. |
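If the hang recurs at night, it may help to have the capture scripted so support staff can run it before falling back to -p. The following is a rough sketch built on the one-liner earlier in the thread. The function names, the log path, and the assumption that "dspmq -c" prints tokens of the form PID(nnnn) are mine (the token format is inferred from that one-liner), and since the switch is undocumented its output may differ between MQ versions.

```shell
#!/bin/sh
# Sketch: snapshot what is still attached while a queue manager is quiescing.
# Relies on "dspmq -c", which this thread describes as undocumented.

# Print the number inside each PID(...) token found on stdin, one per line.
# Assumption: the token format matches what the earlier one-liner greps for.
extract_pids() {
    tr -s ' \t' '\n\n' | sed -n 's/^PID(\([0-9][0-9]*\)).*/\1/p'
}

# Append queue manager status and the owning processes to a log file,
# then print the log file name. Both names here are hypothetical.
snapshot_connections() {
    log=${1:-/tmp/mq_quiesce_$(date +%Y%m%d%H%M%S).log}
    {
        date
        dspmq
        for pid in $(dspmq -c | extract_pids); do
            ps -p "$pid" -o user,pid,args
        done
    } >>"$log" 2>&1
    echo "$log"
}
```

Run `snapshot_connections` from a second session while endmqm -w is still waiting; the log then shows which user and command line owned each connected process at that moment.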
|
|
 |
hughson |
Posted: Sun May 22, 2016 1:15 pm Post subject: |
|
|
 Padawan
Joined: 09 May 2013 Posts: 1959 Location: Bay of Plenty, New Zealand
|
tczielke wrote: |
That seems to be an important operational piece that should be documented for both the mainframe and distributed administrator. |
Put a comment in the KC to say so. _________________ Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software |
|
|
 |
tczielke |
Posted: Sun May 22, 2016 4:09 pm Post subject: |
|
|
Guardian
Joined: 08 Jul 2010 Posts: 941 Location: Illinois, USA
|
I will plan on going the SR route to mention it to IBM. Mentioning it through the KC feels too much like a black hole, to me. _________________ Working with MQ since 2010. |
|
|
 |
Vitor |
Posted: Mon May 23, 2016 5:08 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
tczielke wrote: |
Mentioning it through the KC feels too much like a black hole, to me. |
It's not. I've had good responses to the feedback I've left.
YMMV of course. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
|
 |
tczielke |
Posted: Thu May 26, 2016 5:22 am Post subject: |
|
|
Guardian
Joined: 08 Jul 2010 Posts: 941 Location: Illinois, USA
|
Good to know that some people have had positive experiences with submitting feedback to the KC. However, I did go the PMR route, and the answer I got back is that "dspmq -c" is not supported and could be removed in future MQ code releases. So I guess "dspmq -c" needs to fall under the surreptitious section of the distributed tool kit like amqrfdm, amqrdbgm, etc. _________________ Working with MQ since 2010. |
|
|
 |
|