Author |
Message
|
garasan |
Posted: Tue May 12, 2009 12:52 am Post subject: Qmgr in trouble |
|
|
 Apprentice
Joined: 22 Jul 2008 Posts: 42 Location: Antwerp, Belgium
|
Hi,
Yesterday we ran into some problems with one of our qmgrs.
First some info about this qmgr:
Environment: Production
Version: MQ 6.0.2-5
OS: SLES 10
This past weekend we made some changes to 2 qmgrs.
We upgraded from 6.0.2-4 to 6.0.2-5 and we changed LogBufferPages to 4096.
Since then, on this qmgr only, we see an amqzlaa0 process using a lot of CPU. The PID of this process relates to the group PID of amqzxma0. (As far as we can see, it is not an mqconnect from an application server or any other external process.)
The PID is not found in the connection list of the qmgr.
After midnight it caused two PRD offloads to lose their connection to the qmgr.
Anybody seen this before? _________________ Regards |
|
Vitor |
Posted: Tue May 12, 2009 12:57 am Post subject: Re: Qmgr in trouble |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
garasan wrote: |
we changed LogBufferPages to 4096. |
How was this performed? Did you change qm.ini (if it's called that on Solaris) or saveqmgr & recreate queue manager?
Do I read your post correctly in that you performed this procedure on 2 queue managers, but only 1 is displaying this unexpected behaviour? _________________ Honesty is the best policy.
Insanity is the best defence. |
|
vol |
Posted: Tue May 12, 2009 1:13 am Post subject: |
|
|
Acolyte
Joined: 01 Feb 2009 Posts: 69
|
The amqzlaa0 process only uses CPU in response to API requests from apps. Determine which apps are connected to the agent process, and check what they are doing. |
|
garasan |
Posted: Tue May 12, 2009 1:21 am Post subject: |
|
|
 Apprentice
Joined: 22 Jul 2008 Posts: 42 Location: Antwerp, Belgium
|
Hi Vitor,
This was performed by adjusting the qm.ini.
The qmgr was stopped during the change and started after the change.
(It is running on a Linux box (SLES 10), not Solaris.)
Hi Vol,
That's the problem. I'm not able to determine which app is connected to this particular agent process, as its PID does not appear in the connection list of this qmgr. (Performed DIS CONN(*) ALL to check all connections.) _________________ Regards |
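For anyone hunting a PID the same way, here is a minimal Python sketch that filters saved DIS CONN(*) ALL output by PID. The sample text, all of its values, and the helper names parse_connections and connections_for_pid are assumptions made for illustration; real runmqsc output may be laid out differently on your version, so treat the parsing as a starting point only.

```python
import re
from typing import Dict, List

# Hypothetical excerpt of saved "DIS CONN(*) ALL" output. Every value is made up.
SAMPLE = """\
AMQ8276: Display Connection details.
   CONN(4A0912F320001A01)
   EXTCONN(414D5143514D312020202020202020)
   TYPE(CONN)
   PID(12345)                              TID(7)
   APPLTAG(offload_app)                    APPLTYPE(USER)
   CHANNEL(APP.SVRCONN)                    CONNAME(10.0.0.5)
AMQ8276: Display Connection details.
   CONN(4A0912F320001B02)
   EXTCONN(414D5143514D312020202020202020)
   TYPE(CONN)
   PID(23456)                              TID(3)
   APPLTAG(amqrmppa)                       APPLTYPE(SYSTEM)
   CHANNEL( )                              CONNAME( )
"""

# Matches KEYWORD(value) pairs such as PID(12345) or APPLTAG(offload_app).
ATTR = re.compile(r"([A-Z]+)\(([^)]*)\)")


def parse_connections(text: str) -> List[Dict[str, str]]:
    """Split the output into one dict of attributes per connection."""
    conns: List[Dict[str, str]] = []
    for line in text.splitlines():
        if line.startswith("AMQ8276"):
            # Assumed marker line that starts each connection record.
            conns.append({})
        elif conns:
            for key, value in ATTR.findall(line):
                conns[-1][key] = value.strip()
    return conns


def connections_for_pid(text: str, pid: int) -> List[Dict[str, str]]:
    """Return only the connections whose PID attribute equals pid."""
    return [c for c in parse_connections(text) if c.get("PID") == str(pid)]


if __name__ == "__main__":
    for conn in connections_for_pid(SAMPLE, 12345):
        print(conn["CONN"], conn.get("APPLTAG"), conn.get("CHANNEL"))
```

An empty result for the busy PID would match what is described above: no connection in the list carries that PID.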
|
Vitor |
Posted: Tue May 12, 2009 1:21 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
vol wrote: |
The amqzlaa0 process only uses CPU in response to API requests from apps. |
Not strictly true IIRC; I think it's tangled up in the logging / queue loading process someplace & I remember problems in early versions of v5.3 in this area.
But I would agree that an app connection is the most likely thing if the poster hadn't already discounted that. Perhaps worth another check there to be sure.
An obvious question I should have asked the first time, as well as "how was this done", is "circular or linear logging?" Another interesting question: does the failing queue manager sit for a while, then go crazy with amqzlaa0 (perhaps when the first app tries to connect), or does it go mad at start up?
If it blows on first connect, this might explain why there are no apps connected when you investigate; the connecting app having got a 2059 and shut back down..... _________________ Honesty is the best policy.
Insanity is the best defence. |
|
garasan |
Posted: Tue May 12, 2009 1:32 am Post subject: |
|
|
 Apprentice
Joined: 22 Jul 2008 Posts: 42 Location: Antwerp, Belgium
|
Hi Vitor,
Double checked to be sure but this PID is not to be found in the connection list. :-(
There are apps connected, but they have a different PID.
The qmgr is configured to use CIRCULAR logging.
The crazy thing is that the qmgr kept on working "normally", although slower, for other apps.
The CPUs of the impacted app servers went crazy and crashed. (With logging indicating that the qmgr was not available.)
I didn't restart the qmgr and currently it works, although CPU usage on the qmgr machine is rather high for its current activities.
The qmgr was started on Saturday and the problem started on Sunday at noon. _________________ Regards |
|
Vitor |
Posted: Tue May 12, 2009 1:49 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
garasan wrote: |
this was performed by adjusting the qm.ini.
The qmgr was stopped during the change and started after the change. |
Well it wouldn't have worked if the queue manager was running...!
I know this unsupported dodge is used to change the number of log files quite successfully; I'm less convinced it's a good way to change any of the other logging parameters. It certainly sounds like the queue manager's losing the plot and hanging up connections (as vol alluded to) under some circumstances. I wonder if some of your apps are using non-persistent messages, and when others try to use persistent messages (which are logged) your trouble starts.
In your place, I'd be inclined to shrug, accept the dodge hasn't worked for some reason and schedule a slot to recreate the queue manager with the right logging parameters. That's probably faster and cleaner than fiddling round trying to fix it. Especially as you've no access to a PMR here.
(Well you have, but the response from first line support will include a phrase to the effect "unsupported change to queue manager configuration").
Bite the bullet, accept you got unlucky, take 20 mins out and recreate the queue manager. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
garasan |
Posted: Tue May 12, 2009 2:02 am Post subject: |
|
|
 Apprentice
Joined: 22 Jul 2008 Posts: 42 Location: Antwerp, Belgium
|
O_o, not following here.
Couldn't I just adjust LogBufferPages back to 0 (initial size) and restart?
I also wasn't aware that this was an unsupported change.
Some extra info: I found a lot of AMQ9209 errors in my logs at the moment the disconnects occurred.
And we do indeed use a mix of persistent and non-persistent messages. _________________ Regards |
|
Vitor |
Posted: Tue May 12, 2009 2:08 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
garasan wrote: |
Couldn't I just adjust LogBufferPages back to 0 (initial size) and restart? |
Perhaps, but see my comments above about fiddling about. This might fix it, but recreating the queue manager will fix it. Or allow a PMR.
garasan wrote: |
I also wasn't aware that this was an unsupported change. |
It's a commonly used dodge to change the number of log files, but officially log parameters are fixed at queue manager creation.
garasan wrote: |
And we do indeed use a mix of persistent and non-persistent messages. |
I think this proves something bad has happened to the queue manager's logging process, and convinces me even more that a clean slate is your best way forward.
Your choice, of course. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
PeterPotkay |
Posted: Tue May 12, 2009 5:10 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
I don't think there is any reason to recreate this QM. Changing LogBufferPages to 4096 after the QM has been created is not unsupported. The S.A.G. talks about needing to restart the QM after you make this change, implying the QM was already running with a different value for LogBufferPages at one time.
We ran at MQ 6.0.2.5 on SLES 10 on z/Linux for several months with LogBufferPages set to 4096 without problems, until we upgraded to 6.0.2.6 a few weeks ago for unrelated reasons.
Quote: |
"The CPUs of the impacted app servers went crazy and crashed. (With logging indicating that the qmgr was not available.)" |
Seems like an app problem to me. Just because an app gets a 2059 doesn't mean it should go nuts. Who knows what else it's doing. Maybe their code is asking your QM to work overtime. _________________ Peter Potkay
Keep Calm and MQ On |
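As a rough illustration of an app that does not go nuts on a 2059, here is a minimal Python sketch of a connect loop with capped exponential backoff. It is deliberately library-agnostic: MQError and connect_with_backoff are hypothetical names, and the connect callable stands in for whatever your client library uses to open a connection. Only the reason code 2059 comes from this thread.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

MQRC_Q_MGR_NOT_AVAILABLE = 2059  # the reason code discussed in this thread


class MQError(Exception):
    """Hypothetical stand-in for a client library's error type."""

    def __init__(self, reason: int) -> None:
        super().__init__(f"MQ reason code {reason}")
        self.reason = reason


def connect_with_backoff(
    connect: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry connect() on 2059, doubling the wait each time up to max_delay."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return connect()
        except MQError as err:
            last_attempt = attempt == max_attempts - 1
            # Any other reason code, or running out of attempts, is re-raised.
            if err.reason != MQRC_Q_MGR_NOT_AVAILABLE or last_attempt:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            # A little jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
    raise RuntimeError("unreachable")
```

The point of the cap and the attempt limit is that a client which cannot reach the qmgr backs off and eventually gives up, instead of hammering the listener in a tight loop.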
|
Vitor |
Posted: Tue May 12, 2009 5:18 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
PeterPotkay wrote: |
The S.A.G. talks about needing to restart the QM after you make this change, |
Really? For Unix? Where?  _________________ Honesty is the best policy.
Insanity is the best defence. |
|
PeterPotkay |
Posted: Tue May 12, 2009 5:19 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
http://publib.boulder.ibm.com/infocenter/wmqv6/v6r0/topic/com.ibm.mq.amqzag.doc/fa12640_.htm
Quote: |
The value is examined when the queue manager is created or started, and might be increased or decreased at either of these times. However, a change in the value is not effective until the queue manager is restarted. |
|
Vitor |
Posted: Tue May 12, 2009 5:28 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
PeterPotkay wrote: |
http://publib.boulder.ibm.com/infocenter/wmqv6/v6r0/topic/com.ibm.mq.amqzag.doc/fa12640_.htm
Quote: |
The value is examined when the queue manager is created or started, and might be increased or decreased at either of these times. However, a change in the value is not effective until the queue manager is restarted. |
|
You're right - I was thinking of LogFilePages! Doh!
Shame it didn't work for this guy then..... _________________ Honesty is the best policy.
Insanity is the best defence. |
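Since LogBufferPages and LogFilePages are easy to mix up, a small Python sketch for reading the Log stanza of a qm.ini can help confirm which values are actually configured before the next restart. The sample file content and every value in it are made up for illustration, and read_stanza is a hypothetical helper; it assumes only the usual layout of a stanza name ending in a colon followed by Key=Value lines.

```python
from typing import Dict

# Illustrative qm.ini excerpt. Every value here is a made-up example.
SAMPLE_QM_INI = """\
#* Sample only, not taken from a real queue manager
ExitPath:
   ExitsDefaultPath=/var/mqm/exits/
Log:
   LogPrimaryFiles=3
   LogSecondaryFiles=2
   LogFilePages=1024
   LogType=CIRCULAR
   LogBufferPages=4096
   LogPath=/var/mqm/log/QM1/
Service:
   Name=AuthorizationService
"""


def read_stanza(text: str, stanza: str) -> Dict[str, str]:
    """Return the Key=Value pairs found under the named stanza."""
    values: Dict[str, str] = {}
    in_stanza = False
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", ";")):
            continue
        if line.endswith(":") and "=" not in line:
            # A new stanza header switches collection on or off.
            in_stanza = line[:-1] == stanza
        elif in_stanza and "=" in line:
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values


if __name__ == "__main__":
    log = read_stanza(SAMPLE_QM_INI, "Log")
    for name in ("LogType", "LogFilePages", "LogBufferPages"):
        print(name, "=", log.get(name))
```

Note this only shows what the file says; per the quote above, a changed value does not take effect until the queue manager is restarted.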
|
garasan |
Posted: Tue May 12, 2009 6:10 am Post subject: |
|
|
 Apprentice
Joined: 22 Jul 2008 Posts: 42 Location: Antwerp, Belgium
|
Thanks all for the added info and remarks.
We are starting to think it is an app problem rather than an MQ problem.
We are trying to find the troublemaker app, but it seems that is the difficult part. _________________ Regards |
|
fjb_saper |
Posted: Tue May 12, 2009 8:34 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
garasan wrote: |
Thanks all for the added info and remarks.
We are starting to think it is an app problem rather than an MQ problem.
We are trying to find the troublemaker app, but it seems that is the difficult part. |
Shouldn't be that hard. Close the relevant channel for the connecting app (mode= force) and watch it go nuts...  _________________ MQ & Broker admin |
|