Queue manager failure... large numbers of processes.
aboggis
Posted: Mon Sep 20, 2004 3:54 pm

Centurion

Joined: 18 Dec 2001
Posts: 105
Location: Auburn, California

I recently had a system "go down" when it ran out of file descriptors. This is a Solaris 8 system with 12 CPUs and 24 GB RAM, running MQ v5.3, CSD05.

The maximum number of file descriptors is set to 8192.

Three HUGE FFST (*.FDC) files were generated, but looking into them simply confirmed that the file descriptors were exhausted.

The sysadmin copied the process table from the time of the crash... I counted 847 MQ processes. Mostly instances of amqzlaa0_nd (queue manager agent process) and amqrmppa (channel receiver process).
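
A count like the one above can be reproduced from a saved process table with a short script. This is only a sketch: it assumes each line of the saved listing ends with the command name or path, and the sample lines are made up for illustration, not taken from the affected system.

```python
from collections import Counter

def count_mq_processes(ps_lines, prefix="amq"):
    """Count processes per command name, keeping only names that start with prefix."""
    counts = Counter()
    for line in ps_lines:
        fields = line.split()
        if not fields:
            continue
        # Assumption: the last field is the command, possibly with a directory path.
        name = fields[-1].rsplit("/", 1)[-1]
        if name.startswith(prefix):
            counts[name] += 1
    return counts

# Made-up sample listing (hypothetical PIDs and paths).
sample = [
    "  PID COMMAND",
    " 1201 /opt/mqm/bin/amqzlaa0_nd",
    " 1202 /opt/mqm/bin/amqzlaa0_nd",
    " 1310 /opt/mqm/bin/amqrmppa",
    " 1400 /usr/sbin/sshd",
]

counts = count_mq_processes(sample)
print(counts.most_common())   # per-process breakdown, largest first
print(sum(counts.values()))   # total number of MQ processes
```

Running this against listings captured at intervals would show which process type is growing before the descriptor limit is reached.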

Has this been seen by anyone before? Is there a bug here? Does anyone know why so many of these processes would be created?
siliconfish
Posted: Mon Sep 20, 2004 5:46 pm

Master

Joined: 12 Aug 2002
Posts: 203
Location: USA

Check if the applications connecting to this queue manager are properly closing the connections.
aboggis
Posted: Mon Sep 20, 2004 7:13 pm


Good call, but since this is now after the event, it's difficult to tell.

In general, if an application (process/thread) does not cleanly disconnect, how long is it before MQ reclaims the "leaked" resources?
PeterPotkay
Posted: Tue Sep 21, 2004 7:04 am

Poobah

Joined: 15 May 2001
Posts: 7722

aboggis wrote:
In general, if an application (process/thread) does not cleanly disconnect, how long is it before MQ reclaims the "leaked" resources?


TCP KeepAlive will recognize that the other side is gone. The default is 2 hours.
Alter that to a realistic value and make sure the QM is configured to use it.
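
One way to confirm the queue manager side is to look for KeepAlive=Yes in the TCP stanza of qm.ini. The sketch below is a hypothetical helper, not an IBM tool, and the sample file contents are invented for illustration. It only checks the queue manager setting; the 2 hour interval itself is an operating system TCP parameter and is not read here.

```python
def tcp_keepalive_enabled(qm_ini_text):
    """Return True if the TCP stanza of the given qm.ini text sets KeepAlive=Yes."""
    in_tcp = False
    for raw in qm_ini_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith(":") and "=" not in line:
            # A stanza header such as "TCP:" starts a new section.
            in_tcp = line[:-1].strip().lower() == "tcp"
            continue
        if in_tcp and "=" in line:
            key, _, value = line.partition("=")
            if key.strip().lower() == "keepalive":
                return value.strip().lower() == "yes"
    return False

# Invented sample contents; a real qm.ini will have more stanzas.
sample_ini = """\
Log:
   LogPrimaryFiles=3
TCP:
   KeepAlive=Yes
"""

print(tcp_keepalive_enabled(sample_ini))
```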
_________________
Peter Potkay
Keep Calm and MQ On
aboggis
Posted: Tue Sep 21, 2004 7:54 am


You mean KeepAlive at the protocol level?

I have HBINT set, but I'll check what I set for KeepAlive in the ini file.

I should also add that there are no client connections involved. The applications putting/getting messages run on the same host as "their" local queue manager. The *ONLY* channels defined on our queue managers are the cluster sender/receiver channels.
PeterPotkay
Posted: Tue Sep 21, 2004 8:25 am


aboggis wrote:
You mean KeepAlive at the protocol level?

Yes.

aboggis wrote:

I should also add that there are no client connections involved. The applications putting/getting messages run on the same host as "their" local queue manager. The *ONLY* channels defined on our queue managers are the cluster sender/receiver channels.


Then it makes it less likely that orphaned connections caused this.

I responded to the same question on the listserv, where I said I saw the same thing on our Windows 2000 5.3 CSD04 servers, and IBM sent us 2 new DLLs to patch the problem until CSD08 came out. When we saw this, the only recourse was to reboot the server, because the QM wouldn't respond to anything, not even a shutdown attempt. Same symptoms: hundreds of MQ processes.
aboggis
Posted: Tue Sep 21, 2004 9:39 am


Well, I shall be putting in a call to support shortly, since this is starting to happen across multiple hosts now.

The only change to the MQ configs has been to reduce the values of HBINT (changed to 5), BATCHHB (also to 5) and NPMSPEED (to NORMAL).

I set these values lower because we need to "know" about channel failures as quickly as possible (using a cluster workload exit) so that our applications can re-route messages over an available channel.

I had overlooked the KeepAlive in qm.ini.
PeterPotkay
Posted: Tue Sep 21, 2004 11:20 am


Yeah, we just bounced the server the first 2 times we saw it. At that point we called IBM.

RE: HB set to 5:
By default, the receiver waits a fixed time for data before assuming there has been a communications failure. If the negotiated heartbeat interval is less than 60 seconds, it times out after twice the heartbeat interval. If the negotiated interval is 60 seconds or more, it times out 60 seconds beyond the heartbeat interval. The RCVR will then go INACTIVE.

Make sure both sides of the channel have the same value; otherwise the larger of the two is used.
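
The rule above works out as follows. This is only a sketch of the arithmetic as described in this post, with hypothetical function and parameter names, and it assumes both HBINT values are positive numbers of seconds.

```python
def receive_timeout_seconds(local_hbint, remote_hbint):
    """Default receiver timeout for a channel, per the rule described above."""
    # The larger of the two configured values is the negotiated interval.
    negotiated = max(local_hbint, remote_hbint)
    if negotiated < 60:
        return 2 * negotiated      # under 60s: twice the heartbeat interval
    return negotiated + 60         # 60s or more: heartbeat interval plus 60s

print(receive_timeout_seconds(5, 5))     # 10
print(receive_timeout_seconds(5, 300))   # 360
```

With HBINT at 5 on both ends the receiver gives up after 10 seconds of silence, but if one end is still at 300 the negotiated value is 300 and the timeout becomes 360 seconds.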
aboggis
Posted: Tue Sep 21, 2004 1:25 pm


Right now I have a system that has almost 500 MQ processes active... mostly instances of amqrmppa (channel receiver) and amqzlaa0 (qmgr agent). To all intents the queue manager is "unavailable" (all remote qmgrs with channels to this qmgr are "binding") and runmqsc is non-responsive.