  MQSeries.net

MQSeries.net Forum Index » Clustering » A large cluster problem-MQ 5.3 in AIX

A large cluster problem-MQ 5.3 in AIX
mqdev
Posted: Thu Dec 04, 2003 7:26 am    Post subject: A large cluster problem-MQ 5.3 in AIX

Centurion

Joined: 21 Jan 2003
Posts: 136

We are experiencing a problem in our MQ cluster, and I would like to hear from others who have worked in similar situations.

Our Environment:
AIX 4.3 and 5.2 with all the requisite OS patches. The cluster consists of 1000+ MQ v5.3 QMgrs, geographically distributed. The Central App Server (CAS) hosts a QMgr which hosts about 95% of the application cluster queues. Each QMgr runs an app which sends a heartbeat message every 10 minutes to a central HeartBeatServer (HBS), which is a different box from CAS. Other apps connected to the QMgrs send traffic to CAS. The network runs NTP, so the system clocks on the nodes are in sync, which makes the heartbeat traffic arrive pretty much simultaneously from across the network, apart from the differing network-induced delays of the individual nodes. Distributing the app load by creating multiple application cluster queues is currently NOT an option: they MUST reside on a single host for the time being. The listener processes for the MQ cluster are spawned via inetd (runmqlsr is ruled out, please see below).
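To put rough numbers on the heartbeat timing described above, here is a minimal Python sketch. It is only a toy model, not anything measured on our cluster: the function name, the 2-second maximum network delay, and the idea of a per-node send offset are all assumptions made for illustration. It counts how many heartbeats land on the receiving box in the busiest one-second interval when every node fires on the same clock tick, and compares that with the same nodes spread across the 10-minute interval.

```python
import random
from collections import Counter


def peak_arrivals_per_second(n_nodes, max_network_delay_s, send_offset_window_s, seed=0):
    """Return the largest number of heartbeats arriving in any one-second bucket.

    Toy model (all parameters are illustrative assumptions):
      - every node's timer fires at t = 0 because the clocks are NTP-synced
      - each node then waits a per-node offset in [0, send_offset_window_s]
        (0 means "send immediately", i.e. the synchronized case)
      - each message takes a random network delay in [0, max_network_delay_s]
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    arrivals = Counter()
    for _ in range(n_nodes):
        if send_offset_window_s > 0:
            send_offset = rng.uniform(0.0, send_offset_window_s)
        else:
            send_offset = 0.0
        network_delay = rng.uniform(0.0, max_network_delay_s)
        # Bucket the arrival time by whole seconds.
        arrivals[int(send_offset + network_delay)] += 1
    return max(arrivals.values())


if __name__ == "__main__":
    nodes = 1000          # roughly the cluster size in the post
    interval_s = 600.0    # 10-minute heartbeat interval
    max_delay_s = 2.0     # assumed spread of network delays

    synced = peak_arrivals_per_second(nodes, max_delay_s, 0.0)
    spread = peak_arrivals_per_second(nodes, max_delay_s, interval_s)
    print("peak arrivals/sec, synchronized sends:", synced)
    print("peak arrivals/sec, sends spread over the interval:", spread)
```

With inetd starting one amqcrsta process per inbound connection, the peak figure is a loose proxy for how many channel processes could be asked to start in the same second. The model ignores channels that are already running, retries, and the other application traffic to CAS, so it should be read as an order-of-magnitude picture only.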

Problem: Under heavy load conditions (only), the QMgr on CAS becomes unresponsive. Within a few seconds, 5000+ amqcrsta processes fire up, pretty much consuming the box, with the OS run queue showing thousands of jobs (presumably amqcrsta processes) waiting to run. The HBS, which is a less powerful box than CAS, also exhibits similar behaviour, except that on HBS the heartbeat cluster queue is full when this happens (which may or may not be playing a role in precipitating the crisis...). As a trial we applied CSD05 on HBS, without any success. Based on the posts on this site, we tried tuning the DISCINT, HBINT, TCP KeepAlive and AdoptNewMCA parameters, all in vain. Probably the solution is to judiciously set the values of one or more of DISCINT, HBINT and other MQ parameters.

Moving to runmqlsr instead of spawning listeners from inetd is not being considered, as there is still an upper limit on the number of threads that a single process can spawn, and if the runmqlsr process needs to be restarted (in the event of runaway threads this would be the only option), communication for the whole cluster is disrupted.

Any suggestions?