Shared Memory Problems
fordcam (Apprentice; Joined: 28 Mar 2002; Posts: 35; Location: MGIC)
Posted: Fri May 30, 2003 12:03 pm; Post subject: Shared Memory Problems
We've got a large Solaris machine running MQSeries 5.2, CSD1. It runs a lot of homegrown applications that are mostly written in Perl. This machine reboots every morning, and is generally trouble free. But for the last week, MQSeries has been dying around 22:00 or so each night. Connect attempts get a 2059, and the queue manager doesn't respond to runmqsc or endmqm commands.
I get a huge amount of FDCs - 450 were created last time. This is partly because no one caught that the queue manager was unusable, and an FDC was created every time an app tried to connect, right up until reboot time.
This makes it difficult to find the root cause of the problem. But I'm seeing a lot of lines like this:
--} xstAllocBlockInExtent rc=xecsS_E_NO_MEM
so I think I'm exhausting shared memory.
We use really large settings for the kernel shared memory parameters, much bigger than the manual's minimum requirements. We are at our busiest during the day, yet the lockups occur at night, when it's not as busy, so I suspect that there's a leak of some sort. There haven't been any noteworthy application changes lately, so it could just be volume related. Perhaps we were barely making it to reboot time in the past and I didn't know it.
We had planned to go to 5.3 this summer. I can accelerate that, but I need to be able to keep this queue manager alive for 24 hours at a time until then. Any suggestions as to how I can do this? I would even settle for being able to anticipate the problem, and recycle 'just in time'. It's disruptive, but better than what we have now.
tillywern (Centurion; Joined: 28 Jan 2003; Posts: 109; Location: Colorado)
Posted: Mon Jun 16, 2003 1:22 pm; Post subject: ipcs is your best friend.
Well, MQ is really bad at cleaning up shared memory on Solaris. At least that is what I have seen. I can't say that I have seen your specific problem, but I can say I have seen systems run out of shared memory, semaphores, and the like.
You can run ipcs to get a list of the IPC elements held by the system. I generally run:
ipcs |grep mqm
It should give you a pretty good list.
You may have to track the number of items in the ipcs output and watch for a general upward trend. If you do this for a while you will probably understand how far you can go before MQ starts to crap out.
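As a rough sketch of that kind of tracking, the Python below counts the IPC objects owned by mqm in a block of ipcs output, split by type. It assumes the Solaris column layout (T, ID, KEY, MODE, OWNER, GROUP); the sample text, IDs, and keys are made up for illustration, and the function name is hypothetical.

```python
from collections import Counter

# Made-up sample in the assumed Solaris `ipcs` layout: T ID KEY MODE OWNER GROUP
SAMPLE = """\
IPC status from <running system> as of Fri May 30 21:00:00 CDT 2003
T         ID      KEY        MODE        OWNER    GROUP
Message Queues:
q          0   0x000004d2 --rw-rw-rw-     root     root
Shared Memory:
m        100   0x00001a2b --rw-rw----      mqm      mqm
m        101   0x00001a2c --rw-rw----      mqm      mqm
Semaphores:
s         20   0x00002b3c --ra-ra----      mqm      mqm
"""

# First column of each data row identifies the kind of IPC object.
KINDS = {"q": "message_queues", "m": "shared_memory", "s": "semaphores"}


def count_ipc(ipcs_text, owner="mqm"):
    """Count IPC objects per kind whose OWNER column matches `owner`."""
    counts = Counter()
    for line in ipcs_text.splitlines():
        fields = line.split()
        # Data rows have at least six columns and start with q, m, or s.
        if len(fields) >= 6 and fields[0] in KINDS and fields[4] == owner:
            counts[KINDS[fields[0]]] += 1
    return dict(counts)


if __name__ == "__main__":
    print(count_ipc(SAMPLE))
```

To use it on a live system, feed it the text from subprocess.run(["ipcs"], capture_output=True, text=True).stdout and log the counts with a timestamp, say every few minutes from cron, so the trend becomes visible.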
It is safe to say that if all queue managers are shut down there should be no entries in ipcs for mqm. Often after shutting down a queue manager you will find stuff left out there.
You can use ipcrm to remove the items left. It may require root access to remove all of them. If so, coordinate with your sysadmin.
Since IPC resources are used for inter-process communication, it is safe to assume that you have a lot of MQ processes running. Check your process table for channel receivers that have been in the table for too long. The channel receiver processes get created in profuse amounts and are known for how long they can stay around.
Based on how your channels are managed, look for long disconnect intervals; channels may be lingering for a long time after they are used. This is especially true if you are using server-connection channels. These processes could be holding resources that the system needs.
I used to run a script from cron that would look for channel receiver processes and remove them if they were really old. Define 'really old' in terms of how long a transaction takes in your enterprise.
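A sketch of the "find the really old ones" half of such a script is below (it only reports, it does not remove anything). It assumes input in the form produced by ps -eo pid,etime,args, where etime is [[dd-]hh:]mm:ss. The process name amqcrsta, the queue manager name QM1, the one-hour threshold, and the function names are all assumptions for illustration; substitute whatever your receiver processes are actually called.

```python
# Made-up sample of `ps -eo pid,etime,args` output.
SAMPLE_PS = """\
  PID     ELAPSED COMMAND
 1201    05:12:44 amqcrsta -m QM1
 1340       00:42 amqcrsta -m QM1
 1502  1-02:03:04 amqzlaa0 -m QM1
"""


def etime_to_seconds(etime):
    """Convert a ps elapsed time '[[dd-]hh:]mm:ss' to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:          # pad missing hours with zero
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def old_processes(ps_text, name="amqcrsta", max_age_seconds=3600):
    """Return (pid, etime, args) for matching processes older than the limit."""
    old = []
    for line in ps_text.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3 or not fields[0].isdigit():
            continue               # skip the header and malformed rows
        pid, etime, args = fields
        if name in args and etime_to_seconds(etime) > max_age_seconds:
            old.append((int(pid), etime, args.strip()))
    return old


if __name__ == "__main__":
    for pid, etime, args in old_processes(SAMPLE_PS):
        print(pid, etime, args)
```

Reporting first and killing later is deliberate: it lets you see how old the receivers really get before deciding on a cutoff.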
I would start by writing a script that monitors the MQSeries-specific IPC elements on the machine and looks for the creation of FDC files with respect to time. This will at least give you an idea of what is occurring and where. From there you should be able to see either that you are maintaining a constant level of IPC objects or that they are steadily increasing. I would also pull a copy of all processes in the process table that are associated with mqm. This will also show you if you have a steadily increasing number of processes that aren't going away.
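For the FDC side of that monitoring, one possible sketch is to bucket the FDC files by the hour they were last written, which shows when the failures start. It assumes the FDC files live in /var/mqm/errors and end in .FDC; the function names are hypothetical, and UTC is used only to keep the example deterministic.

```python
import os
from collections import Counter
from datetime import datetime, timezone


def fdc_mtimes(errors_dir="/var/mqm/errors"):
    """Return modification times (epoch seconds) of the *.FDC files in a directory."""
    return [
        os.path.getmtime(os.path.join(errors_dir, name))
        for name in os.listdir(errors_dir)
        if name.upper().endswith(".FDC")
    ]


def per_hour(mtimes, tz=timezone.utc):
    """Count timestamps per clock hour, returned in chronological order."""
    counts = Counter(
        datetime.fromtimestamp(t, tz).strftime("%Y-%m-%d %H:00") for t in mtimes
    )
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    for hour, count in per_hour(fdc_mtimes()).items():
        print(hour, count)
```

Lining this up against the ipcs counts and the mqm process count taken at the same times should show whether the FDCs start once the IPC usage reaches a particular level.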
I hope this starts you on your way.
gperera (Newbie; Joined: 30 May 2001; Posts: 8; Location: Minneapolis, MN)
Posted: Thu Jul 10, 2003 6:37 am; Post subject: Solaris Kernel Parameters for MQSeries
Have you tuned your Solaris Kernel per the Quick Beginnings manual? We received this from Level 2 support about 2 yrs ago b/c at the time the manual was incomplete/inaccurate.
Solaris Kernel Parameters for MQSeries
MQSeries makes extensive use of IPC (Inter-Process Communication) resources, including shared memory, semaphores, and message queues (the IPC kind). Many Solaris systems will require some adjustment of the kernel parameters which govern these resources in order to be able to run MQSeries comfortably, or to support heavily-used MQSeries installations. Indications that MQSeries lacks enough IPC resources may be an inability to start MQSeries, or difficulty in running many MQSeries programs concurrently. Furthermore, MQSeries may generate FDC files in /var/mqm/errors which contain error messages from IPC-related functions like semget, shmget, or shmat.
In order to make more IPC resources available to MQSeries, it is necessary to modify the kernel parameters on your machine using facilities like configure and idtune. Use the values given in this note in preference over those listed in the MQSeries Quick Beginnings for Solaris book. In cases where this note mentions new parameters, or overlooks some listed in the Quick Beginnings book, again give preference to this note. For more information on modifying your kernel, refer to your Solaris documentation or contact Solaris support.
We strongly urge you to save your current kernel configuration before trying to make any changes. When you make changes, realise that other programs (databases, for example) which make much use of IPC resources may force you to modify these parameters so that both MQSeries and those programs will run. The values msgmax, msgmnb, msgssz, semaem, semume, semvmx, shmmax, and shmseg should not in general require augmentation if you are running databases or other IPC-intensive programs. The values msgmap, msgmni, msgseg, msgtql, semmap, semmni, semmns, semmnu, and shmmni may require augmentation depending on the other programs running on the system. Refer to the meaning of each parameter listed below and other vendors' instructions to help you with that determination.
In general, the values that follow are only policing values. In other words, they can usually be over-allocated without causing harm to your system. This means that if your existing programs are not already running up against the limits you have specified, they will not use more kernel resources after modifying your kernel parameters.
==IPC Message Queue Parameters ==================================================
mesg     1        This should not be changed.
msgmap   1026     This is the number of entries in the kernel's message map
                  table. This value should equal msgtql+2, and it should
                  always be less than msgseg. A value roughly half of msgseg
                  should be good.
msgmax   4096     This is the maximum size of a single message in bytes.
msgmnb   4096     This is the maximum number of bytes that all the messages on
                  a single message queue can occupy.
msgmni   50       This is the maximum number of message queues allowed on the
                  system at any time.
msgseg   2048     This is the number of memory segments allocated by the
                  kernel at system startup to hold messages. Each system will
                  have a limit on the total memory allocated (msgseg*msgssz),
                  often 128KB.
msgssz   8        This is the size in bytes of the memory segments used for
                  storing messages. Valid values must be multiples of 4.
msgtql   1024     This is the number of system message headers which the
                  kernel can store, which is effectively the maximum number of
                  unread messages at any time.
==IPC Semaphore Parameters ======================================================
sema     1        This should not be changed.
semaem   16384    This is the maximum adjust-on-exit value for a semaphore.
                  It can be set to 32767 if necessary, but MQSeries does not
                  require this.
semmap   1026     This is the size of the kernel's map of semaphore sets.
                  This value should equal semmni+2.
semmni   1024     This is the maximum number of semaphore sets that can exist
                  on the system at any time.
semmns   32768    This is the maximum number of semaphores in the system. A
                  value of 16384 will generally work for a small MQSeries
                  installation, but setting it to 32768 is advisable for
                  larger systems.
semmnu   2048     This is the number of semaphore undo structures allocated
                  by the system.
semmsl   128      This is the maximum number of semaphores per semaphore set.
semopm   128      This is the maximum number of semaphore operations that can
                  be done by one semop() call. If this is set to semmsl,
                  one semop() call can operate on every semaphore in a
                  semaphore set, although MQSeries does not require this.
semume   256      This is the number of semaphore undo entries for each
                  process.
semvmx   32767    This is the maximum value that a semaphore can have.
==IPC Shared Memory Parameters ==================================================
shmem    1        This should not be changed.
shmmax   4194304  This is the maximum size in bytes of a shared memory
                  segment.
shmmni   1024     This is the maximum number of shared memory segments that
                  can exist on the system at any time.
shmseg   1024     This is the maximum number of shared memory segments that a
                  single process can have at any time. It should always be
                  less than or equal to shmmni.
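The values and the relationships between them stated in the three IPC sections above (msgmap = msgtql+2 and less than msgseg, msgssz a multiple of 4, semmap = semmni+2, shmseg no greater than shmmni) can be checked mechanically. The Python below is only a sketch of such a check: the function name is hypothetical, and it assumes you have already collected your current settings into a dictionary by whatever means your system provides.

```python
# Values copied from the note's message queue, semaphore, and shared memory sections.
RECOMMENDED = {
    "msgmap": 1026, "msgmax": 4096, "msgmnb": 4096, "msgmni": 50,
    "msgseg": 2048, "msgssz": 8, "msgtql": 1024,
    "semaem": 16384, "semmap": 1026, "semmni": 1024, "semmns": 32768,
    "semmnu": 2048, "semmsl": 128, "semopm": 128, "semume": 256,
    "semvmx": 32767,
    "shmmax": 4194304, "shmmni": 1024, "shmseg": 1024,
}


def check(params):
    """Return a list of problems found in a {name: value} dict of settings."""
    problems = []
    for name, want in RECOMMENDED.items():
        have = params.get(name)
        if have is None:
            problems.append(f"{name}: not set (note suggests {want})")
        elif have < want:
            problems.append(f"{name}: {have} is below the note's {want}")
    if any(name not in params for name in RECOMMENDED):
        return problems  # the relationship checks need every value
    if params["msgmap"] != params["msgtql"] + 2:
        problems.append("msgmap should equal msgtql+2")
    if params["msgmap"] >= params["msgseg"]:
        problems.append("msgmap should be less than msgseg")
    if params["msgssz"] % 4 != 0:
        problems.append("msgssz must be a multiple of 4")
    if params["semmap"] != params["semmni"] + 2:
        problems.append("semmap should equal semmni+2")
    if params["shmseg"] > params["shmmni"]:
        problems.append("shmseg should not exceed shmmni")
    return problems


if __name__ == "__main__":
    current = dict(RECOMMENDED, semmni=512)   # hypothetical under-sized system
    for problem in check(current):
        print(problem)
```

Values larger than the note's are accepted, in line with the note's remark that these are policing values that can usually be over-allocated.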
==Miscellaneous Parameters ======================================================
maxusers 32       This controls the number of users which can log in to the
                  system. More importantly, it controls other system values
                  which limit the number of processes that can run at once.
Rather than changing maxusers, we would recommend that you alter the nproc and maxuprc values as follows:
nproc: The maximum number of processes on the system
1 for each non-MQSeries process on the system PLUS
3 for each MQSeries queue manager (strmqm) PLUS
2 for each MQSeries receiver or svrconn channel PLUS
1 for each MQSeries sender channel PLUS
1 for each other MQSeries process (runmqtrm, etc.)
maxuprc: The maximum number of processes for a single user
1 for each non-MQSeries process run by 'mqm' PLUS
3 for each MQSeries queue manager (strmqm) PLUS
2 for each MQSeries receiver or svrconn channel PLUS
1 for each MQSeries sender channel PLUS
1 for each other MQSeries process (runmqtrm, etc.)
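The two tallies above use the same weights, so a small helper covers both. This is just the arithmetic from the note written out in Python; the function name and the example counts are made up. For maxuprc, pass only the non-MQSeries processes run by 'mqm' as the first argument.

```python
def estimate_processes(non_mq, queue_managers, receiver_or_svrconn,
                       senders, other_mq):
    """Apply the note's weights: 1, 3, 2, 1, and 1 per item respectively."""
    return (non_mq
            + 3 * queue_managers
            + 2 * receiver_or_svrconn
            + senders
            + other_mq)


if __name__ == "__main__":
    # Hypothetical system: 200 other processes, 1 queue manager,
    # 150 receiver/svrconn channels, 10 sender channels, 5 other MQ processes.
    print(estimate_processes(200, 1, 150, 10, 5))  # 200 + 3 + 300 + 10 + 5 = 518
```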
Users of Sun Solaris 2.5.1 or better may wish to verify that they are not in fact using more than 25% of their kernel resources for semaphore structures. In order to calculate this in bytes, use the formula given below. Also, if you are letting the kernel determine nproc for you, you can find this value by typing 'sysdef | grep v_proc':
kernel_memory = semmns * 16 +
nproc * 16 +
semmni * 92 +
semmnu * ((semume + 1) * 16) * 4
Solaris 2.5.1 users must also be certain that they are not using more than 25% of their kernel resources for shared memory structures. In order to calculate this in bytes, use the formula given below:
kernel_memory = shmmni * 120
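As a worked example, the two formulas can be evaluated with the values recommended in this note. The Python below implements them exactly as written; nproc = 1000 is a made-up figure, the function names are hypothetical, and the 256 MB total is the figure this note gives for sun4m.

```python
def semaphore_kernel_bytes(semmns, nproc, semmni, semmnu, semume):
    """Bytes of kernel memory for semaphore structures, per the note's formula."""
    return (semmns * 16
            + nproc * 16
            + semmni * 92
            + semmnu * ((semume + 1) * 16) * 4)


def shared_memory_kernel_bytes(shmmni):
    """Bytes of kernel memory for shared memory structures, per the note's formula."""
    return shmmni * 120


def within_quarter(bytes_needed, kernel_bytes):
    """True if the structures use no more than 25% of kernel memory."""
    return bytes_needed <= kernel_bytes / 4


if __name__ == "__main__":
    sem = semaphore_kernel_bytes(semmns=32768, nproc=1000, semmni=1024,
                                 semmnu=2048, semume=256)
    shm = shared_memory_kernel_bytes(shmmni=1024)
    kernel = 256 * 1024 * 1024            # 256 MB, the note's sun4m figure
    print(sem, within_quarter(sem, kernel))   # 34320000 True
    print(shm, within_quarter(shm, kernel))   # 122880 True
```

With these inputs the semmnu term dominates (33,685,504 of the 34,320,000 bytes), so semmnu and semume are the values to watch if the total gets close to the 25% limit.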
Of course, simply calculating the bytes needed for shared memory and semaphore structures is not terribly useful if you don't know what the overall kernel resources are. Kernel memory is limited by your kernel architecture as well as by your available RAM. Type 'uname -m' to see what your kernel architecture is. The maximum kernel memory that common Sun architectures can use today is given below:
Kernel   Resources   Machines
======   =========   ===============================================
sun4m    256 MB      ------
sun4d    576 MB      SS1000, SC2000
sun4u    4 GB        UltraSPARC
Michael Dag (Jedi Knight; Joined: 13 Jun 2002; Posts: 2607; Location: The Netherlands (Amsterdam))
Posted: Thu Jul 10, 2003 8:02 am
Also check the Probe Id in the FDC files and see if this is a known issue on the IBM support website. The current CSD level for 5.2 is 6.