Queue Manager startup problems
steve_baldwin (Novice)
Posted: Mon Oct 01, 2001 2:49 am
Joined: 30 Sep 2001, Posts: 11, Location: Melbourne, Australia
This is a problem we get from time to time, and I have no idea how to solve it.
On issuing strmqm {QM Name}, we get an error saying the queue manager couldn't start because the following process IDs are still connected. I can't remember the exact text, but I'm sure you know the one. Anyway, on reviewing those processes, they have nothing whatsoever to do with MQ; they may be a telnet daemon, a Samba daemon, an Oracle process, just about anything.
My guess is that MQ's shared memory has somehow been corrupted, and whatever junk is in there just happens to correspond to current processes. Our only way around this is to actually terminate those processes, or remove all shared memory and semaphores owned by mqm.
Any clues?
bduncan (Padawan)
Posted: Mon Oct 01, 2001 9:19 am
Joined: 11 Apr 2001, Posts: 1554, Location: Silicon Valley
Okay, I've never answered this problem to my satisfaction, but I think I have a good idea of what is going on here. MQSeries seems to keep an internal list of process IDs corresponding to all the processes it spawns when you start the queue manager and during runtime. Any time MQSeries ends one of these processes, that PID is removed from the list. After an unclean shutdown (endmqm -p, or the box losing power), the restarted queue manager sometimes doesn't seem to realize that it died. It looks at the PID list from its previous execution and finds that those processes are all dead, or now belong to some other program (like telnetd). That's why it won't let you start up.
Getting around this can be pretty tricky. Sometimes it will let you issue an endmqm command (even though the queue manager isn't actually running), which clears the PID list. Other times endmqm won't work, and I've resorted to rebooting the box additional times (which sometimes seems to work). You also might want to use the commands applicable to your particular UNIX flavor to see if the OS considers any semaphores or file handles to be held by MQSeries (even though it isn't running), and manually remove the hold on those resources if necessary.
If you can give me some more specifics about your particular setup, I might be able to lend a better hand...
_________________ Brandon Duncan
IBM Certified MQSeries Specialist
MQSeries.net forum moderator
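To check what the OS thinks MQSeries still holds, something along these lines can help. This is a sketch only: `ipcs` output columns vary by UNIX flavor (the parsing below assumes a listing with the resource type letter in the first field and the owner in the fifth, as on HP-UX style `ipcs -a` output), and resources should only ever be removed with the queue manager fully stopped.

```shell
# Sketch: pick out System V IPC resources (shared memory 'm',
# semaphores 's', message queues 'q') owned by a given user from
# 'ipcs -a' style output. Column positions are an assumption;
# check them against your own platform's ipcs first.
filter_owned_ipc() {
  owner="$1"
  awk -v u="$owner" '$1 ~ /^[msq]$/ && $5 == u { print $1, $2 }'
}

# Real use (dry run first):
#   ipcs -a | filter_owned_ipc mqm
# Then, only once the qmgr is down and no amq* processes remain:
#   ipcs -a | filter_owned_ipc mqm | while read type id; do
#     ipcrm -"$type" "$id"   # -m, -s or -q plus the numeric id
#   done

# Example against a captured listing:
sample='m 201 0x4d024149 --rw-rw---- mqm mqm
s 42 0x4e024149 --ra-ra---- mqm mqm
s 43 0x00000000 --ra------- root root'
printf '%s\n' "$sample" | filter_owned_ipc mqm
```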
steve_baldwin (Novice)
Posted: Mon Oct 01, 2001 2:00 pm
Joined: 30 Sep 2001, Posts: 11, Location: Melbourne, Australia
One thing that would be helpful would be if we could control the ownership of shared memory segments and semaphores. On the box this happens most frequently on, we have 3 or 4 qmgrs running concurrently, and we make sure we have a dedicated UNIX user ID that both starts the qmgr and owns the inetd-spawned processes. So, on the UNIX process side, we can quite happily identify which processes belong to each qmgr. However, *all* IPC resources are owned by mqm, and I don't know how to control this. If we were able to, we could identify which IPC resources 'belong' to a qmgr and remove only those when the qmgr gets into this state.
We are also plagued by the problem of the qmgr 'hanging' that I've seen described on this board. Our symptom is that the qmgr stops responding completely, including to any variant of endmqm, so the only solution is to kill the 'amq...' processes. My guess is that this is caused by one of the internal qmgr processes waiting on a semaphore that was not 'incremented' by the last process holding it. Yesterday it happened at least 10 times, always on the same qmgr. Weird and frustrating in the extreme. We scheduled a reboot last night, so hopefully this has at least left us with a usable system today.
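On mapping processes to a qmgr: with MQSeries 5.x on UNIX the queue manager name generally shows up in the arguments of the amq* processes, so even though the IPC side is all owned by mqm, the process side can be filtered per qmgr. A sketch (the helper name and the sample ps fields below are illustrative, not from this thread):

```shell
# Sketch: filter a 'ps -ef' style listing down to the amq* processes
# that mention a particular queue manager name in their arguments.
procs_for_qmgr() {
  grep 'amq' | grep -F -- "$1"
}

# Real use:
#   ps -ef | procs_for_qmgr 'HUB!QMDEV'

# Example against a captured listing:
sample='mqm   1683     1  0 amqzxma0_nd -m HUB!QMDEV
mqm   1700  1683  0 amqzlaa0 -mHUB!QMDEV
root    99     1  0 telnetd'
printf '%s\n' "$sample" | procs_for_qmgr 'HUB!QMDEV'
```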
bduncan (Padawan)
Posted: Tue Oct 02, 2001 9:04 am
Joined: 11 Apr 2001, Posts: 1554, Location: Silicon Valley
One of the common reasons these hang-ups occur is the partition hosting /var/mqm/ filling up. When this happens the qmgr basically becomes unresponsive, although it will still let you into the console via runmqsc; it's just that any command you issue will hang. Ten times in one day sounds highly suspect. This would happen to me maybe once every few months, and we were hammering our UNIX boxes, plus we had about 30 of them. Have you looked at the /var/mqm/qmgrs/QMGRNAME/errors/ log files to see what the queue manager thinks is happening when it freezes up?
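If a full /var/mqm turns out to be the cause, a cron check along these lines catches it before the qmgr hangs. A sketch: `df -k` (or `bdf` on HP-UX) column layouts vary, the parsing assumes the use% is the fifth field of the last output line, and the 90% threshold is an arbitrary choice.

```shell
# Sketch: extract the use% figure from 'df -k <path>' style output.
# Field position is an assumption; adjust for your platform.
usage_pct() {
  awk 'END { gsub(/%/, "", $5); print $5 }'
}

# Real use:
#   pct=$(df -k /var/mqm | usage_pct)
#   [ -n "$pct" ] && [ "$pct" -ge 90 ] && echo "WARNING: /var/mqm is ${pct}% full"

# Example against captured output:
sample='Filesystem            kbytes    used   avail %used Mounted on
/dev/vg00/lvol8      1048576  995328   53248   95% /var/mqm'
printf '%s\n' "$sample" | usage_pct
```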
steve_baldwin (Novice)
Posted: Tue Oct 02, 2001 1:51 pm
Joined: 30 Sep 2001, Posts: 11, Location: Melbourne, Australia
Here's a snippet of one of the many FDC files produced a couple of days ago. Interestingly, since the reboot things seem to have quietened down considerably.
Have I included enough info to be useful? Looking at a lot of the FDC files, most of them show the last function as xcsFFST; I don't know if that's significant. Also, the date/time shown is nothing like the actual date/time. Weird.
Code:
+-----------------------------------------------------------------------------+
| |
| MQSeries First Failure Symptom Report |
| ===================================== |
| |
| Date/Time :- Friday February 09 07:27:01 EDT 2001 |
| Host Name :- hp-k580a (HP-UX B.11.00) |
| PIDS :- 5765B74 |
| LVLS :- 510 |
| Product Long Name :- MQSeries for HP-UX |
| Vendor :- IBM |
| Probe Id :- XC027042 |
| Application Name :- MQM |
| Component :- xcsRequestMutexSem |
| Build Date :- Jan 5 2001 |
| UserID :- 00000504 (mqm) |
| Program Name :- amqzxma0_nd |
| Process :- 00001683 |
| Thread :- 00000001 |
| QueueManager :- HUB!QMDEV |
| Major Errorcode :- xecL_W_SEM_OWNER_DIED |
| Minor Errorcode :- OK |
| Probe Type :- INCORROUT |
| Probe Severity :- 3 |
| Probe Description :- AMQ6125: An internal MQSeries error has occurred. |
| |
+-----------------------------------------------------------------------------+
MQM Function Stack
zxcStopAgents
zxcProcessChildren
zxcCleanupAgent
xcsFreeMemBlock
xstFreeMemBlock
xstFreeBlockFromSharedMemSet
xstFreeBlockInExtent
xcsRequestMutexSem
xcsFFST
[ This Message was edited by: steve_baldwin on 2001-10-02 14:57 ]
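With dozens of FDC files it helps to see whether one probe dominates before reading them individually. A rough sketch, assuming the "Probe Id :- XXnnnnnn" line format shown above and FDC files in the qmgr's errors directory:

```shell
# Sketch: count FFST reports by Probe Id so a repeating failure
# (like XC027042 above) stands out.
fdc_summary() {
  grep -h 'Probe Id' "$1"/*.FDC 2>/dev/null |
    awk -F':-' '{ gsub(/[ |]/, "", $2); print $2 }' |
    sort | uniq -c | sort -rn
}

# Real use:
#   fdc_summary '/var/mqm/qmgrs/HUB!QMDEV/errors'
```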
bower5932 (Jedi Knight)
Posted: Wed Oct 03, 2001 8:30 am
Joined: 27 Aug 2001, Posts: 3023, Location: Dallas, TX, USA
I checked your FDC against our internal problem database, and I found the following information:
The reason the customer is seeing this is that the semaphore (RequestMutexSem) is owned, but the owner has died (the Execution Controller in this case), so xecL_W_SEM_OWNER_DIED is returned.
This problem was solved after the customer increased the system soft limit for the number of file descriptors from 64 (the default value) to 1024 by editing the /etc/system file. The customer referred to "MQSeries for Sun Solaris Quick Beginnings Version 5.1", Chapter 3, "Installing the MQSeries for Sun Solaris Server", Kernel Configuration Notes: Sun Solaris has a low default system soft limit for the number of file descriptors. When running a multi-threaded process, you may reach the soft limit for file descriptors. This will give you the MQSeries reason code MQRC_UNEXPECTED_ERROR (2195) and an MQSeries FFST file. To avoid this problem you can increase the system soft limit for the number of file descriptors: edit the /etc/system file and change the value of the system soft limit to match the system hard limit (1024).
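To check where a box stands before hitting the next 2195, the current limits can be read from the shell. A sketch: the `ulimit` flags are the standard POSIX-shell ones, and `rlim_fd_cur` is the Solaris /etc/system tunable the Quick Beginnings text refers to.

```shell
# Sketch: show the soft and hard file-descriptor limits that
# processes started from this shell (e.g. by strmqm) will inherit.
soft=$(ulimit -n)
hard=$(ulimit -H -n)
echo "file descriptors: soft=$soft hard=$hard"

# Per the Quick Beginnings advice, raise the soft limit to match the
# hard limit by adding this to /etc/system and rebooting (Solaris):
#   set rlim_fd_cur=1024
```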
Manishkj (Newbie)
Posted: Mon Nov 12, 2001 4:36 am
Joined: 11 Nov 2001, Posts: 9
We work around this problem by cleaning out the following queue manager directories. After the cleanup, the queue manager started successfully without problems.
#!/bin/ksh
# Run only when the queue manager is fully stopped and no amq*
# processes remain.
TOP="/var/mqm/qmgrs/QM!TEST"
# Directory list as in the original post; the garbled 'ssem//socket'
# entry is assumed to mean the @ipcc/ssem and @ipcc/socket directories.
for file in "$TOP"/isem/* "$TOP"/esem/* "$TOP"/msem/* "$TOP"/shmem/* \
            "$TOP"/startprm/* "$TOP"/@ipcc/isem/* "$TOP"/@ipcc/esem/* \
            "$TOP"/@ipcc/ssem/* "$TOP"/@ipcc/socket/* "$TOP"/@ipcc/shmem/*
do
    /usr/bin/rm -f "$file"
done
exit 0