MQSeries.net Forum Index > General IBM MQ Support > Queue Manager startup problems

Queue Manager startup problems
steve_baldwin
Posted: Mon Oct 01, 2001 2:49 am

Novice

Joined: 30 Sep 2001
Posts: 11
Location: Melbourne, Australia

This is a problem we get from time to time, and I have no idea how to solve it.
On issuing strmqm {QM Name}, we get the error saying the queue manager couldn't start because the following process IDs are still connected. I can't remember the exact text, but I'm sure you know the one. Anyway, on reviewing those processes, they have nothing whatsoever to do with MQ. They may be a telnet daemon, a samba daemon, an Oracle process, just about anything.

My guess is that MQ's shared memory has been corrupted somehow, and whatever crap is in there just happens to correspond to current processes. Our only way around this is to actually terminate those processes, or to remove all shmem and semaphores owned by mqm.

Any clues?
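
For anyone wanting to repeat the "review those processes" step, here is a minimal sketch that prints the command name behind each PID from the strmqm message. The function name describe_pids is made up for illustration, and it assumes your ps accepts the POSIX -p and -o options (on some older systems, HP-UX for example, -o may only be honoured in XPG4 mode, so check your man page).

```shell
#!/bin/sh
# Print "PID COMMAND" for each PID given, to see whether the processes
# strmqm complains about are really MQ processes (amq..., runmq...).
# Assumption: ps accepts the POSIX -p and -o options.
describe_pids() {
    for pid in "$@"; do
        # "comm=" prints only the command name, with no header line
        cmd=$(ps -p "$pid" -o comm= 2>/dev/null)
        if [ -n "$cmd" ]; then
            echo "$pid $cmd"
        else
            echo "$pid (no such process)"
        fi
    done
}

# Example (the PIDs are placeholders): describe_pids 1683 2045
```

If the names that come back are things like telnetd or smbd, that matches the symptom described above: the PIDs have been reused by unrelated programs.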
bduncan
Posted: Mon Oct 01, 2001 9:19 am

Padawan

Joined: 11 Apr 2001
Posts: 1554
Location: Silicon Valley

Okay, I've never answered this problem to my satisfaction, but I think I have a good idea of what is going on here. MQSeries seems to keep an internal list of process IDs for all the processes it spawns when you start the queue manager and during runtime. Any time MQSeries ends one of these processes, that PID is removed from the list. After an unclean shutdown of MQSeries (endmqm -p, or the box losing power), the restarted queue manager sometimes doesn't seem to realize that it died. It then looks at the list of PIDs from its previous execution and finds that those processes are either dead or now belong to some other program (like telnetd). That's why it won't let you start up.

Getting around this can be pretty tricky. Sometimes it will let you issue an endmqm command (even though the queue manager isn't actually running), and that clears the PID list. Other times endmqm won't work, and I've resorted to rebooting the box additional times (which sometimes seems to work). You also might want to use the commands applicable to your particular UNIX blend to see if the OS considers any semaphores or file handles to be held by MQSeries (even though it isn't running) and manually remove the hold on those resources if necessary.
If you can give me some more specifics about your particular setup, I might be able to lend a better hand...
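
As a rough sketch of the "see what the OS thinks is held" step: the function below reads ipcs output on stdin and prints, without running them, the ipcrm commands for everything owned by a given user. The name ipcrm_commands_for is made up, and the column layout (T ID KEY MODE OWNER GROUP) is an assumption based on System V style ipcs output. Linux prints different columns, so check your own ipcs output first.

```shell
#!/bin/sh
# Read System V style `ipcs` output on stdin and print (not run) the ipcrm
# command for each resource owned by the given user.
# Assumed column layout: T ID KEY MODE OWNER GROUP, where T is m, s or q.
ipcrm_commands_for() {
    owner=$1
    awk -v owner="$owner" '
        # m = shared memory, s = semaphore set, q = message queue
        $1 ~ /^[msq]$/ && $5 == owner { print "ipcrm -" $1, $2 }
    '
}

# Example (dry run only): ipcs | ipcrm_commands_for mqm
```

Review the printed commands before running any of them, and only with the queue managers stopped. Everything owned by mqm is listed, so on a box with several queue managers this does not distinguish between them.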


_________________
Brandon Duncan
IBM Certified MQSeries Specialist
MQSeries.net forum moderator
steve_baldwin
Posted: Mon Oct 01, 2001 2:00 pm

Novice

Joined: 30 Sep 2001
Posts: 11
Location: Melbourne, Australia

One thing that would be helpful is being able to control the ownership of shmem segments and semaphores. On the box where this happens most frequently, we have 3 or 4 qmgrs running concurrently, and we make sure we have a dedicated unix user id that starts both the qmgr and the inetd-spawned process. So, on the unix process side, we can quite happily identify which processes belong to each qmgr. However, *all* IPC resources are owned by mqm, and I don't know how to control this. If we could, we would be able to identify which IPC resources 'belong' to a qmgr, and remove only those when the qmgr gets into this state.
We are also plagued by the problem of the qmgr 'hanging' that I've seen described on this board. Our symptom is that the qmgr stops responding completely, including to any variant of endmqm, so the only solution is to kill the 'amq...' processes. My guess is that this is caused by one of the internal qmgr processes waiting on a semaphore that was not 'incremented' by the last process holding it. Yesterday it happened at least 10 times, always on the same qmgr. Weird and frustrating in the extreme. We scheduled a reboot last night, so hopefully this has at least left us with a usable system today.
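
A minimal sketch of the per-user identification described above: given ps -ef output on stdin, print the PIDs of the 'amq...' processes owned by one dedicated user. The function name amq_pids_for_user is made up, and the user names in the example are placeholders. It assumes the usual ps -ef layout (UID PID PPID C STIME TTY TIME CMD) and scans from field 8 onward because STIME is sometimes printed as two words.

```shell
#!/bin/sh
# Read `ps -ef` output on stdin and print the PIDs of processes owned by
# the given user whose command basename starts with "amq".
# Assumed layout: UID PID PPID C STIME TTY TIME CMD...
amq_pids_for_user() {
    user=$1
    awk -v user="$user" '
        $1 == user {
            # STIME may be one field ("07:27:01") or two ("Oct 1"),
            # so look for the command from field 8 onward.
            for (i = 8; i <= NF; i++) {
                n = split($i, parts, "/")
                if (parts[n] ~ /^amq/) { print $2; next }
            }
        }
    '
}

# Example: ps -ef | amq_pids_for_user qmdev
```

This only lists PIDs. Whether killing them is safe is a separate question, and it will also match any argument that happens to start with "amq", so eyeball the list first.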
bduncan
Posted: Tue Oct 02, 2001 9:04 am

Padawan

Joined: 11 Apr 2001
Posts: 1554
Location: Silicon Valley

One of the common reasons these hang-ups occur is that the partition hosting /var/mqm/ has filled up. When this happens the QMGR basically becomes non-responsive. It will still let you go into the console via runmqsc, but any command you issue will hang. 10 times in one day sounds highly suspect. This would happen to me maybe once every few months, and we were hammering our UNIX boxes, plus we had about 30 of them... Have you looked at the log files in /var/mqm/qmgrs/QMGRNAME/errors/ to see what the queue manager thinks is happening when it freezes up?
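
A quick way to rule the full-partition case in or out, as a sketch: the functions below read the POSIX-format output of df -P and report how full the filesystem holding a path is. The names pct_used and warn_if_full and the 90% threshold are made up for illustration, and it assumes your df supports -P.

```shell
#!/bin/sh
# Print the percentage of space used on the filesystem holding a path.
# Relies on POSIX `df -P`, whose single data line is:
#   Filesystem  blocks  Used  Available  Capacity  Mounted-on
pct_used() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical threshold of 90%; adjust to taste.
warn_if_full() {
    pct=$(pct_used "$1")
    if [ "$pct" -ge 90 ]; then
        echo "WARNING: $1 is ${pct}% full"
    else
        echo "OK: $1 is ${pct}% full"
    fi
}

# Example: warn_if_full /var/mqm
```

If the logs live on a separate filesystem from /var/mqm, run the check against that path as well.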


_________________
Brandon Duncan
IBM Certified MQSeries Specialist
MQSeries.net forum moderator
steve_baldwin
Posted: Tue Oct 02, 2001 1:51 pm

Novice

Joined: 30 Sep 2001
Posts: 11
Location: Melbourne, Australia

Here's a snippet of one of the many FDC files that were produced a couple of days ago. Interestingly (?), since the reboot things seem to have quietened down considerably.

Have I included enough info to be useful? Looking through the FDC files, most of them show the last function as xcsFFST. I don't know if that's significant. Also, the date/time shown is nothing like the actual date/time. Weird.

Code:
+-----------------------------------------------------------------------------+
|                                                                             |
| MQSeries First Failure Symptom Report                                       |
| =====================================                                       |
|                                                                             |
| Date/Time         :- Friday February 09 07:27:01 EDT 2001                   |
| Host Name         :- hp-k580a (HP-UX B.11.00)                               |
| PIDS              :- 5765B74                                                |
| LVLS              :- 510                                                    |
| Product Long Name :- MQSeries for HP-UX                                     |
| Vendor            :- IBM                                                    |
| Probe Id          :- XC027042                                               |
| Application Name  :- MQM                                                    |
| Component         :- xcsRequestMutexSem                                     |
| Build Date        :- Jan  5 2001                                            |
| UserID            :- 00000504 (mqm)                                         |
| Program Name      :- amqzxma0_nd                                            |
| Process           :- 00001683                                               |
| Thread            :- 00000001                                               |
| QueueManager      :- HUB!QMDEV                                              |
| Major Errorcode   :- xecL_W_SEM_OWNER_DIED                                  |
| Minor Errorcode   :- OK                                                     |
| Probe Type        :- INCORROUT                                              |
| Probe Severity    :- 3                                                      |
| Probe Description :- AMQ6125: An internal MQSeries error has occurred.      |
|                                                                             |
+-----------------------------------------------------------------------------+

MQM Function Stack
zxcStopAgents
zxcProcessChildren
zxcCleanupAgent
xcsFreeMemBlock
xstFreeMemBlock
xstFreeBlockFromSharedMemSet
xstFreeBlockInExtent
xcsRequestMutexSem
xcsFFST
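
With many FDC files to go through, a summary can help. This is only a sketch: summarize_fdc is a made-up name, and it assumes every FFST header uses the same "| Field :- value |" layout as the report above. It prints the probe ID, component and major error code for each header it finds, reading the files named as arguments or stdin if none are given.

```shell
#!/bin/sh
# Print "ProbeId Component MajorErrorcode" for each FFST header found.
# Assumption: header lines look like "| Probe Id          :- XC027042  |".
summarize_fdc() {
    awk '
        /Probe Id +:-/        { probe = $5 }   # fields: | Probe Id :- value
        /Component +:-/       { comp = $4 }    # fields: | Component :- value
        /Major Errorcode +:-/ { print probe, comp, $5 }
    ' "$@"
}

# Example, assuming the FDC files are under /var/mqm/errors:
#   summarize_fdc /var/mqm/errors/*.FDC | sort | uniq -c | sort -rn
```

Counting the distinct combinations shows whether the failures are all the same probe or a mix.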



[ This Message was edited by: steve_baldwin on 2001-10-02 14:57 ]
bower5932
Posted: Wed Oct 03, 2001 8:30 am

Jedi Knight

Joined: 27 Aug 2001
Posts: 3023
Location: Dallas, TX, USA

I checked your FDC against our internal problem database, and I found the following information:

The reason the customer is seeing this is that the semaphore is owned (RequestMutexSem), but the owner has died (the Execution Controller in this case), so xecL_W_SEM_OWNER_DIED is returned.

This problem was solved after the customer increased the system soft limit for the number of file descriptors from 64 (the default value) to 1024 by editing the /etc/system file. The customer referred to "MQSeries for Sun Solaris Quick Beginnings Version 5.1", Chapter 3, "Installing the MQSeries for Sun Solaris Server", Kernel Configuration Notes:

Sun Solaris has a low default system soft limit for the number of file descriptors. When running a multi-threaded process, you may reach the soft limit for file descriptors. This will give you the MQSeries reason code MQRC_UNEXPECTED_ERROR (2195), and an MQSeries FFST file. To avoid this problem you can increase the system soft limit for the number of file descriptors. To do this: Edit the /etc/system file and change the value of the system soft limit to match the system hard limit (1024).
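
To see what limits a process started from your shell would inherit, something like the sketch below may help. The function names fd_limits and fd_soft_below are made up, and it assumes your shell's ulimit supports -n, -S and -H (ksh and bash do). Note that the quoted fix is for Solaris while the FDC earlier in the thread is from HP-UX, so treat this as something to check, not a confirmed cause.

```shell
#!/bin/sh
# Show the soft and hard limits on open file descriptors for this shell.
# Assumption: the shell's ulimit builtin supports -n, -S and -H.
fd_limits() {
    soft=$(ulimit -S -n)
    hard=$(ulimit -H -n)
    echo "soft=$soft hard=$hard"
}

# Succeed (exit 0) if the soft limit is below the wanted minimum,
# e.g. the 1024 quoted above.
fd_soft_below() {
    soft=$(ulimit -S -n)
    [ "$soft" = unlimited ] && return 1
    [ "$soft" -lt "$1" ]
}

# Example: fd_limits; fd_soft_below 1024 && echo "soft limit is low"
```

Run it as the user that starts the queue manager, since limits are inherited per process.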
Manishkj
Posted: Mon Nov 12, 2001 4:36 am

Newbie

Joined: 11 Nov 2001
Posts: 9

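We work around this problem by cleaning out the following queue manager directories. After the cleanup, the queue manager started successfully.

#!/bin/ksh
# Remove leftover files from one queue manager's IPC directories.
# Intended to be run while that queue manager is stopped.

TOP="/var/mqm/qmgrs/QM!TEST"

# The list is continued with a backslash; paths are relative to $TOP.
for dir in isem esem msem shmem startprm @ipcc/isem \
           @ipcc/esem @ipcc/ssem//socket @ipcc/shmem
do
    # The glob is expanded under $TOP, not under /.
    # -f keeps rm quiet when a directory is already empty.
    /usr/bin/rm -f "$TOP/$dir"/*
done

exit 0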