thindk00
Posted: Mon Sep 26, 2005 1:44 pm Post subject: Channel Status locking?
Voyager
Joined: 16 May 2001 Posts: 75 Location: UK
Hi,
We're on 5.3, CSD10 on Solaris. I think we have a problem with our channel status table for some reason (maybe this is a bug in the WMQ base code or in our channel exits).
We have a channel that we can't stop. It's a requestor channel; we issue the STOP CHANNEL command with the MODE(FORCE) and MODE(TERMINATE) options, and the command never completes. While this is happening we can't issue basic commands such as "dis chs(*)" in other runmqsc sessions. If we kill the runmqsc session that has the stop chl command running, the dis chs commands work.
For some reason it looks like our channel status table is being locked.
Has anyone seen this issue or know of a way to troubleshoot/fix this?
Thanks,
Kulbir.
jefflowrey
Posted: Mon Sep 26, 2005 3:19 pm
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
Ask the other side to stop their sender channel, and see what happens.
hopsala
Posted: Mon Sep 26, 2005 6:40 pm
Guardian
Joined: 24 Sep 2004 Posts: 960
What about:
1. Look in the error logs, anything interesting there? any FDCs?
2. How did you reach the conclusion that the problem is with the channel status table, of all things?
3. I suggest you do not assume "bug in WMQ base code" so quickly. Of course it is possible, but 99 times out of a hundred it is not the case; a problem in your channel exit code, as you mentioned, is far more likely. A good way to check is to simply remove the exit and see if the channel still hangs.
4. Have you installed latest CSDs?
5. When you say you can't stop it, you mean it's in RUNNING state? If so, are any messages going through?
6. Did this work in the past? Does it happen if you create another channel pair?
7. FYI, as jeff hinted, requester channels are better stopped from the sender side; however, your runmqsc shouldn't get hung on command.
Interesting problem you've got there; let us know what's going on...
thindk00
Posted: Mon Sep 26, 2005 11:03 pm
Voyager
Joined: 16 May 2001 Posts: 75 Location: UK
If we stop the server end of the channel, the channel status on the server end goes to STOPPED, but there is no impact on the Requestor end.
There are some FDCs reported but I can't see anything obvious giving it away. Here is the header for the common problem being reported:
+-----------------------------------------------------------------------------+
| |
| WebSphere MQ First Failure Symptom Report |
| ========================================= |
| |
| Date/Time :- Monday September 26 16:47:37 GMT 2005 |
| Host Name :- kopsauxmu02 (SunOS 5.9) |
| PIDS :- 5724B4103 |
| LVLS :- 530.10 CSD10 |
| Product Long Name :- WebSphere MQ for Sun Solaris |
| Vendor :- IBM |
| Probe Id :- XC307020 |
| Application Name :- MQM |
| Component :- xlsRequestMutex |
| Build Date :- May 13 2005 |
| CMVC level :- p530-10-L050504 |
| Build Type :- IKAP - (Production) |
| UserID :- 00000717 (mqm) |
| Program Name :- runmqchl_nd |
| Process :- 00010396 |
| Thread :- 00000001 |
| QueueManager :- GKOP_QH1_GM1 |
| Major Errorcode :- xecL_W_SEM_OWNER_DIED |
| Minor Errorcode :- OK |
| Probe Type :- INCORROUT |
| Probe Severity :- 3 |
| Probe Description :- AMQ6125: An internal WebSphere MQ error has occurred. |
| FDCSequenceNumber :- 0 |
| |
We removed the channel exit, as we thought it must be an issue with our channel exits, but we still hit the problem with the channels.
We're on CSD10.
When we issue the STOP CHL command without MODE(FORCE) or MODE(TERMINATE), the channel stays in STOPPING state. If we issue MODE(FORCE) or MODE(TERMINATE), the command never returns and we can't issue DIS CHS from any other runmqsc window. We can't find the PID for the requestor channel, otherwise we would kill that. We can only get around this by restarting the Queue Manager. This is beginning to happen regularly now (twice yesterday).
We've got MAXCHANNELS set to 900.
Any other ideas?
Thanks.
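Since killing the hung runmqsc session is what frees up DIS CHS in the other sessions, one stopgap is to issue the stop from a script that gives up after a timeout. The following is only a sketch, assuming Python 3 is available on the host and `runmqsc` is on the PATH of the mqm user. `build_stop_commands` and `run_mqsc` are hypothetical helper names, and the channel and queue manager names in the usage note are placeholders. It does nothing about the underlying problem.

```python
import subprocess


def build_stop_commands(channel, mode=None):
    """Return MQSC text: STOP CHANNEL (optionally with MODE), then DISPLAY CHSTATUS.

    Names containing lowercase characters would need single quotes in MQSC;
    this sketch assumes an uppercase channel name.
    """
    stop = f"STOP CHANNEL({channel})"
    if mode:
        stop += f" MODE({mode})"
    return f"{stop}\nDISPLAY CHSTATUS({channel})\nEND\n"


def run_mqsc(qmgr, mqsc_text, timeout_s=30):
    """Pipe MQSC text into runmqsc and return its output.

    Returns None if runmqsc has not finished after timeout_s seconds;
    subprocess.run kills the child process in that case.
    """
    try:
        result = subprocess.run(
            ["runmqsc", qmgr],
            input=mqsc_text,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout
```

For example, `run_mqsc("QM1", build_stop_commands("TO.QM1", "FORCE"))` would return None rather than block the terminal if the stop never completes.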
hopsala
Posted: Tue Sep 27, 2005 12:47 am
Guardian
Joined: 24 Sep 2004 Posts: 960
Quote:
Major Errorcode :- xecL_W_SEM_OWNER_DIED
Hmm, SEM is short for semaphore (which explains the RequestMutex), so possibly you have a semaphore problem on your hands. Did you set up semaphores to MQ requirements before installing? If not, look at the manuals and search this site on how to do so (search "semaphores sun" or something similar).
For example, I found this topic that might help you: "Queue Manager startup problems".
If this is not the problem, I think you should:
1. Go through all relevant FDCs slowly and carefully, and try to figure out what the problem is and what's causing it; I'm afraid no one can help you with that, it's just a matter of time and patience.
2. Open a PMR with IBM.
3. Wait for other responses, maybe someone else has other ideas...
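To make the FDC review less tedious, the header fields can be pulled out of each report and the recurring combinations counted. A minimal sketch in Python, assuming the headers follow the `| Key :- Value |` layout of the header posted earlier in this thread; `parse_fdc_header` and `summarize` are hypothetical helper names, not part of WebSphere MQ:

```python
import re
from collections import Counter

# Matches header lines such as "| Probe Id :- XC307020 |".
FIELD_RE = re.compile(r"^\|\s*(?P<key>[^|]+?)\s*:-\s*(?P<value>.*?)\s*\|?\s*$")


def parse_fdc_header(text):
    """Return the 'Key :- Value' fields found in an FDC header as a dict."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            # Keep the first occurrence of each key if the text repeats it.
            fields.setdefault(match.group("key"), match.group("value"))
    return fields


def summarize(headers):
    """Count how often each (Probe Id, Component, Major Errorcode) triple occurs."""
    counts = Counter()
    for fields in headers:
        key = (
            fields.get("Probe Id"),
            fields.get("Component"),
            fields.get("Major Errorcode"),
        )
        counts[key] += 1
    return counts


# A few lines copied from the header posted earlier in this thread.
SAMPLE = """\
| Probe Id :- XC307020 |
| Component :- xlsRequestMutex |
| Program Name :- runmqchl_nd |
| Major Errorcode :- xecL_W_SEM_OWNER_DIED |
"""

if __name__ == "__main__":
    for key, count in summarize([parse_fdc_header(SAMPLE)]).most_common():
        print(count, *key)
```

Feeding it the text of each FDC file (normally found under /var/mqm/errors on UNIX systems) gives a quick view of which probe and component dominate, which is also useful detail to include in a PMR.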
hopsala
Posted: Tue Sep 27, 2005 12:52 am
Guardian
Joined: 24 Sep 2004 Posts: 960
thindk00 wrote:
When we issue the STOP CHL command without MODE(FORCE) or MODE(TERMINATE) the channel stays in STOPPING state
Oh btw, this is normal requestor behavior, which is why it's usually best to stop from the sender side. Better yet, don't work with requesters at all: of the ~50 sites I know, none works with requesters, they all work with sender-receiver pairs. I recommend you do the same. (Of course others may have a different opinion on this matter.)
thindk00
Posted: Tue Sep 27, 2005 12:54 am Post subject: Number of file descriptor
Voyager
Joined: 16 May 2001 Posts: 75 Location: UK
I've seen the following reported for this type of problem:
*************
I checked your FDC against our internal problem database, and I found the following information:
The reason cust is seeing this is the semaphore is owned (RequestMutexSem), but the owner has died (the Execution Controller in this case), and xecL_W_SEM_OWNER_DIED is returned.
This problem was solved after cust increased the system soft limit for the number of file descriptors from 64 (default value) to 1024 by editing the /etc/system file.
Cust referred to "MQSeries for Sun Solaris Quick Beginnings Version 5.1", Chapter 3, "Installing the MQSeries for Sun Solaris Server", Kernel Configuration, Notes:
Sun Solaris has a low default system soft limit for the number of file descriptors. When running a multi-threaded process, you may reach the soft limit for file descriptors. This will give you the MQSeries reason code MQRC_UNEXPECTED_ERROR (2195), and an MQSeries FFST file.
To avoid this problem you can increase the system soft limit for the number of file descriptors. To do this: Edit the /etc/system file and change the value of the system soft limit to match the system hard limit (1024).
*************
We already have the file descriptor limit set to 1024, but I suspect that we may need to increase it further. We have BMC monitoring this environment, and there are a lot of queue managers (about 15), each with lots of objects. Is this worth trying?
Thanks.
hopsala
Posted: Tue Sep 27, 2005 1:11 am
Guardian
Joined: 24 Sep 2004 Posts: 960
thindk00 wrote:
Is this worth trying?
Yes.
But read through the entire post, and search this site for other relevant posts; that was just what I found in a one-minute quick search. You may have a problem with some other kernel parameter, or something entirely different.
Read the Sun Quick Beginnings manual, I won't say it again.
thindk00
Posted: Tue Sep 27, 2005 1:19 am Post subject: Kernel settings
Voyager
Joined: 16 May 2001 Posts: 75 Location: UK
Our other settings seem to be OK; we've set them according to the Quick Beginnings. Settings are as follows:
set shmsys:shminfo_shmmax=4294967295
set shmsys:shminfo_shmseg=16384
set shmsys:shminfo_shmmin=1
set shmsys:shminfo_shmmni=16384
*
set semsys:seminfo_semmni=2048
set semsys:seminfo_semaem=16384
set semsys:seminfo_semvmx=32767
set semsys:seminfo_semmns=147456
set semsys:seminfo_semmsl=16384
set semsys:seminfo_semmnu=2048
set semsys:seminfo_semume=256
set semsys:seminfo_semmap=2000
set semsys:seminfo_semopm=750
However we will re-review to see if these need to be increased/revisited.
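For that re-review, it can help to load the `set` lines into a dictionary and compare them against whatever reference values the Quick Beginnings for your release lists, or against another host. A rough sketch, assuming decimal values and `*` comment lines as in the settings above; `parse_etc_system` and `diff_settings` are hypothetical helper names, and no recommended values are implied:

```python
import re

# Matches lines such as "set semsys:seminfo_semmni=2048" (decimal values only).
SET_RE = re.compile(
    r"^\s*set\s+(?P<module>\w+):(?P<param>\w+)\s*=\s*(?P<value>\d+)\s*$"
)


def parse_etc_system(text):
    """Return {"module:param": int_value} for each 'set' line in the text."""
    settings = {}
    for line in text.splitlines():
        if line.lstrip().startswith("*"):
            continue  # lines starting with '*' are comments in /etc/system
        match = SET_RE.match(line)
        if match:
            name = f"{match.group('module')}:{match.group('param')}"
            settings[name] = int(match.group("value"))
    return settings


def diff_settings(current, reference):
    """Return {name: (current_value, reference_value)} where the two differ.

    A parameter missing from 'current' is reported with None as its value.
    """
    return {
        name: (current.get(name), wanted)
        for name, wanted in reference.items()
        if current.get(name) != wanted
    }


# The settings posted above.
SAMPLE = """\
set shmsys:shminfo_shmmax=4294967295
set shmsys:shminfo_shmseg=16384
set shmsys:shminfo_shmmin=1
set shmsys:shminfo_shmmni=16384
*
set semsys:seminfo_semmni=2048
set semsys:seminfo_semaem=16384
set semsys:seminfo_semvmx=32767
set semsys:seminfo_semmns=147456
set semsys:seminfo_semmsl=16384
set semsys:seminfo_semmnu=2048
set semsys:seminfo_semume=256
set semsys:seminfo_semmap=2000
set semsys:seminfo_semopm=750
"""

if __name__ == "__main__":
    for name, value in sorted(parse_etc_system(SAMPLE).items()):
        print(name, value)
```

The `reference` dictionary passed to `diff_settings` has to be filled in by hand from the manual; the sketch only does the comparison.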
xxx
Posted: Tue Sep 27, 2005 8:41 am
Centurion
Joined: 13 Oct 2003 Posts: 137
File descriptors are what was mentioned; maybe you should check your ulimit to see if there are enough of them!
I.e. try setting them to unlimited, just as a test.
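A quick way to see the limits a process actually inherits is to ask from inside one. The sketch below assumes Python 3 is available and that it is run as the mqm user from the same environment that starts the queue manager, since limits are inherited from the launching shell. The `/proc/<pid>/fd` lookup is an assumption about the host's /proc layout, and `fd_limits`, `describe` and `open_fd_count` are hypothetical helper names:

```python
import os
import resource


def fd_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def open_fd_count(pid="self"):
    """Count entries in /proc/<pid>/fd, or return None if it cannot be read."""
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except OSError:
        return None


if __name__ == "__main__":
    soft, hard = fd_limits()
    print(f"soft limit: {describe(soft)}, hard limit: {describe(hard)}")
    print(f"descriptors open in this process: {open_fd_count()}")
```

Passing the PID of a channel or agent process to `open_fd_count` shows how close that process is to the soft limit, provided the user running the script is allowed to read its /proc entry.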