Author |
Message
|
nosnhoj |
Posted: Wed Sep 07, 2005 4:47 pm Post subject: < Solved (sort of) >2059 and lots of FDC's |
|
|
Apprentice
Joined: 07 Sep 2005 Posts: 40 Location: Markham On.
|
Strange problem... The client works fine for a while, with many connects and disconnects, then they get a 2059 error and all hell breaks loose. The processes in netstat for the port jump like crazy, and FDC errors are generated every minute or so (usually all referring to channel terminated, RM something in the probe, but according to the IBM site it is 'normal').
Network people say they see the server end the connection (FIN) but the queue manager and listener always remain up and available - MQJexplorer stays connected until we run out of channels.
The only change I can think of is that we recently switched from inetd to runmqlsr (MQ 5.3 CSD6 on HP-UX 11). This had worked fine for a year before that.
Any ideas? Should we switch back to inetd? Why the 2059s?
Last edited by nosnhoj on Thu Sep 08, 2005 9:40 am; edited 1 time in total |
|
hopsala |
Posted: Wed Sep 07, 2005 5:08 pm Post subject: |
|
|
 Guardian
Joined: 24 Sep 2004 Posts: 960
|
First of all, there is no need to switch back to inetd; runmqlsr should work just fine. If this is a production system, however, I might switch back to buy time to investigate without users shouting over my shoulder.
Concerning the actual error, you didn't give us enough info to help you. Please post the AMQERR files from /errors/ and /qmgrs/errors, and the relevant FDC sections. (Before posting, do a little research: try to find the first, original error rather than the other errors it spawned. Mind the times and dates in doing so.)
Off the top of my head, I'd suggest restarting the server and installing CSD11 (I remember there were some fixes concerning runmqlsr on Unix platforms).
Btw, you stated this problem simply "happens" - does it ever stop "happening"? I.e., after a while of receiving those 2059s, do the client channels return to work and does everything go back to normal? |
|
nosnhoj |
Posted: Thu Sep 08, 2005 5:51 am Post subject: |
|
|
Apprentice
Joined: 07 Sep 2005 Posts: 40 Location: Markham On.
|
Seeing this a lot:
09/08/05 07:33:38
AMQ9604: Channel 'prod1.prod2' terminated unexpectedly
EXPLANATION:
The process or thread executing channel 'prod1.prod2' is no longer running.
The check process system call returned 545284357 for process 24520.
ACTION:
No immediate action is required because the channel entry has been removed from
the list of running channels. Inform the system administrator who should
examine the operating system procedures to determine why the channel process
has terminated.
And the only way to 'get the system back' is to kill all processes (endmqm does not work) |
|
jefflowrey |
Posted: Thu Sep 08, 2005 6:01 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
Stop all processes.
Clear all FDCs.
I think you can clear the AMQERR* files as well. Try leaving them there, but empty, first.
Restart the qmgr.
Make note of the first time the problem shows up, and look at the *first* entry in the log and the chronologically first FDC. _________________ I am *not* the model of the modern major general. |
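As a rough illustration of picking out the chronologically first FDC, here is a minimal Python sketch. It assumes the reports carry the `Date/Time :-` and `Probe Id :-` header fields in the layout quoted in this thread, that all reports come from the same host (the time zone abbreviation is ignored), and that the file contents have already been read into strings. The function names `parse_reports` and `first_report` are made up for this example and are not part of WebSphere MQ.

```python
import re
from datetime import datetime

# Header fields as they appear in the symptom reports quoted in this thread.
DATE_RE = re.compile(r"Date/Time\s*:-\s*\w+ (\w+ \d+ \d+:\d+:\d+) \w+ (\d{4})")
PROBE_RE = re.compile(r"Probe Id\s*:-\s*(\w+)")


def parse_reports(text):
    """Return a list of (timestamp, probe_id) tuples, oldest first."""
    reports = []
    # Every report carries this title line; text before the first one is skipped.
    for chunk in text.split("First Failure Symptom Report")[1:]:
        date = DATE_RE.search(chunk)
        probe = PROBE_RE.search(chunk)
        if date is None or probe is None:
            continue  # incomplete header, ignore this chunk
        # The time zone abbreviation (e.g. EDT) is dropped on purpose:
        # all reports are assumed to come from the same host.
        stamp = datetime.strptime(
            f"{date.group(1)} {date.group(2)}", "%B %d %H:%M:%S %Y"
        )
        reports.append((stamp, probe.group(1)))
    return sorted(reports)


def first_report(texts):
    """Return the earliest (timestamp, probe_id) across several FDC file contents."""
    merged = [report for text in texts for report in parse_reports(text)]
    return min(merged) if merged else None
```

Feeding it the contents of every FDC file written after the restart gives the probe ID to look at first; the later ones are often just fallout from that one.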
|
Nigelg |
Posted: Thu Sep 08, 2005 6:29 am Post subject: |
|
|
Grand Master
Joined: 02 Aug 2004 Posts: 1046
|
The AMQ9604 error log msgs, and the RM487001 FDCs, are reporting the same thing. The cause is that some process, typically runmqsc or amqpcsea, reading the internal channel status table has found that the process ID for a channel which has a status of RUNNING is no longer present. The FDC is then produced and the msg output to the error logs.
The root cause is that the process running these channels (amqrmppa) has crashed on one of its threads, bringing down the rest of the channels running on the other threads.
Are there any FDCs other than RM487001, e.g. XC130003, from a channel process?
Is it possible that some action external to WMQ is being done to kill the channel processes? _________________ MQSeries.net helps those who help themselves.. |
|
nosnhoj |
Posted: Thu Sep 08, 2005 7:00 am Post subject: |
|
|
Apprentice
Joined: 07 Sep 2005 Posts: 40 Location: Markham On.
|
Here is the first FDC generated:
+-----------------------------------------------------------------------------+
| |
| WebSphere MQ First Failure Symptom Report |
| ========================================= |
| |
| Date/Time :- Thursday September 08 10:52:51 EDT 2005 |
| Host Name :- dc2c5s (HP-UX B.11.11) |
| PIDS :- 5724B4102 |
| LVLS :- 530.6 CSD06 |
| Product Long Name :- WebSphere MQ for HP-UX |
| Vendor :- IBM |
| Probe Id :- XY076002 |
| Application Name :- MQM |
| Component :- xllRecoverSocketEvent |
| Build Date :- Feb 11 2004 |
| CMVC level :- p530-06-L040211 |
| Build Type :- IKAP - (Production) |
| UserID :- 00000201 (mqm) |
| Program Name :- amqrmppa_nd |
| Process :- 00028464 |
| Thread :- 00000031 |
| QueueManager :- QMGR!MW2 |
| Major Errorcode :- xecF_E_UNEXPECTED_SYSTEM_RC |
| Minor Errorcode :- OK |
| Probe Type :- MSGAMQ6119 |
| Probe Severity :- 2 |
| Probe Description :- WebSphere MQ was unable to open a message catalog to |
| display an error message for message id hexadecimal %6, with inserts %1, |
| %2, %3, %4, and %5. |
| FDCSequenceNumber :- 0 |
| Arith1 :- 24 18 |
| Comment1 :- '24 - Too many open files' from socket. |
| |
| |
+-----------------------------------------------------------------------------+
This is the second:
+-----------------------------------------------------------------------------+
| |
| WebSphere MQ First Failure Symptom Report |
| ========================================= |
| |
| Date/Time :- Thursday September 08 10:55:37 EDT 2005 |
| Host Name :- dc2c5s (HP-UX B.11.11) |
| PIDS :- 5724B4102 |
| LVLS :- 530.6 CSD06 |
| Product Long Name :- WebSphere MQ for HP-UX |
| Vendor :- IBM |
| Probe Id :- XC015001 |
| Application Name :- MQM |
| Component :- xcsFreeQuickCell |
| Build Date :- Feb 11 2004 |
| CMVC level :- p530-06-L040211 |
| Build Type :- IKAP - (Production) |
| UserID :- 00000201 (mqm) |
| Program Name :- amqrmppa_nd |
| Process :- 00007731 |
| Thread :- 00000010 |
| QueueManager :- QMGR!MW2 |
| Major Errorcode :- xecS_E_BLOCK_ALREADY_FREE |
| Minor Errorcode :- OK |
| Probe Type :- INCORROUT |
| Probe Severity :- 2 |
| Probe Description :- AMQ6125: An internal WebSphere MQ error has occurred. |
| FDCSequenceNumber :- 0 |
| |
+-----------------------------------------------------------------------------+
Then they just continue..... the error in the qmgr error log appears to be a 2059. Listeners are running, and I can connect remotely to the queue manager... they get 2059s but still we get some connections - very strange |
|
nosnhoj |
Posted: Thu Sep 08, 2005 7:43 am Post subject: |
|
|
Apprentice
Joined: 07 Sep 2005 Posts: 40 Location: Markham On.
|
Just switched back to inetd.conf instead of runmqlsr and all seems to be ok - been 7 minutes without a failure so far!!!! |
|
Nigelg |
Posted: Thu Sep 08, 2005 7:48 am Post subject: |
|
|
Grand Master
Joined: 02 Aug 2004 Posts: 1046
|
This is in the first FDC...
Quote: |
Comment1 :- '24 - Too many open files' from socket. |
Looks like some system parameter is too low, perhaps number of open files allowed per process?
This affects runmqlsr/amqrmppa because lots of channels run as threads in the same process, but has no effect on inetd because each channel runs in a separate process. _________________ MQSeries.net helps those who help themselves.. |
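To make the per-process reasoning above concrete, here is a small Python sketch for a Unix-like host. It looks up what errno 24 means locally (EMFILE on the common Unix systems), reads the open-file limits of the current process, and checks whether a given number of channel sockets would fit in a single process. The helper names, the one-descriptor-per-channel model, and the `baseline` figure are illustrative assumptions, not values taken from WebSphere MQ.

```python
import errno
import os
import resource


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this host."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fits(channels, soft_limit, baseline=0):
    """True if `channels` sockets plus `baseline` other descriptors fit.

    One descriptor per channel and the baseline value are assumptions
    made for illustration only.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return channels + baseline <= soft_limit


name, message = describe_errno(24)
soft, hard = nofile_limits()
print(f"errno 24 on this host: {name} ({message})")
print(f"open-file limits for this process: soft={soft}, hard={hard}")
```

With a threaded listener, `channels` is roughly every channel sharing the process; with inetd it is 1 per process, which is why the same limit never bites there.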
|
nosnhoj |
Posted: Thu Sep 08, 2005 9:41 am Post subject: |
|
|
Apprentice
Joined: 07 Sep 2005 Posts: 40 Location: Markham On.
|
Looks like it was runmqlsr... switching back to inetd seems to have solved it. Now just to figure out what happened and why.... will update this if anyone wants to know.
Thanks for the help! |
|
PeterPotkay |
Posted: Thu Sep 08, 2005 3:04 pm Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
jefflowrey wrote: |
I think you can clear AMQERR* files as well. Try leaving them there, but empty first.
|
Yes, you can do this. If MQ doesn't find an AMQERR* file, it creates it. _________________ Peter Potkay
Keep Calm and MQ On |
|
Cliff |
Posted: Fri Sep 09, 2005 5:01 am Post subject: |
|
|
Centurion
Joined: 27 Jun 2001 Posts: 145 Location: Wiltshire
|
Assuming HP-UX works like Solaris, your problem could be the soft limit for file descriptors being reached. Runmqlsr is a fully multi-threaded program. From the Solaris Quick Beginnings:
When running a multi-threaded process, you might reach the soft limit for file descriptors. This gives you the WebSphere MQ reason code MQRC_UNEXPECTED_ERROR (2195) and, if there are enough file descriptors, a WebSphere MQ FFST(TM) file.
So it's probably worth checking the equivalent value on HP-UX. Just a shot in the dark .....
Good luck -
Cliff |
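If the soft limit does turn out to be the culprit, one generic Unix approach is to raise it toward the hard limit before the multi-threaded process is started, since resource limits are inherited by child processes. The Python sketch below shows only that mechanism, for the current process. `raise_soft_nofile` is a hypothetical helper, and whether this is the right fix for runmqlsr on HP-UX is an assumption that would need checking against the platform's own documentation.

```python
import resource


def raise_soft_nofile(wanted):
    """Raise the soft open-file limit toward `wanted`, capped at the hard limit.

    Returns the soft limit in effect afterwards. Never lowers the limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unlimited hard limit puts no cap on the requested value.
    target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    if soft != resource.RLIM_INFINITY and target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Raising the hard limit itself normally needs elevated privileges or a system-level setting, so a launcher like this can only go as far as the hard limit allows.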
|