MQSeries.net Forum Index » General IBM MQ Support » Cannot connect to QM though it is up and running
Biju
Posted: Fri Dec 02, 2005 4:19 am    Post subject: Cannot connect to QM though it is up and running

Acolyte

Joined: 03 Oct 2005
Posts: 71

Hello, (I admit I didn't do much searching, but I am doing it simultaneously.)

I had a problem in production a month back. The client that was trying to connect to the QM couldn't do so for about 20 minutes, and then the problem resolved itself. The client application's error log shows reason code 2059 (MQRC_Q_MGR_NOT_AVAILABLE). I went through the AMQERROR.log of the QM for that day and found the following.

10/31/05 12:04:25 AM
AMQ9002: Channel program started.

EXPLANATION:
Channel program 'FRGLC01_FRGLR01.00' started.
ACTION:
None.
-------------------------------------------------------------------------------
10/31/05 03:17:40 PM
AMQ9209: Connection to host 'c4b-w2k1 (10.50.1**.***)' closed.

EXPLANATION:
An error occurred receiving data from 'c4b-w2k1 (10.50.1**.***)' over TCP/IP.
The connection to the remote host has unexpectedly terminated.
ACTION:
Tell the systems administrator.
----- amqccita.c : 2781 -------------------------------------------------------
10/31/05 03:17:40 PM
AMQ9228: The TCP/IP responder program could not be started.

EXPLANATION:
An attempt was made to start an instance of the responder program, but the
program was rejected.
ACTION:
The failure could be because either the subsystem has not been started (in this
case you should start the subsystem), or there are too many programs waiting
(in this case you should try to start the responder program later). The reason
code was 0.


The log never says that a channel or the QM was down or restarted. To my knowledge, if a channel starts or stops, the log records it (please correct me if I am wrong). The client uses an SCC. An FDC was generated 30 minutes before the error occurred, and I am not able to correlate the two. The FDC is shown below. I would like to know why this FDC was generated, although I guess it has nothing to do with the error the client got, since there is a 30-minute gap between the two incidents.



**************************************************************************************
FDC Generated on 31st Oct 2005

+-----------------------------------------------------------------------------+
| |
| WebSphere MQ First Failure Symptom Report |
| ========================================= |
| |
| Date/Time :- Monday October 31 14:06:02 MET 2005 |
| Host Name :- C4B-SUN2 (SunOS 5. |
| PIDS :- 5724B4103 |
| LVLS :- 530.5 CSD05 |
| Product Long Name :- WebSphere MQ for Sun Solaris |
| Vendor :- IBM |
| Probe Id :- XC002002 |
| Application Name :- MQM |
| Component :- xcsBuildDumpPtr |
| Build Date :- Sep 27 2003 |
| CMVC level :- p530-05-L030926 |
| Build Type :- IKAP - (Production) |
| UserID :- 00000100 (mqm) |
| Program Name :- runmqlsr_nd |
| Process :- 00018936 |
| Thread :- 00000001 |
| QueueManager :- QM!QMFRGLC01 |
| Major Errorcode :- xecS_E_NONE |
| Minor Errorcode :- OK |
| Probe Type :- MSGAMQ6037 |
| Probe Severity :- 2 |
| Probe Description :- WebSphere MQ was unable to open a message catalog to |
| display an error message for message id hexadecimal %6, with inserts %1, |
| %2, %3, %4, and %5. |
| FDCSequenceNumber :- 0 |
| |
+-----------------------------------------------------------------------------+


Any help would be greatly appreciated.
Regards
Bijish


Last edited by Biju on Fri Dec 02, 2005 5:17 am; edited 1 time in total
jefflowrey
Posted: Fri Dec 02, 2005 4:27 am

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

I'm going to guess that the client couldn't connect because there were already too many open connections... although that should really have returned a 2009 (MQRC_CONNECTION_BROKEN) to the client.

The FDC is not really related, I think. It's merely complaining that it can't access the data it needs to turn a "raw" error message into a complete error message.

You might also check your Solaris kernel parameters against the Quick Beginnings guide, and make sure you have enough IPC resources and so on available for the size and scope of your QM.
_________________
I am *not* the model of the modern major general.
Biju
Posted: Sun Jan 15, 2006 10:10 pm

Acolyte

Joined: 03 Oct 2005
Posts: 71

Hello All,
The following FDC was generated in the /var/mqm/errors/ folder. I have no idea why it was generated, and there is nothing of this kind in the AMQERROR01 log file. It is not generated very often, but I have noticed it a couple of times. Could anyone explain what it shows? Is something going wrong that has gone unnoticed? Thank you all for your time.


+-----------------------------------------------------------------------------+
| |
| WebSphere MQ First Failure Symptom Report |
| ========================================= |
| |
| Date/Time :- Monday January 09 01:11:45 MET 2006 |
| Host Name :- C4B-SUN2 (SunOS 5. |
| PIDS :- 5724B4103 |
| LVLS :- 530.5 CSD05 |
| Product Long Name :- WebSphere MQ for Sun Solaris |
| Vendor :- IBM |
| Probe Id :- XC002002 |
| Application Name :- MQM |
| Component :- xcsBuildDumpPtr |
| Build Date :- Sep 27 2003 |
| CMVC level :- p530-05-L030926 |
| Build Type :- IKAP - (Production) |
| UserID :- 00000100 (mqm) |
| Program Name :- runmqlsr_nd |
| Process :- 00001441 |
| Thread :- 00000001 |
| QueueManager :- QM!QM |
| Major Errorcode :- xecS_E_NONE |
| Minor Errorcode :- OK |
| Probe Type :- MSGAMQ6037 |
| Probe Severity :- 2 |
| Probe Description :- WebSphere MQ was unable to open a message catalog to |
| display an error message for message id hexadecimal %6, with inserts %1, |
| %2, %3, %4, and %5. |
| FDCSequenceNumber :- 0 |
| |
+-----------------------------------------------------------------------------+

Jefflowrey... any suggestions for finding out why this happens?

Regards,
Bijish
mvic
Posted: Mon Jan 16, 2006 2:49 am

Jedi

Joined: 09 Mar 2004
Posts: 2080

How many First Failure Symptom Reports are there in that file? If I got the filename right then you could paste the output from

Code:
grep FDCSequenceNumber /var/mqm/errors/AMQ1441.0.FDC


or even better a full summary of the FDC files:

Code:
cd /var/mqm/errors
/opt/mqm/bin/ffstsummary


I ask because this looks like a "meta" error: runmqlsr_nd was about to write a real failure report, but first failed while turning a message id into a translated message, so this FDC is not really the problem itself.
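
To answer the "how many reports" question for every FDC file at once rather than one file at a time, a short loop can count the report banners in each file. This is a minimal sketch: count_ffsts is a hypothetical helper name (not an MQ command), and it assumes every report in an FDC file starts with the "First Failure Symptom Report" banner seen in the FDCs quoted in this thread.

```shell
# Hypothetical helper (not part of WebSphere MQ): print each FDC file name
# followed by the number of "First Failure Symptom Report" banners it contains.
# Assumption: each FFST inside a file begins with that banner line.
count_ffsts() {
    for f in "$@"; do
        # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps
        # that from aborting scripts that run with "set -e".
        n=$(grep -c 'First Failure Symptom Report' "$f") || true
        printf '%s %s\n' "$f" "$n"
    done
}
```

Usage, from /var/mqm/errors: count_ffsts AMQ*.FDC. Any file reporting more than 1 would be worth reading past the first report.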

The first FDC from Monday October 31 14:06:02 MET 2005 - is it likely to be related to your problem in production in Dec 2005 ("a month back")?

I also note that you're on 5.3 CSD05, which is really old code now. It's best to install the latest CSD; that advice gets given a lot here.
Biju
Posted: Tue Jan 17, 2006 2:05 am

Acolyte

Joined: 03 Oct 2005
Posts: 71

Hello guys,
Sorry for the late reply.
The CCSID for the Queue Manager is 923.
The following are the FDCs generated during the last 3 months. They all look the same, and the FDC files contain no information other than this:
+-----------------------------------------------------------------------------+
| |
| WebSphere MQ First Failure Symptom Report |
| ========================================= |
| |
| Date/Time :- Monday January 09 01:11:45 MET 2006 |
| Host Name :- C4B-SUN2 (SunOS 5. |
| PIDS :- 5724B4103 |
| LVLS :- 530.5 CSD05 |
| Product Long Name :- WebSphere MQ for Sun Solaris |
| Vendor :- IBM |
| Probe Id :- XC002002 |
| Application Name :- MQM |
| Component :- xcsBuildDumpPtr |
| Build Date :- Sep 27 2003 |
| CMVC level :- p530-05-L030926 |
| Build Type :- IKAP - (Production) |
| UserID :- 00000100 (mqm) |
| Program Name :- runmqlsr_nd |
| Process :- 00001441 |
| Thread :- 00000001 |
| QueueManager :- QM!QMFRGLC01 |
| Major Errorcode :- xecS_E_NONE |
| Minor Errorcode :- OK |
| Probe Type :- MSGAMQ6037 |
| Probe Severity :- 2 |
| Probe Description :- WebSphere MQ was unable to open a message catalog to |
| display an error message for message id hexadecimal %6, with inserts %1, |
| %2, %3, %4, and %5. |
| FDCSequenceNumber :- 0 |
| |
+-----------------------------------------------------------------------------+

*******************************************************************************

+-----------------------------------------------------------------------------+
| |
| WebSphere MQ First Failure Symptom Report |
| ========================================= |
| |
| Date/Time :- Monday December 05 03:10:48 MET 2005 |
| Host Name :- C4B-SUN2 (SunOS 5. |
| PIDS :- 5724B4103 |
| LVLS :- 530.5 CSD05 |
| Product Long Name :- WebSphere MQ for Sun Solaris |
| Vendor :- IBM |
| Probe Id :- XC002002 |
| Application Name :- MQM |
| Component :- xcsBuildDumpPtr |
| Build Date :- Sep 27 2003 |
| CMVC level :- p530-05-L030926 |
| Build Type :- IKAP - (Production) |
| UserID :- 00000100 (mqm) |
| Program Name :- runmqlsr_nd |
| Process :- 00016449 |
| Thread :- 00000001 |
| QueueManager :- QM!QMFRGLC01 |
| Major Errorcode :- xecS_E_NONE |
| Minor Errorcode :- OK |
| Probe Type :- MSGAMQ6037 |
| Probe Severity :- 2 |
| Probe Description :- WebSphere MQ was unable to open a message catalog to |
| display an error message for message id hexadecimal %6, with inserts %1, |
| %2, %3, %4, and %5. |
| FDCSequenceNumber :- 0 |
| |
+-----------------------------------------------------------------------------+

*******************************************************************************

+-----------------------------------------------------------------------------+
| |
| WebSphere MQ First Failure Symptom Report |
| ========================================= |
| |
| Date/Time :- Monday October 31 14:06:02 MET 2005 |
| Host Name :- C4B-SUN2 (SunOS 5. |
| PIDS :- 5724B4103 |
| LVLS :- 530.5 CSD05 |
| Product Long Name :- WebSphere MQ for Sun Solaris |
| Vendor :- IBM |
| Probe Id :- XC002002 |
| Application Name :- MQM |
| Component :- xcsBuildDumpPtr |
| Build Date :- Sep 27 2003 |
| CMVC level :- p530-05-L030926 |
| Build Type :- IKAP - (Production) |
| UserID :- 00000100 (mqm) |
| Program Name :- runmqlsr_nd |
| Process :- 00018936 |
| Thread :- 00000001 |
| QueueManager :- QM!QMFRGLC01 |
| Major Errorcode :- xecS_E_NONE |
| Minor Errorcode :- OK |
| Probe Type :- MSGAMQ6037 |
| Probe Severity :- 2 |
| Probe Description :- WebSphere MQ was unable to open a message catalog to |
| display an error message for message id hexadecimal %6, with inserts %1, |
| %2, %3, %4, and %5. |
| FDCSequenceNumber :- 0 |
| |
+-----------------------------------------------------------------------------+

And this is what I got by running your command:

bash-2.03$ grep FDCSequenceNumber /var/mqm/errors/AMQ01441.0.FDC
| FDCSequenceNumber :- 0

And there are more than 200 lines of ffstsummary output, most of them caused by the endmqm command... I am including all the others, in case they help.

AMQ06676.0.FDC 2005/07/11 17:18:14 endmqm 6676 1 RM487001 rriChannelTerminate rrcE_CHANNEL_TERMINATED OK
AMQ09009.0.FDC 2005/07/17 22:29:41 runmqlsr_nd 9009 1 XC002002 xcsBuildDumpPtr xecS_E_NONE OK
AMQ18760.0.FDC 2005/08/08 18:07:55 endmqm 18760 1 RM487001 rriChannelTerminate rrcE_CHANNEL_TERMINATED OK
AMQ16618.0.FDC 2005/08/21 22:09:43 runmqlsr_nd 16618 1 XC002002 xcsBuildDumpPtr xecS_E_NONE OK
AMQ20596.0.FDC 2005/09/27 00:06:41 runmqlsr_nd 20596 1 XC002002 xcsBuildDumpPtr xecS_E_NONE OK
AMQ18936.0.FDC 2005/10/31 13:06:02 runmqlsr_nd 18936 1 XC002002 xcsBuildDumpPtr xecS_E_NONE OK
AMQ16449.0.FDC 2005/12/05 02:10:48 runmqlsr_nd 16449 1 XC002002 xcsBuildDumpPtr xecS_E_NONE OK
AMQ01441.0.FDC 2006/01/09 00:11:45 runmqlsr_nd 1441 1 XC002002 xcsBuildDumpPtr xecS_E_NONE OK
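
Stripping the endmqm entries by hand works, but the same filtering can be scripted so that a recurring pattern stands out. This is a minimal sketch that assumes the whitespace-separated column layout of the listing above (file, date, time, program, pid, thread, probe id, component, major and minor error codes); summarise_fdcs is a hypothetical name, not an MQ tool, and real ffstsummary output may need adjusting if its layout differs.

```shell
# Hypothetical helper: read ffstsummary-style lines on stdin, skip FDCs written
# by endmqm, and print one line per program/probe id pair with a count and the
# dates on which that pair appeared.
# Assumed columns: $1 file, $2 date, $3 time, $4 program, $5 pid, $6 thread,
#                  $7 probe id, $8 component, $9 major code, $10 minor code.
summarise_fdcs() {
    awk 'NF >= 7 && $4 != "endmqm" {
            key = $4 " " $7
            count[key]++
            dates[key] = dates[key] " " $2
         }
         END { for (key in count) print key, count[key] ":" dates[key] }'
}
```

Usage, from /var/mqm/errors: /opt/mqm/bin/ffstsummary | summarise_fdcs. Fed the eight lines above, it prints the single line "runmqlsr_nd XC002002 6: 2005/07/17 2005/08/21 2005/09/27 2005/10/31 2005/12/05 2006/01/09", which shows the six listener FDCs falling roughly five weeks apart. When several program/probe pairs are present, awk prints them in no particular order.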

Quote:
The first FDC from Monday October 31 14:06:02 MET 2005 - is it likely to be related to your problem in production in Dec 2005 ("a month back")?

No, I don't think so, although the same kind of FDC was generated then. There is nothing wrong with any message, and no issues have been reported. I am just curious to find out what made the QM generate the FDC.
Thank you all for your support.
Thanks & Regards,
Bijish
mvic
Posted: Tue Jan 17, 2006 2:55 am

Jedi

Joined: 09 Mar 2004
Posts: 2080

Biju wrote:
The CCSID for the Queue Manager is 923.


Your problems may (hopefully!) disappear if you use a standard CCSID for Solaris: 819, I think.

The FDC reports are virtually all about codepage conversion, but in at least some cases there is another error (AMQ6037) that MQ is trying to write out at the time the codepage conversion error hits.

Code:
Program Name      :- runmqlsr_nd
...
Major Errorcode   :- xecS_E_NONE
Minor Errorcode   :- OK
Probe Type        :- MSGAMQ6037


Looking at the meaning of AMQ6037, there may have been an out-of-storage error. The following output is the same on v6 and v5.3 CSD11 systems:

Code:
$ mqrc AMQ6037

 536895543  0x20006037  xecS_E_NONE

MESSAGE:
WebSphere MQ was unable to obtain enough storage.

EXPLANATION:
The product is unable to obtain enough storage.  The product's error recording
routine may have been called.

ACTION:
Stop the product and restart it.  If this does not resolve the problem see if a
problem has been recorded.  If a problem has been recorded, use the standard
facilities supplied with your system to record the problem identifier, and to
save the generated output files. Contact your IBM support center.  Do not
discard these files until the problem has been resolved.
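
The two numbers at the top of that mqrc output are the same message id in decimal (536895543) and hexadecimal (0x20006037), and the AMQ number matches its low four hex digits. Since the FDC's Probe Description talks about a message id "in hexadecimal", a small sketch of the conversion may help. The helper names are hypothetical, and the rule that the AMQ number is the low 16 bits of the id (with a 0x2000 prefix) is an assumption drawn from this single AMQ6037 example, not verified for every message.

```shell
# Hypothetical helpers for converting between MQ message ids and AMQ numbers.
# Assumption: the AMQ number is the low 16 bits of the id written in hex,
# as in 536895543 = 0x20006037 = AMQ6037 above.

# Decimal message id -> 8-digit hex form.
msgid_hex() {
    printf '0x%08X\n' "$1"
}

# Decimal message id -> AMQnnnn form (low 16 bits, printed as hex digits).
msgid_amq() {
    printf 'AMQ%04X\n' $(( $1 & 0xFFFF ))
}

# AMQnnnn -> decimal message id, assuming the 0x2000 prefix seen above.
amq_msgid() {
    printf '%d\n' $(( 0x20000000 + 0x${1#AMQ} ))
}
```

For example, msgid_hex 536895543 prints 0x20006037, msgid_amq 536895543 prints AMQ6037, and amq_msgid AMQ6037 prints 536895543.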


Quote:
bash-2.03$ grep FDCSequenceNumber /var/mqm/errors/AMQ01441.0.FDC
| FDCSequenceNumber :- 0


OK, but I am surprised that MQ doesn't go on to write out the FFST that (I think) it was beginning to write.

Quote:
And there are more than 200 lines of ffstsummary output, most of them caused by the endmqm command... I am including all the others, in case they help.


Just so that it is clear what is going on, could you please run

Code:
cd /var/mqm/errors
/opt/mqm/bin/ffstsummary | tail -20


Quote:
There is nothing wrong with any message, and no issues have been reported. I am just curious to find out what made the QM generate the FDC.


If there was an out-of-memory error, it is possible (but this is pure speculation) that the channel backed out the piece of work it was doing, and (depending on what exactly that channel was) it was automatically restarted (in the case of "normal" channels) or the app reconnected and retried (in the case of an MQI channel).

>>> EDIT: I reviewed the original problem description and it seems likely these problems are from an MQI channel, and this was the direct cause of the temporary problem for your apps. <<<

This explanation raises more questions: was there really an out-of-memory condition, why did it not affect the rest of the system, and why did MQ not follow through and record the out-of-memory condition properly?

On this last point, you could check:

Code:
find /var/mqm -type f -name AMQ\*.LOG -exec grep 6037 {} \; -print


and if that produces any meaningful results, look at those entries, and the ones around them, in the relevant AMQ*.LOG file.
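
Looking at the surrounding entries is easier with a tool that prints context. GNU grep has -C for this, but the Solaris grep of this vintage may not, so here is a minimal awk-based sketch. log_context is a hypothetical helper name; it assumes an awk that accepts -v (on Solaris that may mean nawk or /usr/xpg4/bin/awk rather than the default awk).

```shell
# Hypothetical helper (not part of WebSphere MQ): for each FILE, print every
# line matching PATTERN plus CTX lines either side, prefixed "file:lineno: ".
# Usage: log_context PATTERN CTX FILE...
log_context() {
    pat=$1
    ctx=$2
    shift 2
    for f in "$@"; do
        awk -v pat="$pat" -v ctx="$ctx" -v file="$f" '
            { line[NR] = $0; if ($0 ~ pat) hit[NR] = 1 }
            END {
                printed = 0
                for (i = 1; i <= NR; i++) {
                    if (!(i in hit)) continue
                    from = i - ctx; if (from < 1) from = 1
                    to = i + ctx; if (to > NR) to = NR
                    # Do not reprint lines already shown for an earlier, overlapping match.
                    if (from <= printed) from = printed + 1
                    for (j = from; j <= to; j++) print file ":" j ": " line[j]
                    printed = to
                }
            }' "$f"
    done
}
```

Combined with the find above, something like log_context 6037 5 $(find /var/mqm -type f -name 'AMQ*.LOG') would show five log lines either side of each match.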
Biju
Posted: Tue Jan 17, 2006 5:21 am

Acolyte

Joined: 03 Oct 2005
Posts: 71

Hello Mvic,
There are no errors logged in opt/mqm/qmgr/error/AMQ*.log or in var/mqm/error/amq*.log. I don't know what else could be useful in finding the root cause. Anyway, I will keep an eye on it and try to find out the real reason. If something comes to mind, please let me know. Thank you very much for your replies; I am waiting to hear more.
Quote:
find /var/mqm -type f -name AMQ\*.LOG -exec grep 6037 {} \; -print

This doesn't return anything.

Thanks & Regards
Bijish
mvic
Posted: Tue Jan 17, 2006 5:57 am

Jedi

Joined: 09 Mar 2004
Posts: 2080

Biju wrote:
I don't know what else could be useful in finding the root cause.


Here are some suggestions:

You could use CCSID 819 to try to avoid the codepage conversion errors. But if you must use a different CCSID, then there is a problem here: I suggest you talk to Support about why you see it, and ask whether it can be fixed.

You could also monitor memory usage on the machine, to see if there is any correlation between this and the failures.

For example, run the following to collect a vmstat and ps -efl listing every minute. Hopefully then you will have some independent information at the time you next see the problem occurring:

Code:
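# Every 60 seconds, log a timestamp plus a full process listing.
(nohup /usr/bin/ksh -c "while /usr/bin/true ; do date ; ps -efl ; sleep 60 ; done" 2>&1) > ps.log &
# vmstat's first report is an average since boot, so ask for two reports:
# the second covers the last 60 seconds, and that wait replaces the sleep.
(nohup /usr/bin/ksh -c "while /usr/bin/true ; do date ; vmstat 60 2 ; done" 2>&1) > vmstat.log &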


Finally, I would highly recommend planning and carrying out the move to CSD11. As I mentioned previously, CSD05 is very old now, and this issue may already have been found and fixed.
Biju
Posted: Wed Jan 18, 2006 1:05 am

Acolyte

Joined: 03 Oct 2005
Posts: 71

Hi,
Thank you so much. I will do as you suggested and keep you updated on the results.

Copyright © MQSeries.net. All rights reserved.