Debugging SVRCONN
jshailes
Posted: Mon Oct 24, 2011 9:04 am    Post subject: Debugging SVRCONN

Apprentice

Joined: 18 May 2009
Posts: 31

I have created a server-connection channel to receive some data from our clients and am having some difficulty sustaining a connection - I've enabled events on the channel and every 8-30 minutes the channel is stopped and started again. The duration between stop and start varies from a few seconds to occasionally a few minutes. If the latter occurs I lose messages because they have a lifetime of 60 seconds.

Does anyone know how I can go about investigating this issue further?
bruce2359
Posted: Mon Oct 24, 2011 9:09 am

Poobah

Joined: 05 Jan 2008
Posts: 9469
Location: US: west coast, almost. Otherwise, enroute.

The first issue I see is that the messages have expiry set to 60 seconds. Is this intentional? This means that after 60 seconds, the message is no longer consumable.

Does the creating app end itself after 60 seconds (of inactivity)?

Is the app queue triggered? Is the consuming app coming to life within the 60 seconds?
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
jshailes
Posted: Mon Oct 24, 2011 9:32 am

Apprentice

Joined: 18 May 2009
Posts: 31

The company generating the messages have set an expiry on the messages of 60 seconds. The message then comes over a server connection channel, through MQIPT and onto a queue hosted on our MQ server.

I have a Java client app that listens to the queue and persists any message received. As far as I am aware this client app never fails.

I have made enquiries to ask the company generating the messages to increase the expiry but unfortunately they wouldn't change it.
mqjeff
Posted: Mon Oct 24, 2011 9:37 am

Grand Master

Joined: 25 Jun 2008
Posts: 17447

Every time the channel stops, it should tell you why, either in the MQ or client log or in the event itself.

I'd start by asking your network admins whether the firewall is configured correctly.
jshailes
Posted: Mon Oct 24, 2011 9:37 am

Apprentice

Joined: 18 May 2009
Posts: 31

It might be worth mentioning that the message stream should be continuous: approximately 60 messages per second, 24/7.
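
As a rough illustration of why only the longer outages lose data, here is a back-of-the-envelope sketch. It assumes (this is not confirmed anywhere in the thread) that messages accumulate at the sending side while the channel is down, that the 60 second expiry clock starts when each message is put, and that the rate stays at a steady 60 messages per second. The function name is made up for the example.

```python
def expired_during_outage(outage_s: float,
                          rate_per_s: float = 60.0,
                          expiry_s: float = 60.0) -> float:
    """Estimate how many messages expire before the channel comes back.

    Assumed model: a message put at time t is lost if the channel is still
    down at t + expiry_s, so only the part of the outage that exceeds the
    expiry window costs messages.
    """
    return rate_per_s * max(0.0, outage_s - expiry_s)


# A short gap costs nothing under this model; a three minute gap
# costs two minutes' worth of traffic.
for outage in (10, 60, 180):
    print(outage, "s outage ->", expired_during_outage(outage), "messages expired")
```

Under these assumptions a gap of a few seconds is harmless, which would fit the observation that only the multi-minute gaps lose messages.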
jshailes
Posted: Mon Oct 24, 2011 9:45 am

Apprentice

Joined: 18 May 2009
Posts: 31

Do you know where the MQ logs might be? I've looked high and low for them, to no avail. To establish that the failure was in the server connection channel I had to turn on the system event queues, which simply report 'Channel stopping' or 'Channel starting'.

There's nothing coming out in the JMS client because that's not where the failure occurs. The client connection is robust; it's the server connection channel managed by MQ where the problem is occurring.

After I identified that the problem was the server connection channel dropping, my first port of call was the firewall. The server is not sat behind a hardware firewall and I've now disabled all security on the server itself.
jshailes
Posted: Mon Oct 24, 2011 9:56 am

Apprentice

Joined: 18 May 2009
Posts: 31

Sorry, that was misleading. I have looked in /var/mqm/errors and /var/mqm/qmgrs/<qmname>/errors but there is nothing in there relating to the channel dropping. I've also come across various other files, e.g. S000001.log, but they appear to be binary files and therefore I can't read them. Are there any others, or a way of turning on additional logging?
mqjeff
Posted: Mon Oct 24, 2011 10:02 am

Grand Master

Joined: 25 Jun 2008
Posts: 17447

There should be an errors directory with AMQERR log files on the client install location as well.

The network might still be timing out connections that it has decided are "inactive". Go back to the network team and try again.
jshailes
Posted: Mon Oct 24, 2011 10:15 am

Apprentice

Joined: 18 May 2009
Posts: 31

Ok, found them. There are errors relating to the channel stopping:

Code:

24/10/11 19:00:38 - Process(15483.870) User(mqm) Program(amqrmppa)
AMQ9209: Connection to host 'localhost (127.0.0.1)' closed.

EXPLANATION:
An error occurred receiving data from 'localhost (127.0.0.1)' over TCP/IP.  The
connection to the remote host has unexpectedly terminated.
ACTION:
Tell the systems administrator.
----- amqccita.c : 3373 -------------------------------------------------------
24/10/11 19:00:38 - Process(15483.870) User(mqm) Program(amqrmppa)
AMQ9999: Channel program ended abnormally.

EXPLANATION:
Channel program 'NRPEB023.ACT01' ended abnormally.
ACTION:
Look at previous error messages for channel program 'NRPEB023.ACT01' in the
error files to determine the cause of the failure.

----- amqkacca.c : 1870 -------------------------------------------------------
24/10/11 19:00:48 - Process(15483.871) User(mqm) Program(amqrmppa)
AMQ9002: Channel 'NRPEB023.ACT01' is starting.

EXPLANATION:
Channel 'NRPEB023.ACT01' is starting.
ACTION:
None.


Any clue as to why the conn to localhost might be dropped?
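
For anyone wading through a large error log by hand, a small script can pull out just the timestamp and message code of each entry. This is a minimal sketch that assumes entries look like the excerpt above: a header line with the timestamp and `Process(...)`, followed by an `AMQnnnn:` line. The `Entry` type, the function name and the `SAMPLE` text are made up for the example; other MQ versions may format entries differently.

```python
import re
from typing import NamedTuple


class Entry(NamedTuple):
    timestamp: str
    process: str
    code: str
    summary: str


# Header line as it appears in the excerpt, e.g.
# "24/10/11 19:00:38 - Process(15483.870) User(mqm) Program(amqrmppa)"
HEADER = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) - Process\(([\d.]+)\)")
# Message line, e.g. "AMQ9209: Connection to host ... closed."
MESSAGE = re.compile(r"^(AMQ\d{4}[A-Z]?): (.*)$")


def parse_entries(text: str) -> list[Entry]:
    """Pair each header line with the AMQ message line that follows it."""
    entries = []
    pending = None  # (timestamp, process) from the most recent header line
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            pending = header.groups()
            continue
        message = MESSAGE.match(line)
        if message and pending:
            entries.append(Entry(pending[0], pending[1], message.group(1), message.group(2)))
            pending = None
    return entries


# Shortened copy of the excerpt above, used only to exercise the parser.
SAMPLE = """\
24/10/11 19:00:38 - Process(15483.870) User(mqm) Program(amqrmppa)
AMQ9209: Connection to host 'localhost (127.0.0.1)' closed.
----- amqccita.c : 3373 -----
24/10/11 19:00:38 - Process(15483.870) User(mqm) Program(amqrmppa)
AMQ9999: Channel program ended abnormally.
----- amqkacca.c : 1870 -----
24/10/11 19:00:48 - Process(15483.871) User(mqm) Program(amqrmppa)
AMQ9002: Channel 'NRPEB023.ACT01' is starting.
"""

for entry in parse_entries(SAMPLE):
    print(entry.timestamp, entry.code, entry.summary)
```

To run it against a real file, read the file contents into a string and pass that to `parse_entries` in place of `SAMPLE`.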
fjb_saper
Posted: Mon Oct 24, 2011 10:29 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

jshailes wrote:
Ok, found them. There are errors relating to the channel stopping:
Any clue as to why the conn to localhost might be dropped?


Is there some sniffer program going against the host and port? MQ does not like those...
_________________
MQ & Broker admin
gbaddeley
Posted: Mon Oct 24, 2011 3:25 pm

Jedi Knight

Joined: 25 Mar 2003
Posts: 2538
Location: Melbourne, Australia

jshailes wrote:
Ok, found them. There are errors relating to the channel stopping:

Code:

24/10/11 19:00:38 - Process(15483.870) User(mqm) Program(amqrmppa)
AMQ9209: Connection to host 'localhost (127.0.0.1)' closed.

EXPLANATION:
An error occurred receiving data from 'localhost (127.0.0.1)' over TCP/IP.  The connection to the remote host has unexpectedly terminated.
ACTION:
Tell the systems administrator.
----- amqccita.c : 3373 -------------------------------------------------------
24/10/11 19:00:38 - Process(15483.870) User(mqm) Program(amqrmppa)
AMQ9999: Channel program ended abnormally.

EXPLANATION:
Channel program 'NRPEB023.ACT01' ended abnormally.
ACTION:
Look at previous error messages for channel program 'NRPEB023.ACT01' in the error files to determine the cause of the failure.

----- amqkacca.c : 1870 -------------------------------------------------------
24/10/11 19:00:48 - Process(15483.871) User(mqm) Program(amqrmppa)
AMQ9002: Channel 'NRPEB023.ACT01' is starting.

EXPLANATION:
Channel 'NRPEB023.ACT01' is starting.
ACTION:
None.


Any clue as to why the conn to localhost might be dropped?

Are you sure NRPEB023.ACT01 is the name of the SVRCONN channel? The AMQ9209 message indicates it dropped a connection from an MQ client app which is running on the local host, not a remote host. Usually there is an errno number which indicates the nature of the TCP comms error, but I can't see one here.

The messages can't be very important if the expiry is set to 60 seconds... Are they some sort of notification message that doesn't have any critical business value?
_________________
Glenn
jshailes
Posted: Tue Oct 25, 2011 3:54 am

Apprentice

Joined: 18 May 2009
Posts: 31

There's certainly not a sniffer running on the local machine. Is it possible for someone else to be running one? How can I check for this? I have thought about running Wireshark but I'm not sure what I'd be looking for other than a reset packet.

The messages are pretty important to us: they show the movement of trains around the UK rail network and allow me to do some analysis. There are a number of companies receiving this data. I find it hard to believe that they too are losing messages, which points to it being a configuration problem on our side.

NRPEB023.ACT01 is definitely the name of the server connection channel. There are a number of server connection channels providing us with different types of data. This seems to be the only unstable one and happens to be the one with the most messages - I'm not sure if this is related.

Quote:
The AMQ9209 message indicates it dropped a connection from an MQ client app which is running on the local host, not a remote host.

Could this be because I'm using MQIPT? The only thing I can think of that might be connecting to the MQ server on localhost is MQ Explorer, which I have running all the time. The Java client that does the message persistence is hosted on another machine.
jshailes
Posted: Tue Oct 25, 2011 4:01 am

Apprentice

Joined: 18 May 2009
Posts: 31

I've just found another combination of errors in the logs indicating that the connection to localhost timed out:

Code:


25/10/11 12:51:58 - Process(15483.896) User(mqm) Program(amqrmppa)
AMQ9259: Connection timed out from host '127.0.0.1'.

EXPLANATION:
A connection from host '127.0.0.1' over TCP/IP timed out.
ACTION:
Check to see why data was not received in the expected time. Correct the
problem. Reconnect the channel, or wait for a retrying channel to reconnect
itself.
----- amqccita.c : 3678 -------------------------------------------------------
25/10/11 12:51:58 - Process(15483.896) User(mqm) Program(amqrmppa)
AMQ9999: Channel program ended abnormally.

EXPLANATION:
Channel program 'NRPEB023.ACT01' ended abnormally.
ACTION:
Look at previous error messages for channel program 'NRPEB023.ACT01' in the
error files to determine the cause of the failure.
----- amqrmrsa.c : 504 --------------------------------------------------------
25/10/11 12:52:08 - Process(15483.897) User(mqm) Program(amqrmppa)
AMQ9002: Channel 'NRPEB023.ACT01' is starting.

EXPLANATION:
Channel 'NRPEB023.ACT01' is starting.
ACTION:
None.
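
Since the concern is whether the channel is back within the 60 second expiry window, it can help to measure the gap between each abnormal end (AMQ9999) and the next start (AMQ9002). The sketch below does that for the two incidents logged in this thread. It assumes the timestamps are day/month/year, and the function names and the `EVENTS` list are made up for the example.

```python
from datetime import datetime

# (timestamp, message code) pairs copied from the two log excerpts in this thread.
EVENTS = [
    ("24/10/11 19:00:38", "AMQ9999"),
    ("24/10/11 19:00:48", "AMQ9002"),
    ("25/10/11 12:51:58", "AMQ9999"),
    ("25/10/11 12:52:08", "AMQ9002"),
]


def parse_stamp(stamp: str) -> datetime:
    # Assumption: the log writes dates day first, as a UK locale would.
    return datetime.strptime(stamp, "%d/%m/%y %H:%M:%S")


def restart_gaps(events):
    """Return (time of abnormal end, seconds until the next channel start)."""
    gaps = []
    ended_at = None
    for stamp, code in events:
        when = parse_stamp(stamp)
        if code == "AMQ9999":
            ended_at = when
        elif code == "AMQ9002" and ended_at is not None:
            gaps.append((ended_at, (when - ended_at).total_seconds()))
            ended_at = None
    return gaps


for ended_at, seconds in restart_gaps(EVENTS):
    print(ended_at, "restarted after", seconds, "seconds")
```

Both incidents logged so far show the channel starting again 10 seconds after the failure, well inside the expiry window, so the multi-minute gaps reported by the channel events are the ones worth catching in the log.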
mqjeff
Posted: Tue Oct 25, 2011 4:11 am

Grand Master

Joined: 25 Jun 2008
Posts: 17447

It's either a resource issue on your local machine, or it's a local level firewall or a network configuration that is closing these connections behind MQ's back.

It seems a little odd that everything is complaining about 127.0.0.1 - surely these are connections that are coming in over a real network and thus at a real IP?
jshailes
Posted: Tue Oct 25, 2011 4:26 am

Apprentice

Joined: 18 May 2009
Posts: 31

Quote:

It's either a resource issue on your local machine, or it's a local level firewall or a network configuration that is closing these connections behind MQ's back.

I will check again that the firewall is disabled but I'm pretty certain it is. Also, if this was the problem I would've thought all server connection channels would be affected? I've monitored the CPU and memory, and everything seems fine. I suppose a resource issue would fit with the fact that the problematic channel is the one with the highest volume of messages.

Quote:
It seems a little odd that everything is complaining about 127.0.0.1 - surely these are connections that are coming in over a real network and thus at a real IP?

I can't understand why 127.0.0.1 is the problem. The server has an external IP which was provided to the other end when we set it up. I don't understand how MQIPT works, but it does run on the local machine. Could this be something to do with it?