Author |
Message
|
jshailes |
Posted: Mon Oct 24, 2011 9:04 am Post subject: Debugging SVRCONN |
|
|
Apprentice
Joined: 18 May 2009 Posts: 31
|
I have created a server-connection channel to receive some data from our clients and am having some difficulty sustaining a connection - I've enabled events on the channel and every 8-30 minutes the channel is stopped and started again. The duration between stop and start varies from a few seconds to occasionally a few minutes. If the latter occurs I lose messages because they have a lifetime of 60 seconds.
Does anyone know how I can go about investigating this issue further? |
|
|
 |
bruce2359 |
Posted: Mon Oct 24, 2011 9:09 am Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
The first issue I see is that the messages have expiry set to 60 seconds. Is this intentional? This means that after 60 seconds, the message is no longer consumable.
Does the creating app end itself after 60 seconds (of inactivity)?
Is the app queue triggered? Is the consuming app coming to life within the 60 seconds? _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
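To make the expiry point above concrete, here is a minimal Python sketch of the arithmetic. It assumes the sender sets expiry through the MQMD `Expiry` field, which is expressed in tenths of a second; the thread does not say how the sender actually sets it. The helper names `seconds_to_mqmd_expiry` and `is_expired` are hypothetical and exist only for this illustration.

```python
from datetime import datetime, timedelta

# Lifetime reported in the thread; how the sender sets it is an assumption.
MESSAGE_LIFETIME_SECONDS = 60


def seconds_to_mqmd_expiry(seconds):
    """Convert seconds to MQMD Expiry units (tenths of a second)."""
    return int(seconds * 10)


def is_expired(put_time, now, lifetime_seconds=MESSAGE_LIFETIME_SECONDS):
    """Return True once the message has outlived its lifetime."""
    return now - put_time > timedelta(seconds=lifetime_seconds)


# Hypothetical put time, used only to exercise the helpers.
put_time = datetime(2011, 10, 24, 19, 0, 0)
print(seconds_to_mqmd_expiry(MESSAGE_LIFETIME_SECONDS))
print(is_expired(put_time, put_time + timedelta(seconds=59)))
print(is_expired(put_time, put_time + timedelta(seconds=61)))
```

A message that waits longer than its lifetime anywhere between the put and the get is discarded rather than delivered, so any delay longer than 60 seconds is enough to lose it.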
|
|
 |
jshailes |
Posted: Mon Oct 24, 2011 9:32 am Post subject: |
|
|
Apprentice
Joined: 18 May 2009 Posts: 31
|
The company generating the messages have set an expiry on the messages of 60 seconds. Each message then comes over a server connection channel, through MQIPT, and onto a queue hosted on our MQ server.
I have a java client app that listens to the queue and persists any message received. As far as I am aware this client app never fails.
I have asked the company generating the messages to increase the expiry, but unfortunately they wouldn't change it. |
|
|
 |
mqjeff |
Posted: Mon Oct 24, 2011 9:37 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
Every time the channel stops, it should tell you why, either in the MQ or client error log or in the event itself.
I'd start with your network admins, for failing to configure the firewall correctly. |
|
|
 |
jshailes |
Posted: Mon Oct 24, 2011 9:37 am Post subject: |
|
|
Apprentice
Joined: 18 May 2009 Posts: 31
|
It might be worth mentioning that the message stream should be continuous - approximately 60 messages per second 24/7 |
|
|
 |
jshailes |
Posted: Mon Oct 24, 2011 9:45 am Post subject: |
|
|
Apprentice
Joined: 18 May 2009 Posts: 31
|
Do you know where the MQ logs might be? I've looked high and low for them, to no avail. To identify that the failure was in the server connection channel I had to turn on the system event queues, which simply report 'Channel stopping' or 'Channel starting'.
There's nothing coming out in the JMS client because that's not where the failure occurs - the client connection is robust; it's the server connection channel managed by MQ where the problem is occurring.
After I identified that the problem was the server connection channel dropping, my first port of call was the firewall. The server is not behind a hardware firewall, and I've now disabled all security on the server itself. |
|
|
 |
jshailes |
Posted: Mon Oct 24, 2011 9:56 am Post subject: |
|
|
Apprentice
Joined: 18 May 2009 Posts: 31
|
Sorry, that was misleading. I have looked in /var/mqm/errors and /var/mqm/qmgrs/<qmname>/errors but there is nothing in there relating to the channel dropping. I've also come across various other files, e.g. S000001.log, but they appear to be binary files and therefore I can't read them. Are there any others, or is there a way of turning on additional logging? |
|
|
 |
mqjeff |
Posted: Mon Oct 24, 2011 10:02 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
There should be an errors directory with AMQERR log files on the client install location as well.
The network might still be timing out connections that it has decided are "inactive". Go back to the network team and try again. |
|
|
 |
jshailes |
Posted: Mon Oct 24, 2011 10:15 am Post subject: |
|
|
Apprentice
Joined: 18 May 2009 Posts: 31
|
Ok, found them. There are errors relating to the channel stopping:
Code: |
24/10/11 19:00:38 - Process(15483.870) User(mqm) Program(amqrmppa)
AMQ9209: Connection to host 'localhost (127.0.0.1)' closed.
EXPLANATION:
An error occurred receiving data from 'localhost (127.0.0.1)' over TCP/IP. The
connection to the remote host has unexpectedly terminated.
ACTION:
Tell the systems administrator.
----- amqccita.c : 3373 -------------------------------------------------------
24/10/11 19:00:38 - Process(15483.870) User(mqm) Program(amqrmppa)
AMQ9999: Channel program ended abnormally.
EXPLANATION:
Channel program 'NRPEB023.ACT01' ended abnormally.
ACTION:
Look at previous error messages for channel program 'NRPEB023.ACT01' in the
error files to determine the cause of the failure.
----- amqkacca.c : 1870 -------------------------------------------------------
24/10/11 19:00:48 - Process(15483.871) User(mqm) Program(amqrmppa)
AMQ9002: Channel 'NRPEB023.ACT01' is starting.
EXPLANATION:
Channel 'NRPEB023.ACT01' is starting.
ACTION:
None. |
Any clue as to why the conn to localhost might be dropped? |
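One way to quantify these drops is to pull the timestamps and AMQ message IDs out of the error log and measure how long the channel stays down each time. The following Python sketch does that for a trimmed copy of the excerpt above. It assumes the dates are day/month/year and that each AMQ9999 is followed by the AMQ9002 restart for the same channel; it does not track channel names, so a log covering several channels would need extra filtering. The function names are hypothetical.

```python
import re
from datetime import datetime

# Header and message lines trimmed from the AMQERR excerpt quoted above.
SAMPLE = """\
24/10/11 19:00:38 - Process(15483.870) User(mqm) Program(amqrmppa)
AMQ9209: Connection to host 'localhost (127.0.0.1)' closed.
----- amqccita.c : 3373 -----
24/10/11 19:00:38 - Process(15483.870) User(mqm) Program(amqrmppa)
AMQ9999: Channel program ended abnormally.
----- amqkacca.c : 1870 -----
24/10/11 19:00:48 - Process(15483.871) User(mqm) Program(amqrmppa)
AMQ9002: Channel 'NRPEB023.ACT01' is starting.
"""

HEADER = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) - Process\(")
MESSAGE = re.compile(r"^(AMQ\d{4}): ")
MESSAGE_LIFETIME_SECONDS = 60  # expiry reported in the thread


def parse_entries(text):
    """Return (timestamp, message_id) pairs in log order."""
    entries = []
    stamp = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            # Assumption: day/month/year, as 24/10/11 and 25/10/11 suggest.
            stamp = datetime.strptime(header.group(1), "%d/%m/%y %H:%M:%S")
            continue
        message = MESSAGE.match(line)
        if message and stamp is not None:
            entries.append((stamp, message.group(1)))
            stamp = None  # only the first AMQ line after a header counts
    return entries


def outages(entries):
    """Pair each AMQ9999 with the next AMQ9002; return gaps in seconds."""
    gaps = []
    ended = None
    for stamp, msg_id in entries:
        if msg_id == "AMQ9999":
            ended = stamp
        elif msg_id == "AMQ9002" and ended is not None:
            gaps.append((stamp - ended).total_seconds())
            ended = None
    return gaps


gaps = outages(parse_entries(SAMPLE))
for gap in gaps:
    verdict = "exceeds" if gap > MESSAGE_LIFETIME_SECONDS else "is within"
    print(f"channel down for {gap:.0f}s, which {verdict} "
          f"the {MESSAGE_LIFETIME_SECONDS}s lifetime")
```

Run against the whole AMQERR file rather than this sample, the same approach would list every outage, making it easier to see how often the gap approaches the 60-second message lifetime and which message ID precedes each AMQ9999.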
|
|
 |
fjb_saper |
Posted: Mon Oct 24, 2011 10:29 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
jshailes wrote: |
Ok, found them. There are errors relating to the channel stopping:
Any clue as to why the conn to localhost might be dropped? |
Is there some sniffer program going against the host and port? MQ does not like those...  _________________ MQ & Broker admin |
|
|
 |
gbaddeley |
Posted: Mon Oct 24, 2011 3:25 pm Post subject: |
|
|
 Jedi Knight
Joined: 25 Mar 2003 Posts: 2538 Location: Melbourne, Australia
|
jshailes wrote: |
Ok, found them. There are errors relating to the channel stopping:
Code: |
24/10/11 19:00:38 - Process(15483.870) User(mqm) Program(amqrmppa)
AMQ9209: Connection to host 'localhost (127.0.0.1)' closed.
EXPLANATION:
An error occurred receiving data from 'localhost (127.0.0.1)' over TCP/IP. The connection to the remote host has unexpectedly terminated.
ACTION:
Tell the systems administrator.
----- amqccita.c : 3373 -------------------------------------------------------
24/10/11 19:00:38 - Process(15483.870) User(mqm) Program(amqrmppa)
AMQ9999: Channel program ended abnormally.
EXPLANATION:
Channel program 'NRPEB023.ACT01' ended abnormally.
ACTION:
Look at previous error messages for channel program 'NRPEB023.ACT01' in the error files to determine the cause of the failure.
----- amqkacca.c : 1870 -------------------------------------------------------
24/10/11 19:00:48 - Process(15483.871) User(mqm) Program(amqrmppa)
AMQ9002: Channel 'NRPEB023.ACT01' is starting.
EXPLANATION:
Channel 'NRPEB023.ACT01' is starting.
ACTION:
None. |
Any clue as to why the conn to localhost might be dropped? |
Are you sure NRPEB023.ACT01 is the name of the SVRCONN channel? The AMQ9209 message indicates it dropped a connection from an MQ client app which is running on the local host, not a remote host. Usually there is an errno number which indicates the nature of the TCP comms error, but I can't see one here.
The messages can't be very important if the expiry is set to 60 seconds... Are they some sort of notification message that doesn't have any critical business value? _________________ Glenn |
|
|
 |
jshailes |
Posted: Tue Oct 25, 2011 3:54 am Post subject: |
|
|
Apprentice
Joined: 18 May 2009 Posts: 31
|
There's certainly not a sniffer running on the local machine. Is it possible for someone else to be running one? How can I check for this? I have thought about running Wireshark but I'm not sure what I'd be looking for other than a reset packet.
The messages are pretty important to us - they show the movement of trains around the UK rail network and allow me to do some analysis. There are a number of companies receiving this data - I find it hard to believe that they too are losing messages, which points to a configuration problem on our side.
NRPEB023.ACT01 is definitely the name of the server connection channel. There are a number of server connection channels providing us with different types of data. This seems to be the only unstable one and happens to be the one with the most messages - I'm not sure if this is related.
Quote: |
The AMQ9209 message indicates it dropped a connection from an MQ client app which is running on the local host, not a remote host. |
Could this be because I'm using MQIPT? The only thing I can think of that might be connecting to the MQ server on localhost is MQ Explorer - I have that running all the time. The java client that does the message persistence is hosted on another machine. |
|
|
 |
jshailes |
Posted: Tue Oct 25, 2011 4:01 am Post subject: |
|
|
Apprentice
Joined: 18 May 2009 Posts: 31
|
I've just found another combination of errors in the logs indicating that the connection to localhost timed out:
Code: |
25/10/11 12:51:58 - Process(15483.896) User(mqm) Program(amqrmppa)
AMQ9259: Connection timed out from host '127.0.0.1'.
EXPLANATION:
A connection from host '127.0.0.1' over TCP/IP timed out.
ACTION:
Check to see why data was not received in the expected time. Correct the
problem. Reconnect the channel, or wait for a retrying channel to reconnect
itself.
----- amqccita.c : 3678 -------------------------------------------------------
25/10/11 12:51:58 - Process(15483.896) User(mqm) Program(amqrmppa)
AMQ9999: Channel program ended abnormally.
EXPLANATION:
Channel program 'NRPEB023.ACT01' ended abnormally.
ACTION:
Look at previous error messages for channel program 'NRPEB023.ACT01' in the
error files to determine the cause of the failure.
----- amqrmrsa.c : 504 --------------------------------------------------------
25/10/11 12:52:08 - Process(15483.897) User(mqm) Program(amqrmppa)
AMQ9002: Channel 'NRPEB023.ACT01' is starting.
EXPLANATION:
Channel 'NRPEB023.ACT01' is starting.
ACTION:
None.
|
|
|
|
 |
mqjeff |
Posted: Tue Oct 25, 2011 4:11 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
It's either a resource issue on your local machine, or it's a local level firewall or a network configuration that is closing these connections behind MQ's back.
It seems a little odd that everything is complaining about 127.0.0.1 - surely these are connections that are coming in over a real network and thus at a real IP? |
|
|
 |
jshailes |
Posted: Tue Oct 25, 2011 4:26 am Post subject: |
|
|
Apprentice
Joined: 18 May 2009 Posts: 31
|
Quote: |
It's either a resource issue on your local machine, or it's a local level firewall or a network configuration that is closing these connections behind MQ's back. |
I will check again that the firewall is disabled but I'm pretty certain it is. Also, if this were the problem I would've thought all server connection channels would be affected? I've monitored the CPU and memory, and everything seems fine. I suppose a resource issue would fit with the fact that the problematic channel is the one with the highest volume of messages.
Quote: |
It seems a little odd that everything is complaining about 127.0.0.1 - surely these are connections that are coming in over a real network and thus at a real IP? |
I can't understand why 127.0.0.1 is the problem. The server has an external IP which was provided to the other end when we set it up. I don't understand how MQIPT works but it does run on the local machine, could this be something to do with it? |
|
|
 |
|