Problem with TCP keepalive
awatson72 |
Posted: Mon Jun 05, 2006 10:34 am    Post subject: Problem with TCP keepalive
On an AIX system, I have configured a Queue Manager to use keepalive by setting:
TCP:
KeepAlive=YES
in qm.ini
On a SVRCONN channel that has recently had traffic but now has none, the status shows as "Running", the substate is "Receiving", and the conname is 10.11.12.13.
The SVRCONN channel has HBINT 10 and KAINT 15.
The AIX system has
tcp_keepidle = 3600
When I run a tcpdump on AIX, shouldn't I see some activity every 15 seconds between the AIX server hosting the queue manager and the server hosting the application connected over the SVRCONN channel (10.11.12.13)?
I see nothing, which leads me to believe I haven't configured keepalive correctly, even though I have done exactly what the documentation says.
Any insight appreciated.
_________________
Andrew Watson
L.L. Bean, Inc.
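A minimal sketch of the checks described in the post above; the interface name en0 and the listener port 1414 are assumptions, not stated in the post:

    # AIX keepalive idle time is in half-seconds, so tcp_keepidle=3600 means 30 minutes of idle
    no -o tcp_keepidle

    # watch for any packets (MQ heartbeats or TCP keepalive probes) between the
    # queue manager host and the application server
    tcpdump -n -i en0 host 10.11.12.13 and port 1414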
wschutz |
Posted: Mon Jun 05, 2006 10:46 am
Quote:
You can set the KeepAlive Interval (KAINT) attribute for channels on a per-channel basis. On platforms other than z/OS, you can access and modify the parameter, but it is only stored and forwarded; there is no functional implementation of the parameter. If you need the functionality provided by the KAINT parameter, use the Heartbeat Interval (HBINT) parameter, as described in Heartbeat interval (HBINT).
_________________
-wayne
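In practical terms, on a distributed platform only HBINT has any effect on the channel; a minimal MQSC sketch, using an invented channel and queue manager name:

    runmqsc QM1
    ALTER CHANNEL(APP.SVRCONN) CHLTYPE(SVRCONN) HBINT(10)
    DISPLAY CHANNEL(APP.SVRCONN) HBINT KAINT
    END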
awatson72 |
Posted: Mon Jun 05, 2006 12:27 pm
OK, so if HBINT is providing the functionality of KAINT on non-z/OS platforms, I should see a TCP packet every 15 seconds with my configuration, but I don't.
As an experiment, I set tcp_keepidle to 20 (10 seconds, since the value is in half-seconds) at the AIX level, then restarted the app and the channel (to make sure that new connections would pick up the new setting), but I still see no activity.
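For reference, roughly how that experiment looks at the AIX level; remember that the no tunables are expressed in half-seconds:

    no -o tcp_keepidle          # show the current idle time before probing, in half-seconds
    no -o tcp_keepintvl         # show the interval between probes, in half-seconds
    no -o tcp_keepidle=20       # 20 half-seconds = 10 seconds of idle before the first probe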
I'm going through all this trouble because a new firewall is tearing down MQ connections every hour and wreaking havoc on applications (MDB and otherwise). I'm trying to follow the usual recommendation for solving this: make the firewall aware that, yes, MQ is still very much dependent on that connection, so please don't tear it down.
_________________
Andrew Watson
L.L. Bean, Inc.
mvic |
Posted: Mon Jun 05, 2006 12:44 pm
wschutz |
Posted: Mon Jun 05, 2006 2:17 pm
Except:
Quote:
On server-connection and client-connection channels, heartbeats flow only when a server MCA is waiting on an MQGET command with the WAIT option which it has issued on behalf of a client application.
So if that mq client isn't in a mqget w/ wait state, hbint's are flowing....
EDIT: I meant to type:
So if that mq client isn't in a mqget w/ wait state, hbint's areN'T flowing....
_________________
-wayne
Last edited by wschutz on Wed Jun 07, 2006 5:51 pm; edited 1 time in total
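One hedged way to see that behaviour from the client side is to hold an MQGET with wait open with the amqsgetc sample while a packet trace is running; the channel, host, queue, and queue manager names below are placeholders:

    # on the client machine
    export MQSERVER='APP.SVRCONN/TCP/mqhost(1414)'
    # amqsgetc sits in MQGET with a wait interval; while it is waiting,
    # HBINT heartbeats should flow and show up in the tcpdump trace
    amqsgetc TEST.QUEUE QM1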
awatson72 |
Posted: Tue Jun 06, 2006 4:52 am
I've reviewed the MD0C presentation; in fact I attended it at last year's T&M conference. It basically says to use keepalives for SVRCONN channels, which is exactly what I'm trying to get working. According to the quote provided by wschutz, keepalive itself is either on or off for the queue manager as a whole, controlled by qm.ini, with no other configuration available on distributed platforms (even though the admin interfaces lead you to believe you can change the interval).
The channel substate is being reported as "Receiving". Should heartbeats be flowing in that case, or is that not enough information to tell for sure?
Thanks for the guidance.
_________________
Andrew Watson
L.L. Bean, Inc.
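For what it is worth, the substate and the heartbeat interval actually negotiated for a running channel instance can both be seen in the channel status; a sketch with made-up names:

    runmqsc QM1
    DISPLAY CHSTATUS(APP.SVRCONN) CONNAME SUBSTATE HBINT MSGS
    END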
jefflowrey |
Posted: Tue Jun 06, 2006 5:06 am
Are you sure that your tcp_keepidle setting took effect at the AIX level? You might have to restart the network interface.
Are you sure you restarted the queue manager after setting KeepAlive=Yes in qm.ini?
The documentation is quite specific that heartbeats for SVRCONNs only flow when the client app is issuing an MQGET with WAIT. I don't know what the substate "Receiving" indicates. So if the app is not in an MQGET with WAIT most of the time, then your firewall is always going to want to close the connection. And it might even be right to do so: if the app is really sitting and doing nothing with MQ for an hour, and not waiting for a message to arrive from someone else, it should probably be nice and close the connection.
_________________
I am *not* the model of the modern major general.
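A sketch of the restart and verification being suggested, with an invented queue manager name; the TCP stanza in qm.ini is only read when the queue manager starts:

    no -o tcp_keepidle      # confirm the AIX tunable really holds the value you expect
    endmqm -i QM1           # immediate shutdown
    strmqm QM1              # restart; KeepAlive=YES in qm.ini is picked up here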
awatson72 |
Posted: Tue Jun 06, 2006 6:08 am
I'm a little confused.
wschutz says:
Quote:
So if that mq client isn't in a mqget w/ wait state, hbint's are flowing....
and jefflowrey says:
Quote:
The documentation is quite specific that heartbeats for SVRConns only flow when the client app is issuing an MQGET with WAIT.
Sounds like this is conflicting information.
The application that is being affected by the firewall's one-hour timeout is a WebSphere MDB application; the listener port and QCF are configured with default parameters. The WAS server resides on the other side of the firewall from the MQ server and queue manager.
Regarding the restarts of the QM and the network interface, these have been done to no avail.
Still stuck.
_________________
Andrew Watson
L.L. Bean, Inc.
PeterPotkay |
Posted: Tue Jun 06, 2006 4:07 pm
Wayne made a typo. HBs only flow for MQClients while the MQClient is in a blocking MQGET with wait.
Are there MDBs in this scenario? Are they doing the gets? If so, they will certainly issue a fresh get with wait many times per hour.
But I was wondering, forget the HBs for a sec. If keepalive is in fact turned on, how does the server know the client socket is still there? There must be some sort of keepalive checking going on both ways on the wire. Wouldn't the firewall see THAT as activity? This is really a question for network and firewall experts; it's not an MQ thing at all, but it could help your MQ scenario if properly understood. Who knows, maybe the firewall is slick enough to know that keepalive traffic is not "real" traffic, and so ignores it.
_________________
Peter Potkay
Keep Calm and MQ On
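On the question of whether the firewall would see keepalive probes at all, the timing depends on the AIX tunables; a sketch of the usual defaults (values are in half-seconds and quoted from memory, so worth confirming on the box):

    no -o tcp_keepidle      # default 14400 half-seconds = 2 hours of idle before the first probe
    no -o tcp_keepintvl     # default 150 half-seconds = 75 seconds between probes
    # With the defaults, nothing is probed until the socket has been idle for two hours,
    # which is longer than a one-hour firewall timeout, so the firewall would see no
    # keepalive traffic before it tears the connection down.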
jefflowrey |
Posted: Tue Jun 06, 2006 4:42 pm
Here's my understanding - based on my poor memory from the last time we really discussed in depth how MDB listener ports in WAS work.
The listener port is constantly (every few seconds) browsing the queue for messages on one connection. In addition, there are other connections in a pool that will be used for each instance of the MDB when a message arrives (up to a certain number). So if an MDB is configured to only ever have a single instance, there will be two connections in use; three if there are two instances, and so on.
Each of these connections will be a single instance of the SVRCONN/CLNTCONN channel pair.
Now, suppose the queue is empty most of the time (as it should be). The browse thread is going to issue a GET with WAIT, get an *immediate* 2033, and go to sleep again until the next time it needs to check. Likewise all the other threads in the pool are going to remain idle - not in a GET with WAIT at all either.
So no heartbeats are going to flow, and the channel will look "inactive" to the firewall.
You should discuss options with your firewall administrators. Some of it may depend on the capabilities of the firewall in question. I *assume* that all modern firewalls would let you specify a timeout value for connections at the IP address level, and not require only a global value. But I'm not a firewall expert, so I don't know.
If they are able to make a specific change, but are unwilling to do it at the IP address level and want tighter control, you can use the LOCALADDR parameter on the SVRCONN to specify what port number the channel will run under.
_________________
I am *not* the model of the modern major general.
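A hedged MQSC sketch of that last idea; note that, as best I can recall, LOCALADDR is specified on the client-connection (CLNTCONN) definition rather than on the SVRCONN itself, and the names and port range here are invented:

    * restrict the MQ client's outbound sockets to a known port range
    DEFINE CHANNEL(APP.SVRCONN) CHLTYPE(CLNTCONN) TRPTYPE(TCP) +
           CONNAME('mqhost(1414)') +
           LOCALADDR('(2000,2010)')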
PeterPotkay |
Posted: Tue Jun 06, 2006 4:46 pm
jefflowrey wrote:
Now, suppose the queue is empty most of the time (as it should be). The browse thread is going to issue a GET with WAIT, get an *immediate* 2033, and go to sleep again until the next time it needs to check. Likewise all the other threads in the pool are going to remain idle - not in a GET with WAIT at all either.
So no heartbeats are going to flow, and the channel will look "inactive" to the firewall.
I disagree. An MQGET over a SVRCONN channel that returns a 2033 will most definitely generate traffic on the wire. The MQGET request up to the MQ server will be counted as one "message" in the Messages count on the SVRCONN channel, and the result, 2033 or otherwise, will be counted as a second "message" on the channel as it streams back to the client, in this case the MDB.
_________________
Peter Potkay
Keep Calm and MQ On
jefflowrey |
Posted: Tue Jun 06, 2006 4:57 pm
Yes, traffic on the wire - certainly. The GET statement will flow over to the MCA, and the RC2033 will flow back.
I guess I mean that this traffic may not be big enough, or last long enough, for the firewall to notice it according to its rules for "inactive".
I also reserve judgement on whether the firewall might be getting confused between different instances of the SVRCONN, either when deciding whether they are inactive or when deciding what to shut down.
_________________
I am *not* the model of the modern major general.
awatson72 |
Posted: Wed Jun 07, 2006 4:39 am
What I've noticed is that when the MDB listener starts in WAS, two SVRCONN channel instances are started under the channel definition that the MDB's QCF points to. When the MDB application is doing no work, which is usually the case, especially in my test environment, one of the channel instances is in MQGET, and its Messages count increases as the MDB polls (reference: http://www.mqseries.net/phpBB2/viewtopic.php?t=29844). The MDB polls every 5000 ms by default. The other channel instance is in state Receiving.
After an hour of no messages arriving on the MDB queue, we can see the firewall tear down a connection, presumably the "companion" connection, the one that is NOT doing the MQGET with wait every 5000 ms, because that one sounds like it should appear active to the firewall. However, MQ and the MDB application are still relying on the "companion" connection, so when a message does arrive beyond the one-hour timeout, the MDB fails.
If my assumptions are correct (and I'll try to do some more verification), the question of why a keepalive/heartbeat isn't happening for the "companion" channel is still outstanding. IMHO this channel should be kept alive by heartbeat or keepalive, but even with a sniffer hooked to the server, I still see no traffic that would suggest either is flowing.
Thoughts?
_________________
Andrew Watson
L.L. Bean, Inc.
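One way to verify that split, sketched with an invented channel and queue manager name: display the status of both running instances a couple of minutes apart and watch which one's Messages count climbs:

    echo "DISPLAY CHSTATUS(WAS.SVRCONN) CONNAME SUBSTATE MSGS LSTMSGTI" | runmqsc QM1
    # Run it again a minute or two later: the polling instance's MSGS count should keep
    # climbing, while the idle "companion" instance's MSGS stays flat.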
jefflowrey |
Posted: Wed Jun 07, 2006 4:45 am
Yes. The MDB polls on one connection, and reserves a pool of one or more other connections for passing to an instance of the MDB when a message arrives...
I think if you configure the MDB to retry at least once, then this connection will get reestablished after the firewall kills it.
_________________
I am *not* the model of the modern major general.
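If retries do turn out to be acceptable, the knobs would be on the WAS side rather than in MQ; if memory serves, the message listener service exposes custom properties along these lines (the property names are from memory and worth verifying against the WAS documentation):

    # WebSphere message listener service custom properties (assumed, not confirmed)
    MAX.RECOVERY.RETRIES=5         # listener port restart attempts after a failure
    RECOVERY.RETRY.INTERVAL=60     # seconds between restart attempts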
awatson72 |
Posted: Wed Jun 07, 2006 5:06 am
I agree that it would probably work to set a retry count greater than zero for the MDB, but the developer tells me that doing so could cause significant data problems in some cases, and besides, it's a little messy to have failures and retries going on in the app all day. My best solution is to make the firewall aware that the connection should not be killed.
_________________
Andrew Watson
L.L. Bean, Inc.
Goto page 1, 2 Next
Page 1 of 2