Author |
Message
|
PeterPotkay |
Posted: Sun Aug 11, 2002 7:12 pm Post subject: Heartbeat Interval vs. AdoptMCA |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
AdoptMCA allows a QM to accept a request to start a new receiver channel even if it already has a "running" channel by the same name.
For instance, suppose receiver channel QM1.QM2 is just sitting there in running status because a network failure kept it from ever getting the command from the sender that the discinterval time has passed. If the QM1.QM2 sender subsequently gets more messages to send after the network is back up, QM2 will now accept a second channel by that same name.
My question is, why would the original QM1.QM2 receiver channel ever be in that permanent running state after a network failure if it was using Heartbeats? If the heartbeat interval was, say, 1 minute, wouldn't the QM1.QM2 receiver realize that it's not getting any more heartbeats and after a minute (actually a minute + 60 secs) put itself in an inactive state?
Why bother with AdoptMCA? _________________ Peter Potkay
Keep Calm and MQ On |
|
|
 |
oz1ccg |
Posted: Mon Aug 12, 2002 12:00 am Post subject: |
|
|
 Yatiri
Joined: 10 Feb 2002 Posts: 628 Location: Denmark
|
Peter, you're right, it shouldn't be necessary if you also use KeepAlive.
But we're living in a world with a lot of pirates with only one goal: getting our systems down. This could be a network failure, a QMGR crash, etc.
I've been using both TCP/IP keepalive and Heartbeat, but I've missed some channels running on the WAN through a lot of firewalls (the problem could be here; we traced a lot in TCP/IP and found some clues, but not the one we were looking for).
What really helped was the introduction of ADOPTMCA, so the channels could reinitiate communications after a problem. After this was implemented we haven't had big problems (we're using triggering on XMITQ with a triggering of 1 minute). _________________ Regards, Jørgen
Home of BlockIP2, the last free MQ Security exit ver. 3.00
Cert. on WMQ, WBIMB, SWIFT. |
|
|
 |
mrlinux |
Posted: Mon Aug 12, 2002 3:49 am Post subject: |
|
|
 Grand Master
Joined: 14 Feb 2002 Posts: 1261 Location: Detroit,MI USA
|
One of the problems is that the receiver code will block waiting for a response from the sender, and if it never comes then the code will be hung waiting. _________________ Jeff
IBM Certified Developer MQSeries
IBM Certified Specialist MQSeries
IBM Certified Solutions Expert MQSeries |
|
|
 |
PeterPotkay |
Posted: Mon Aug 12, 2002 6:35 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
But Jeff, wouldn't the receiver code stop waiting and unblock itself after the heartbeat interval passed? _________________ Peter Potkay
Keep Calm and MQ On |
|
|
 |
mrlinux |
Posted: Mon Aug 12, 2002 7:34 am Post subject: |
|
|
 Grand Master
Joined: 14 Feb 2002 Posts: 1261 Location: Detroit,MI USA
|
No, the code is hung on a function (select(), I think) waiting for a message. _________________ Jeff
IBM Certified Developer MQSeries
IBM Certified Specialist MQSeries
IBM Certified Solutions Expert MQSeries |
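The blocking behaviour described in this reply can be illustrated with a small sketch. This is not MQ's receiver code; it only shows, under the assumption that the receiver waits in something like `select()`, how a wait with no timeout never returns on a silent connection while a bounded wait does. The helper name `wait_for_data` is made up for the illustration.

```python
import select
import socket


def wait_for_data(sock, timeout=None):
    """Return True if sock becomes readable, False if the timeout expires.

    With timeout=None the call blocks until data arrives, which is the
    "hung waiting" situation: on a dead connection it never returns.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    return bool(readable)


if __name__ == "__main__":
    # A connected socket pair stands in for the sender and receiver MCAs.
    sender, receiver = socket.socketpair()
    try:
        # Nothing has been sent, so only a bounded wait can return here.
        print(wait_for_data(receiver, timeout=0.2))
        sender.sendall(b"heartbeat")
        # Data is now pending, so the wait returns immediately.
        print(wait_for_data(receiver, timeout=0.2))
    finally:
        sender.close()
        receiver.close()
```

Calling `wait_for_data(receiver)` with no timeout before anything is sent would block indefinitely, which is why the demo never does so.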
|
|
 |
PeterPotkay |
Posted: Mon Aug 12, 2002 8:18 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
Then what's the point of Heartbeat?
I thought its purpose was to alert channels (both sides) of network failures and to allow them to then go Inactive. _________________ Peter Potkay
Keep Calm and MQ On |
|
|
 |
mrlinux |
Posted: Mon Aug 12, 2002 8:34 am Post subject: |
|
|
 Grand Master
Joined: 14 Feb 2002 Posts: 1261 Location: Detroit,MI USA
|
The Heartbeat Interval gives the RCVR channel the ability to shut down if it has exceeded the disconnect interval.
From the IBM manual:
The heartbeat exchange gives the receiving MCA the opportunity to quiesce the channel.
Note:
You should set this value to be significantly less than the value of DISCINT. WebSphere MQ checks only that it is within the permitted range however.
HBINT(integer)
This parameter has a different interpretation depending upon the channel type, as follows:
For channels with a channel type (CHLTYPE) of SDR, SVR, RCVR, RQSTR, CLUSSDR, or CLUSRCVR, this is the time, in seconds, between heartbeat flows passed from the sending MCA when there are no messages on the transmission queue. The heartbeat exchange gives the receiving MCA the opportunity to quiesce the channel. This type of heartbeat is valid only on AIX, Compaq OpenVMS, HP-UX, Linux, OS/2 Warp, OS/400, Solaris, Windows, and z/OS.
Note:
You should set this value to be significantly less than the value of DISCINT. WebSphere MQ checks only that it is within the permitted range however.
For channels with a channel type (CHLTYPE) of SVRCONN or CLNTCONN, this is the time, in seconds, between heartbeat flows passed from the server MCA when that MCA has issued an MQGET with WAIT on behalf of a client application. This allows the server to handle situations where the client connection fails during an MQGET with WAIT. This type of heartbeat is valid only for AIX, Compaq OpenVMS, HP-UX, Linux, OS/2 Warp, OS/400, Solaris, and Windows.
The value must be in the range zero through 999 999. A value of zero means that no heartbeat exchange takes place. The value that is used is the larger of the values specified at the sending side and the receiving side.
KAINT(integer) _________________ Jeff
IBM Certified Developer MQSeries
IBM Certified Specialist MQSeries
IBM Certified Solutions Expert MQSeries |
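The last rule in the quoted HBINT text (the value used is the larger of the two sides, within the range 0 through 999 999) can be written out as a tiny sketch. It applies the quoted sentence literally; the manual excerpt above does not say what happens when only one side is zero, so that case is not given special treatment here. The function name `negotiated_hbint` is hypothetical.

```python
HBINT_MIN = 0
HBINT_MAX = 999_999  # upper bound taken from the quoted manual text


def negotiated_hbint(sender_hbint: int, receiver_hbint: int) -> int:
    """Return the heartbeat interval in seconds that the channel would use.

    Follows the quoted rule: the larger of the two configured values.
    A result of 0 means no heartbeat exchange takes place.
    """
    for value in (sender_hbint, receiver_hbint):
        if not HBINT_MIN <= value <= HBINT_MAX:
            raise ValueError(f"HBINT out of range: {value}")
    return max(sender_hbint, receiver_hbint)


if __name__ == "__main__":
    # Sender configured with 60 seconds, receiver with 300 seconds.
    print(negotiated_hbint(60, 300))
```

One consequence worth noting: lowering HBINT on one side alone does not shorten the interval actually used if the other side is configured with a larger value.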
|
|
 |
PeterPotkay |
Posted: Mon Aug 12, 2002 10:46 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
OK, my question still stands: wouldn't Heartbeat solve the problem here, and if so, is there any benefit to using AdoptMCA if you already have Heartbeat set at a reasonable value?
Heartbeat will unblock a receiver channel if the network goes down, getting it to a point (inactive) where it will accept a new connection attempt. _________________ Peter Potkay
Keep Calm and MQ On |
|
|
 |
mrlinux |
Posted: Mon Aug 12, 2002 11:59 am Post subject: |
|
|
 Grand Master
Joined: 14 Feb 2002 Posts: 1261 Location: Detroit,MI USA
|
If there is a network error which lasts longer than the discint of the sender (or retries are exhausted, or someone issues a stop channel), the sender channel will terminate the socket connection. From then on, starting the sender channel means trying to create a new connection, while the rcvr thinks it still has a valid connection. _________________ Jeff
IBM Certified Developer MQSeries
IBM Certified Specialist MQSeries
IBM Certified Solutions Expert MQSeries |
|
|
 |
PeterPotkay |
Posted: Mon Aug 12, 2002 12:05 pm Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
Light bulb just went off.
If the sender tries to re-establish communication BEFORE the Heartbeat interval passes, then AdoptMCA would kick in and the connection would be made.
If the sender tries to re-establish communication AFTER the Heartbeat interval passes, then the receiver is already in an Inactive state because of Heartbeat, and AdoptMCA doesn't matter.
The larger your Heartbeat number, the more important AdoptMCA is for timely reconnections. I say timely because even without AdoptMCA, sooner or later the Heartbeat int would pass and since the sender is retrying, it would eventually catch the receiver in the inactive state.
Correct? _________________ Peter Potkay
Keep Calm and MQ On |
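The reasoning in this post can be sketched as a small timeline model. It encodes only the hypothesis stated here (the receiver goes Inactive by itself at some point, and a retry succeeds either after that point or at any time when AdoptMCA is on); it is not a description of how MQ is implemented. All names and the example times are made up for illustration.

```python
def first_successful_retry(retry_times, receiver_inactive_at, adopt_mca):
    """Return the time of the first sender retry that reconnects, or None.

    retry_times: seconds after the network came back at which the sender
        attempts to start the channel.
    receiver_inactive_at: the time at which, under this post's hypothesis,
        the old receiver instance goes Inactive on its own.
    adopt_mca: if True, a new connection replaces the stale receiver.
    """
    for t in sorted(retry_times):
        if adopt_mca or t >= receiver_inactive_at:
            return t
    return None


if __name__ == "__main__":
    retries = [10, 70, 130]
    # Without AdoptMCA the sender must wait for the receiver to go Inactive.
    print(first_successful_retry(retries, receiver_inactive_at=120, adopt_mca=False))
    # With AdoptMCA the very first retry gets through.
    print(first_successful_retry(retries, receiver_inactive_at=120, adopt_mca=True))
```

In this model a larger heartbeat value pushes `receiver_inactive_at` later, which is exactly why AdoptMCA would matter more for timely reconnection.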
|
|
 |
mrlinux |
Posted: Tue Aug 13, 2002 4:57 am Post subject: |
|
|
 Grand Master
Joined: 14 Feb 2002 Posts: 1261 Location: Detroit,MI USA
|
Not sure what you mean by this statement:
it would eventually catch the receiver in the inactive state.
The larger your Heartbeat number, the more important AdoptMCA is for timely reconnections. I say timely because even without AdoptMCA, sooner or later the Heartbeat int would pass and since the sender is retrying, it would eventually catch the receiver in the inactive state _________________ Jeff
IBM Certified Developer MQSeries
IBM Certified Specialist MQSeries
IBM Certified Solutions Expert MQSeries |
|
|
 |
PeterPotkay |
Posted: Tue Aug 13, 2002 5:13 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
A receiver must be in an Inactive state to re-establish communications after a network failure.
You can get the RCVR in this state by manually stopping and starting it (blah), or letting the Heartbeat Interval pass. Once that interval passes, the RCVR will go into the Inactive state on its own. It's at this point that the SNDR's request for a connection will finally work, i.e. it finally caught the RCVR in an Inactive state.
If the SNDR would never try to re-establish communications until after the Heartbeat Interval put the RCVR in an Inactive state, there would be no need for AdoptMCA. But I would bet that usually more messages are coming to the sender before that has a chance to happen, and in this case AdoptMCA kicks in and lets the connection re-establish itself before Heartbeat does its thing. _________________ Peter Potkay
Keep Calm and MQ On |
|
|
 |
mrlinux |
Posted: Tue Aug 13, 2002 5:20 am Post subject: |
|
|
 Grand Master
Joined: 14 Feb 2002 Posts: 1261 Location: Detroit,MI USA
|
This is incorrect from my understanding and observations:
Once that interval passes, the RCVR will go into the Inactive state on its own.
The rcvr channel must receive the heartbeat message in order to come out of its block waiting for a TCP message; then it can check its discint timer and time out if required, or go back to waiting for the next TCP message. _________________ Jeff
IBM Certified Developer MQSeries
IBM Certified Specialist MQSeries
IBM Certified Solutions Expert MQSeries |
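The behaviour described in this reply can be modelled in a few lines. This is a sketch of the poster's description only, not of MQ internals: the receiver can check its discint timer only when something arrives, so if the network is down and nothing arrives, the check never runs. The function name, the event encoding, and the assumption that the timer runs from the last real message are all illustrative.

```python
def receiver_inactive_at(events, discint):
    """Return the time the receiver goes inactive, or None if it never does.

    events: iterable of (time_in_seconds, kind) pairs that actually reach
        the receiver, where kind is "message" or "heartbeat".
    discint: disconnect interval in seconds, measured here from the last
        real message (an assumption of this sketch).
    """
    last_message = 0.0
    for when, kind in sorted(events):
        # The blocking wait returns only because something arrived at `when`.
        if kind == "message":
            last_message = when
        elif when - last_message >= discint:
            return when  # a heartbeat woke the receiver and the timer had expired
    return None  # nothing arrived to wake the receiver, so it stays "running"


if __name__ == "__main__":
    healthy = [(t, "heartbeat") for t in range(60, 361, 60)]
    print(receiver_inactive_at(healthy, discint=300))
    # Network fails after the first heartbeat: no further events arrive.
    print(receiver_inactive_at([(60, "heartbeat")], discint=300))
```

Under this model the second case is the stuck receiver from the thread: it stays in running status until it is forced down or a new connection adopts it.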
|
|
 |
PeterPotkay |
Posted: Tue Aug 13, 2002 5:37 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
Hmm, maybe I got it wrong. I based my assumption on the quote below, taken from the Dallas Tech Conference Session M16 "Keeping Your TCP/IP Channels Up and Running", page 16.
*If the network is down the heartbeat packets will not be received by the receiver MCA.
*Although the sender expects a reply, it will not respond to the absence of a reply. It will go into "Inactive" state, ready to be restarted by the arrival of message on the XMITQ.
*The Heartbeat is not dependent on the availability of the sender channel. If no heartbeat packets are received within the Heartbeat check interval the receiver will assume an outage and go "Inactive".
That third point is why I said what I did. _________________ Peter Potkay
Keep Calm and MQ On |
|
|
 |
mrlinux |
Posted: Tue Aug 13, 2002 5:43 am Post subject: |
|
|
 Grand Master
Joined: 14 Feb 2002 Posts: 1261 Location: Detroit,MI USA
|
Well, I have observed at my current employer, where we have had network outages that lasted longer than the disconnect interval, that the receiver channel would still be in the running state, and in order to fix it we would have to force the receiver down. Since we implemented the adoptmca setting we have not had those issues. Of course this is with a v5.1 queue manager (rcvr side) and v5.2 (sender side). So maybe it has been changed. _________________ Jeff
IBM Certified Developer MQSeries
IBM Certified Specialist MQSeries
IBM Certified Solutions Expert MQSeries |
|
|
 |
|