Heartbeat Interval vs. AdoptMCA |
PeterPotkay |
Posted: Tue Aug 13, 2002 5:48 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
Jeff, in your scenario, you waited past the disconnect interval, and the RCVR was still stuck. This is correct. But did you wait past the Heartbeat Interval? What was the value on both sides of the channel, since the larger of the two is used? And did both platforms support it?
Either way, I think it's smart to use both AdoptMCA and Heartbeat. _________________ Peter Potkay
Keep Calm and MQ On |
|
mrlinux |
Posted: Tue Aug 13, 2002 6:18 am Post subject: |
|
|
 Grand Master
Joined: 14 Feb 2002 Posts: 1261 Location: Detroit,MI USA
|
The heartbeat interval was way less than the disconnect interval.
Well, I agree with you on using both. _________________ Jeff
IBM Certified Developer MQSeries
IBM Certified Specialist MQSeries
IBM Certified Solutions Expert MQSeries |
|
oz1ccg |
Posted: Tue Aug 13, 2002 6:49 am Post subject: |
|
|
 Yatiri
Joined: 10 Feb 2002 Posts: 628 Location: Denmark
|
Interesting discussion, Peter and mrlinux. This is a great example of interpreting those disgusting manuals. Most of it is in there, but when it comes to the real world, they are hard to read.
Anyway, your conclusion is the same one I reached some time ago in the real world, but it seems to me that the RCVR/RQSTR channel never receives anything from the TCP/IP socket when a network failure occurs; it just waits forever.
It's the same kind of problem with client/server connections where the connection is broken for any reason and the client machine is still sitting on the network: the connections just queue up (and eat up storage).
There have been some requests to IBM to fix it, but they haven't done so yet, so unlucky client connections are still queuing up.
Just my $0.02  _________________ Regards, Jørgen
Home of BlockIP2, the last free MQ Security exit ver. 3.00
Cert. on WMQ, WBIMB, SWIFT. |
|
jc_squire |
Posted: Wed Aug 14, 2002 5:36 pm Post subject: |
|
|
 Centurion
Joined: 14 Apr 2002 Posts: 105 Location: New Zealand
|
Hi Gents,
I have seen this before on 5.1; it is now fixed in 5.2.
Both ends of the channel start and run, a network failure occurs, and the sdr side attempts to re-establish the connection but is unable to because the receiver is still running - i.e. the receiver is not able to determine that the connection is no longer active. We have actually unplugged the network cable, and DIS CHS still shows it running!
On 5.1 the receiver chl must be stopped and then started, or you must use AdoptNewMCA. The next attempt from the sdr will then be accepted.
On 5.2 the TCP timeout (twice the value of HBINT) on the rcvr checks whether the connection is still active; if it is not, it closes the connection and the next attempt from the sdr is accepted.
My understanding is:
- keep alive only works on the sender side (not on the rcvr)
- there has to be a heartbeat exchange for the rcvr to quiesce - this cannot take place if the chls are down, hence the need for the TCP/IP timeout. In the information centre see:
Quote
Checking that the other end of the channel is still available
In MQSeries for AIX, AS/400, HP-UX, OS/2 Warp, OS/390 without CICS, Sun Solaris, and Windows NT, you can use the heartbeat-interval channel attribute to specify that flows are to be passed from the sending MCA when there are no messages on the transmission queue. This is described in Heartbeat interval (HBINT).
In MQSeries for AIX, AS/400, HP-UX, OS/2 Warp, OS/390 without CICS, Sun Solaris, VSE/ESA, and Windows NT, if you are using TCP as your transport protocol, you can use the SO_KEEPALIVE option on the TCP/IP socket. If you specify this option, TCP periodically checks that the other end of the connection is still available, and if it is not, the channel is terminated.
In MQSeries for AIX, AS/400, HP-UX, OS/2 Warp, Sun Solaris, and Windows NT, if you are using TCP as your transport protocol, the receiving end of inactive connections can also be closed if no data is received for a period of time. This period of time is determined according to the HBINT (heartbeat interval) value.
The time-out value is set as follows:
For an initial number of flows, before any negotiation has taken place, the timeout is twice the HBINT value from the channel definition.
When the channels have negotiated a HBINT value, the timeout is set to twice this value.
Unquote
This is my interpretation anyway. Hope it helps.
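To put numbers on the rule quoted above, here is a minimal Python sketch of the arithmetic only. It is not MQ code: the function names are made up for illustration, it assumes HBINT values are positive numbers of seconds, and the "larger of the two sides" negotiation is taken from Peter's earlier post rather than from the quoted passage.

```python
from typing import Optional


def negotiated_hbint(sender_hbint: int, receiver_hbint: int) -> int:
    """Hypothetical helper: the larger of the two configured HBINT values
    is used, as stated earlier in this thread (positive values assumed)."""
    return max(sender_hbint, receiver_hbint)


def receive_timeout(defined_hbint: int, negotiated: Optional[int] = None) -> int:
    """Hypothetical helper for the quoted rule.

    Before negotiation the timeout is twice the HBINT from the channel
    definition; after negotiation it is twice the negotiated value.
    """
    effective = defined_hbint if negotiated is None else negotiated
    return 2 * effective


# Example: receiver defined with HBINT 60, sender with HBINT 300.
agreed = negotiated_hbint(300, 60)
initial_timeout = receive_timeout(60)            # before negotiation
running_timeout = receive_timeout(60, agreed)    # after negotiation
```

With these example values the receiver would wait 120 seconds during the initial flows and 600 seconds once the channel has negotiated, if the quoted rule is read literally.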
Regards _________________ J C Squire
IBM Certified Specialist - MQSeries |
|
PeterPotkay |
Posted: Thu Aug 15, 2002 5:31 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
JC,
From the IBM Tech Conference:
"The Heartbeat is not dependent on the availability of the sender channel. If no heartbeat packets are received within the Heartbeat check interval the receiver will assume an outage and go 'Inactive'."
I confirmed this with Paul from Hursley and he says that the lack of a heartbeat is telling the RCVR that the network is down and it's time to go Inactive.
I haven't tested this myself, but Jeff indicated it didn't work for him. Curious if anyone else has seen HeartbeatInterval make a RCVR go Inactive if no Heartbeat was sent.
(Set HeartbeatInterval to 5 minutes and then yank the cable on a running channel. Within 6 minutes, the RCVR should be Inactive.) _________________ Peter Potkay
Keep Calm and MQ On |
|
mrlinux |
Posted: Thu Aug 15, 2002 5:46 am Post subject: |
|
|
 Grand Master
Joined: 14 Feb 2002 Posts: 1261 Location: Detroit,MI USA
|
Well, I have retested it with 2 qmgrs at V5.2 and it works, at least for the one test. I am not convinced that it is 100 percent. I believe that when I tested it before, one or both of the queue managers were at 5.1. _________________ Jeff
IBM Certified Developer MQSeries
IBM Certified Specialist MQSeries
IBM Certified Solutions Expert MQSeries |
|
jc_squire |
Posted: Thu Aug 15, 2002 1:30 pm Post subject: |
|
|
 Centurion
Joined: 14 Apr 2002 Posts: 105 Location: New Zealand
|
Well, then there is a bit of a conflict here between Hursley and Dallas ......
I have emailed you a PDF "MQSeries Disconnect Interval, TCP/IP Keep Alive and Heartbeat" supplied by Paul Sehorne at the IBM Dallas Systems Centre. In this PDF he explains:
- TCP/IP keep alive determines whether the partner IP can be reached. It knows nothing about MQSeries channels (it is invoked at the IP layer), and it operates between machines, not between queue managers.
- An MCA is a "misbehaved application" in that the sockets RECEIVE function call is a blocking function call, which causes an MCA to behave similarly to an application that does not use FAIL_IF_QUIESCING.
NOW THIS IS WHERE THE DISCREPANCY COMES IN
- The heartbeat feature allows the receiving qmgr to quiesce when no messages are flowing. During the discint the receiving MCA is blocked on a RECEIVE function call, and the only way to signal it is to have its partner sending MCA send it a signal, the heartbeat. Upon receipt of the heartbeat, the receiving MCA returns from the RECEIVE function call long enough to see if its local queue manager is quiescing. If the local queue manager is quiescing, the receiver channel will stop. Otherwise the receiving MCA returns to its RECEIVE function call.
This means it allows the receiving queue manager to close the MCA (kill the sockets RECEIVE function call) in its shutdown process (as you know, the shutdown process is called quiescing). From this document it is clear that the heartbeat does not set the receiver chl inactive to accept another new connection.
Logically it still makes sense that the tcp/ip time out is responsible for resetting the receiver chl as per my earlier post.
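To make the two mechanisms easier to compare, here is a rough Python sketch of a receiver that blocks on a socket receive. It is only a model of the behaviour described above, not how the MCA is actually written: `receiver_loop`, `is_quiescing` and the return strings are invented for this illustration, and the 2 x HBINT receive timeout is the rule from the information centre quote earlier in the thread.

```python
import socket
from typing import Callable


def receiver_loop(conn: socket.socket,
                  hbint_seconds: float,
                  is_quiescing: Callable[[], bool]) -> str:
    """Model of a receiving MCA blocked on RECEIVE (illustrative only).

    Returns the reason the loop ended: "inactive", "closed" or "quiescing".
    """
    # Assumption: the receive timeout is twice HBINT, per the quoted rule.
    conn.settimeout(2 * hbint_seconds)
    while True:
        try:
            data = conn.recv(4096)  # blocks until data, close, or timeout
        except socket.timeout:
            # Nothing arrived (no message, no heartbeat) within 2 x HBINT:
            # treat the connection as dead and give up the channel.
            return "inactive"
        if not data:
            # The partner closed the connection cleanly.
            return "closed"
        # A message or heartbeat woke us up. This is the only point at
        # which a blocked receiver can notice that its qmgr is quiescing.
        if is_quiescing():
            return "quiescing"
        # Otherwise go back to the blocking receive.
```

A quick way to try it is with `socket.socketpair()`: send nothing and the loop ends as "inactive" after the timeout; send a few bytes while `is_quiescing` returns True and it ends as "quiescing". The first case is the TCP timeout behaviour, the second is the heartbeat behaviour from the Dallas document, and the sketch shows they are not mutually exclusive.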
Just to confuse the issue even further ............ not sure how old this document is
Jeff - quite interesting what they say about the channel initiator i.e. no channel initiator no retrying.
Regards _________________ J C Squire
IBM Certified Specialist - MQSeries |
|
jc_squire |
Posted: Thu Aug 15, 2002 1:38 pm Post subject: |
|
|
 Centurion
Joined: 14 Apr 2002 Posts: 105 Location: New Zealand
|
Also, I can remember reading that this was a known bug in 5.1 which is fixed in 5.2 but can't remember where. Can only assume that the tcp/ip timeout feature was developed in 5.2.
We are currently upgrading a customer's MQ infrastructure from mixed 5.0 and 5.1 to 5.2 (before you ask why - they do not have enough faith in 5.3 yet) and I can see these errors (cannot start new connection as chl is already running) reported in their logs. Jeff - do you have anything reported in your logs?
Regards _________________ J C Squire
IBM Certified Specialist - MQSeries |
|
mrlinux |
Posted: Thu Aug 15, 2002 4:48 pm Post subject: |
|
|
 Grand Master
Joined: 14 Feb 2002 Posts: 1261 Location: Detroit,MI USA
|
I didn't look in the logs, but I can on Friday.
Test description:
I had a channel up and running between one Windows NT server and one Windows 2000 server, unplugged the network cable, waited 3 minutes, and did a refresh from MMC; the channel had gone inactive. _________________ Jeff
IBM Certified Developer MQSeries
IBM Certified Specialist MQSeries
IBM Certified Solutions Expert MQSeries |
|
mrlinux |
Posted: Fri Aug 16, 2002 3:37 am Post subject: |
|
|
 Grand Master
Joined: 14 Feb 2002 Posts: 1261 Location: Detroit,MI USA
|
Well I checked and NO FDCs, nothing in the error logs either. _________________ Jeff
IBM Certified Developer MQSeries
IBM Certified Specialist MQSeries
IBM Certified Solutions Expert MQSeries |
|