MQSeries.net :: View topic - Heartbeat Interval vs. AdoptMCA

Heartbeat Interval vs. AdoptMCA

PeterPotkay

Posted: Tue Aug 13, 2002 5:48 am Post subject:

Poobah

Joined: 15 May 2001
Posts: 7723

Jeff, in your scenario, you waited past the disconnect interval, and the RCVR was still stuck. This is correct. But did you wait past the Heartbeat Interval? What was the value on both sides of the channel, since the larger of the two is used? And did both platforms support it?

Either way, I think it's smart to use both AdoptMCA and Heartbeat.
_________________
Peter Potkay
Keep Calm and MQ On

mrlinux

Posted: Tue Aug 13, 2002 6:18 am Post subject:

Grand Master

Joined: 14 Feb 2002
Posts: 1261
Location: Detroit,MI USA

The heartbeat interval was way less than the disconnect interval.

Well, I agree with you on using both.
_________________
Jeff

IBM Certified Developer MQSeries
IBM Certified Specialist MQSeries
IBM Certified Solutions Expert MQSeries

oz1ccg

Posted: Tue Aug 13, 2002 6:49 am Post subject:

Yatiri

Joined: 10 Feb 2002
Posts: 628
Location: Denmark

Interesting discussion, Peter and mrlinux. This is a great example of interpreting those disgusting manuals. Well, most of it is in there, but when it comes to the real world, they are hard to read.

Anyway, your conclusion is the same one I reached some time ago in the real world: it seems that the RCVR/RQSTR channel never receives anything from the TCP/IP socket when a network failure occurs; it just waits forever.
It's the same kind of problem with client/server connections: when the connection is broken for any reason while the client machine is still sitting on the network, the connections just queue up (and eat up storage).

There have been some requests to IBM to fix it, but they haven't done so yet, so unlucky client connections are still queuing up.

Just my $0.02

_________________
Regards, Jørgen
Home of BlockIP2, the last free MQ Security exit ver. 3.00
Cert. on WMQ, WBIMB, SWIFT.

jc_squire

Posted: Wed Aug 14, 2002 5:36 pm Post subject:

Centurion

Joined: 14 Apr 2002
Posts: 105
Location: New Zealand

Hi Gents,

I've seen this before on 5.1; it is fixed in 5.2.

Both ends of the channel start and run. A network failure occurs, and the sender (SDR) side attempts to re-establish the connection but cannot, because the receiver is still running, i.e. the receiver is not able to determine that the connection is no longer active. We have actually unplugged the network cable, and DIS CHS still shows the channel as running!

On 5.1 the receiver channel must be stopped and then started, or AdoptNewMCA must be used. The next attempt from the sender will then be accepted.

On 5.2 the TCP timeout (twice the value of HBINT) on the receiver checks whether the connection is still active. If it is not, it closes the connection and the next attempt from the sender is accepted.

My understanding is

- Keepalive only works on the sender side (not on the receiver).
- There has to be a heartbeat exchange for the receiver to quiesce. This cannot take place if the channels are down, hence the need for the TCP/IP timeout. In the Information Center see:

Quote

Checking that the other end of the channel is still available
In MQSeries for AIX, AS/400, HP-UX, OS/2 Warp, OS/390 without CICS, Sun Solaris, and Windows NT, you can use the heartbeat-interval channel attribute to specify that flows are to be passed from the sending MCA when there are no messages on the transmission queue. This is described in Heartbeat interval (HBINT).

In MQSeries for AIX, AS/400, HP-UX, OS/2 Warp, OS/390 without CICS, Sun Solaris, VSE/ESA, and Windows NT, if you are using TCP as your transport protocol, you can use the SO_KEEPALIVE option on the TCP/IP socket. If you specify this option, TCP periodically checks that the other end of the connection is still available, and if it is not, the channel is terminated.

In MQSeries for AIX, AS/400, HP-UX, OS/2 Warp, Sun Solaris, and Windows NT, if you are using TCP as your transport protocol, the receiving end of inactive connections can also be closed if no data is received for a period of time. This period of time is determined according to the HBINT (heartbeat interval) value.

The time-out value is set as follows:

For an initial number of flows, before any negotiation has taken place, the timeout is twice the HBINT value from the channel definition.
When the channels have negotiated a HBINT value, the timeout is set to twice this value.

Unquote
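As a rough illustration of the timeout rule quoted above (not IBM code), the arithmetic can be sketched in Python. The function names are made up for this example, the "larger of the two values" negotiation rule is taken from Peter's earlier post in this thread, and the special case of HBINT set to 0 is deliberately not modeled.

```python
def negotiated_hbint(sender_hbint: int, receiver_hbint: int) -> int:
    """Assumed negotiation rule: the larger of the two configured values wins."""
    return max(sender_hbint, receiver_hbint)


def receive_timeout(hbint_seconds: int) -> int:
    """Receive timeout as quoted from the manual: twice the HBINT value."""
    return 2 * hbint_seconds


# Before negotiation: the receiver uses the HBINT from its own channel definition.
initial_timeout = receive_timeout(300)

# After negotiation: the timeout is based on the negotiated HBINT.
negotiated_timeout = receive_timeout(negotiated_hbint(60, 300))

print(initial_timeout, negotiated_timeout)
```

Under these assumptions, a receiver defined with HBINT(300) would wait up to 600 seconds without data before closing the connection, which is worth keeping in mind when timing a pull-the-cable test.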

This is my interpretation anyway. Hope it helps.

Regards
_________________
J C Squire
IBM Certified Specialist - MQSeries

PeterPotkay

Posted: Thu Aug 15, 2002 5:31 am Post subject:

Poobah

Joined: 15 May 2001
Posts: 7723

JC,
From the IBM Tech Conference:

"The Heartbeat is not dependent on the availability of the sender channel. If no heartbeat packets are received within the Heartbeat check interval, the receiver will assume an outage and go 'Inactive'."

I confirmed this with Paul from Hursley and he says that the lack of a heartbeat is telling the RCVR that the network is down and it's time to go Inactive.

I haven't tested this myself, but Jeff indicated it didn't work for him. Curious if anyone else has seen HeartbeatInterval make a RCVR go Inactive if no Heartbeat was sent.

(Set HeartbeatInterval to 5 minutes and then yank the cable on a running channel. Within 6 minutes, the RCVR should be Inactive.)
_________________
Peter Potkay
Keep Calm and MQ On

mrlinux

Posted: Thu Aug 15, 2002 5:46 am Post subject:

Grand Master

Joined: 14 Feb 2002
Posts: 1261
Location: Detroit,MI USA

Well, I have retested it with two qmgrs at V5.2 and it works, at least for the one test. I am not convinced that it is 100 percent. I believe that when I tested it before, one or both of the queue managers were at 5.1.
_________________
Jeff

IBM Certified Developer MQSeries
IBM Certified Specialist MQSeries
IBM Certified Solutions Expert MQSeries

jc_squire

Posted: Thu Aug 15, 2002 1:30 pm Post subject:

Centurion

Joined: 14 Apr 2002
Posts: 105
Location: New Zealand

Well, then there is a bit of a conflict here between Hursley and Dallas ......

I have emailed you a PDF "MQSeries Disconnect Interval, TCP/IP Keep Alive and Heartbeat" supplied by Paul Sehorne at the IBM Dallas Systems Centre. In this PDF he explains:

- TCP/IP keepalive determines whether the partner IP can be reached. It knows nothing about MQSeries channels (it is invoked at the IP layer) and operates between machines, not between queue managers.
- An MCA is a "misbehaved application" in that the sockets RECEIVE function call is a blocking call, which causes an MCA to behave like an application that does not use FAIL_IF_QUIESCING.
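To make the keepalive point concrete, here is a minimal Python sketch of enabling SO_KEEPALIVE on an ordinary TCP socket. It is not how MQSeries itself sets the option; the helper name is invented for illustration. It only shows that keepalive is a property of the socket, handled by the operating system's TCP stack, with probe timing governed by OS settings rather than by anything in the channel definition.

```python
import socket


def make_keepalive_socket() -> socket.socket:
    """Create a TCP socket with SO_KEEPALIVE switched on (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the TCP stack to probe an idle connection periodically.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


sock = make_keepalive_socket()
# Read the option back; a non-zero value means keepalive is enabled.
enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
sock.close()
print("keepalive enabled:", bool(enabled))
```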

NOW THIS IS WHERE THE DISCREPANCY COMES IN

- The heartbeat feature allows the receiving qmgr to quiesce when no messages are flowing. During the DISCINT the receiving MCA is blocked on a RECEIVE function call, and the only way to signal it is to have its partner sending MCA send it a signal: the heartbeat. Upon receipt of the heartbeat, the receiving MCA returns from the RECEIVE function call long enough to see whether its local queue manager is quiescing. If the local queue manager is quiescing, the receiver channel will stop. Otherwise the receiving MCA returns to its RECEIVE function call.

This means it allows the receiving queue manager to close the MCA (kill the sockets RECEIVE function call) in its shutdown process (as you know, the shutdown process is called quiescing). From this document it is clear that the heartbeat does not set the receiver channel inactive so that it can accept a new connection.
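The two mechanisms discussed in this thread can be pictured with a toy Python model. This is only a sketch under stated assumptions, not the MCA's real implementation: the one-byte heartbeat marker, the `receiver_loop` function, and the `is_quiescing` callback are all hypothetical. A heartbeat wakes the blocked receive so the loop can check for quiesce (the Dallas document's description), while a receive timeout ends the loop when nothing arrives at all (the manual's "twice HBINT" behaviour).

```python
import socket

HEARTBEAT = b"H"  # hypothetical one-byte heartbeat marker, not the real MQ flow


def receiver_loop(conn, is_quiescing, timeout_seconds):
    """Block on recv; wake on a heartbeat to check quiesce, or give up on timeout."""
    conn.settimeout(timeout_seconds)  # stands in for 2 * HBINT
    while True:
        try:
            data = conn.recv(1)
        except socket.timeout:
            return "timed out"  # no data at all: assume the link is dead
        if not data:
            return "peer closed"
        if data == HEARTBEAT and is_quiescing():
            return "stopped for quiesce"
        # Otherwise go back to the blocking receive.


sender, receiver = socket.socketpair()

# Case 1: a heartbeat arrives while the local queue manager is quiescing.
sender.sendall(HEARTBEAT)
result_quiesce = receiver_loop(receiver, lambda: True, 1.0)

# Case 2: nothing arrives (cable pulled), so the receive timeout fires.
result_timeout = receiver_loop(receiver, lambda: False, 0.2)

sender.close()
receiver.close()
print(result_quiesce, "/", result_timeout)
```

In this model a receiver with no timeout and no incoming heartbeat would sit in `recv` indefinitely, which matches the stuck RCVR behaviour reported earlier in the thread.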

Logically it still makes sense that the tcp/ip time out is responsible for resetting the receiver chl as per my earlier post.

Just to confuse the issue even further ............ not sure how old this document is

Jeff - quite interesting what they say about the channel initiator i.e. no channel initiator no retrying.

Regards
_________________
J C Squire
IBM Certified Specialist - MQSeries

jc_squire

Posted: Thu Aug 15, 2002 1:38 pm Post subject:

Centurion

Joined: 14 Apr 2002
Posts: 105
Location: New Zealand

Also, I can remember reading that this was a known bug in 5.1 which is fixed in 5.2 but can't remember where. Can only assume that the tcp/ip timeout feature was developed in 5.2.

We are currently upgrading a customer's MQ infrastructure from mixed 5.0 and 5.1 to 5.2 (before you ask why: they do not have enough faith in 5.3 yet) and I can see these errors (cannot start new connection as channel is already running) reported in their logs. Jeff, do you have anything reported in your logs?

Regards
_________________
J C Squire
IBM Certified Specialist - MQSeries

mrlinux

Posted: Thu Aug 15, 2002 4:48 pm Post subject:

Grand Master

Joined: 14 Feb 2002
Posts: 1261
Location: Detroit,MI USA

I didn't look in the logs, but I can on Friday.

Test Description:

Had a channel up and running between one Windows NT server and one Windows 2000 server. Unplugged the network cable, waited 3 minutes, did a refresh from MMC, and the channel had gone inactive.
_________________
Jeff

IBM Certified Developer MQSeries
IBM Certified Specialist MQSeries
IBM Certified Solutions Expert MQSeries

mrlinux

Posted: Fri Aug 16, 2002 3:37 am Post subject:

Grand Master

Joined: 14 Feb 2002
Posts: 1261
Location: Detroit,MI USA

Well I checked and NO FDCs, nothing in the error logs either.
_________________
Jeff

IBM Certified Developer MQSeries
IBM Certified Specialist MQSeries
IBM Certified Solutions Expert MQSeries
