HBINT. How it actually works?
tellmey
Posted: Wed May 26, 2004 8:19 am    Post subject: HBINT. How it actually works?

Novice

Joined: 26 May 2004
Posts: 19

Hello All,

I'd like to know how heartbeats actually work.

Consider a scenario where only HBINT is being used, with no KeepAlive.

First, when a heartbeat is sent from a sender channel and the connection is present, the receiver channel sends back a response to that heartbeat, right?

So resources are used on the receiver end of the channel to receive the heartbeat and send a response back.

OK, then what happens when there is NO network connection?
The channel sends out a heartbeat, but it is never received at the other end. How does the sending end realize there is no connection?
When will it go RETRYING?
Does it wait for the response, time out, and then go RETRYING?
Or does the network inform it somehow?

I would be really grateful if anybody can throw some light on this.

Thanks.
PeterPotkay
Posted: Wed May 26, 2004 1:42 pm

Poobah

Joined: 15 May 2001
Posts: 7722

Quote:

First, when a heartbeat is sent from a sender channel and the connection is present, the receiver channel sends back a response to that heartbeat, right?

Close enough, yes. Who knows EXACTLY what is happening, but conceptually that is correct.

Quote:

OK, then what happens when there is NO network connection?
The channel sends out a heartbeat, but it is never received at the other end. How does the sending end realize there is no connection?

A sender channel will immediately get an error when it tries to send the HB and there is no network connection, and so it knows to go INACTIVE rather than RETRYING (see below). Alternatively, the SNDR may never get the HB ACK from the RCVR, at which point it can assume an outage. The documentation is not clear on which of these two errors is exactly what causes the SNDR channel to realize there is no network to carry its HB; I have read both explanations. To you, the end user, it doesn't make a difference. The channel will go INACTIVE (not RETRYING, see below) when HBs cannot be successfully passed back and forth.

The receiving side knows there is an outage because it never got the sender's HB. Without heartbeats, a RCVR does not know when the next message may come, so it is quite happy sitting forever waiting. HBINT tells it when the next message (the HB) must arrive. If a HB or real message does not come in that time, it gives up waiting and goes INACTIVE.


Quote:

When will it go RETRYING?


By default, the receiver will time out and assume there has been a communications failure if no data is received within twice the heartbeat interval when the negotiated heartbeat interval is less than 60 seconds, or within 60 seconds beyond the negotiated heartbeat interval when it is greater than or equal to 60 seconds. The RCVR will then go INACTIVE.
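
As a rough illustration of that timeout rule, here is a minimal Python sketch. It is not MQ code: the function name is made up for this example, and treating HBINT(0) as "heartbeats disabled, rule does not apply" is an assumption added here.

```python
def receiver_timeout_seconds(negotiated_hbint: int) -> int:
    """Seconds a receiver waits for data before assuming a failure.

    Hypothetical helper that only restates the rule quoted above.
    """
    if negotiated_hbint <= 0:
        # Assumption: a zero interval means no heartbeats are exchanged,
        # so the heartbeat-based timeout does not apply.
        raise ValueError("heartbeat timeout rule needs HBINT > 0")
    if negotiated_hbint < 60:
        # Below 60 seconds: twice the heartbeat interval.
        return 2 * negotiated_hbint
    # 60 seconds or more: the interval plus a further 60 seconds.
    return negotiated_hbint + 60


for hbint in (30, 59, 60, 300):
    print(hbint, receiver_timeout_seconds(hbint))
```

So a negotiated HBINT of 30 gives the receiver 60 seconds of silence before it gives up, while an HBINT of 300 gives it 360 seconds.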

If there is no HB ACK, and there are no messages to send, the SNDR channel goes INACTIVE, ready to be started either manually or by triggering. If a message arrives at that point and there is still no network, then it will go RETRYING.

Lack of HBs will not put a SNDR channel into RETRYING. It puts the channel into INACTIVE. Trying to send a real message when there is no connection is what puts a channel into RETRYING.
_________________
Peter Potkay
Keep Calm and MQ On
tellmey
Posted: Thu May 27, 2004 5:05 am


I think we are a little confused here.

Quote:
If there is no HB ACK, and there are no messages to send, the SNDR channel goes INACTIVE, ready to be started either manually or by triggering. If a message arrives at that point and there is still no network, then it will go RETRYING.


This is NOT true. Each time a HB is sent and the sender knows there is an error (God knows how!!), it goes into RETRYING. You can test it for yourself. This happens regardless of whether there is a message to be sent.

I think you are talking about the case where DISCINT expires and the channel goes INACTIVE; if there is then a message to be sent, the channel goes into RETRYING. But this is not the case with HBs.

With heartbeats, it is the receiver that goes INACTIVE because of a lack of heartbeat receipt. The SENDER just keeps retrying until communications are reestablished or all the short and long retries have been attempted. After that, it has to be restarted manually.
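
To make the "short and long retries" sequence concrete, here is a small Python sketch of the wait schedule. The parameter names mirror the SHORTRTY, SHORTTMR, LONGRTY and LONGTMR channel attributes, but the function itself is hypothetical and the values are arbitrary, chosen only to keep the output short.

```python
from itertools import chain, repeat
from typing import Iterator


def retry_waits(shortrty: int, shorttmr: int,
                longrty: int, longtmr: int) -> Iterator[int]:
    """Yield the wait in seconds before each retry attempt.

    All short retries come first, then all long retries. When the
    iterator is exhausted, the channel needs a manual restart.
    """
    return chain(repeat(shorttmr, shortrty), repeat(longtmr, longrty))


# Illustrative values only, not the product defaults.
waits = list(retry_waits(shortrty=3, shorttmr=60, longrty=2, longtmr=1200))
print(waits)
print(sum(waits), "seconds before the retries are exhausted")
```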
PeterPotkay
Posted: Thu May 27, 2004 5:31 am


The following is a quote from Session M16 (Keeping Your TCP/IP Channels Up and Running) at the IBM MQ Conference in Dallas in 2002.

It is from the section on Heartbeats:

Quote:

The Heartbeat Check is useful for recovering from network failures.

If the network is down the heartbeat packets will not be received by the receiver MCA.

Although the sender expects a reply, it will not respond to the absence of a reply. It will go into "Inactive" state, ready to be restarted by the arrival of a message on the XMITQ.

The heartbeat is not dependent on the availability of the SNDR channel. If no heartbeat packets are received within the Heartbeat Check interval the RCVR will assume an outage and go "Inactive".




Quote:

and the sender knows there is an error (God knows how!!)


From MD0C: WebSphere MQ - Keeping Channels Up and Running:
Quote:

The Channel is reliant on the lower level communications functions to report any network outages. For example, the resting state of a RECEIVER channel is on a recv call. If the recv call does not return, the Channel is unaware that connection to the partner is lost. One of the difficulties with MCAs was recognising in a timely way when the network had actually failed, as the comms protocols didn't always return errors at the right times. Typically, LU6.2 has been much better at this than TCP/IP (it's a more complex protocol).




Quote:

You can test it for yourself.

I don't know how to cause a HB ACK not to come back, short of yanking out the network cable, which is more severe than a HB ACK not coming back.


If you yank the cable, causing a "network outage", then I think this applies: "The Channel is reliant on the lower level communications functions to report any network outages." In that case the channel goes RETRYING.

If somehow the SNDR thinks the network is up, but the HB ACK does not come back, then according to the notes from M16, the channel would go INACTIVE.

Of course the notes could be wrong (they are from 2002), but then again, how do you keep the network and channel up and prevent a HB ACK from coming back?
tellmey
Posted: Thu May 27, 2004 6:19 am


I am really not sure whether a HB ACK is actually returned, and that is what I am trying to find out.
In our communication with IBM, Paul Sehorne in particular once informed us, in his own words:

Quote:

"The Sender channel expects no reply to the HeartBeat"

But then there are various references in the documentation that suggest the sender does expect a reply.

But anyway, assuming there is a HB ACK, I can think of only two scenarios where the HB ACK is not received: one when the receiver channel or queue manager is stopping or stopped, and the other when there is a network failure (as in yanking out the cable). I have seen the sender channel go into RETRYING in both cases.

Quote:
Although the sender expects a reply, it will not respond to the absence of a reply. It will go into "Inactive" state, ready to be restarted by the arrival of a message on the XMITQ.

I have not seen any documentation myself that suggests this, and I have not seen it happen. I don't believe it is true.

From IBM's quote: "The Channel is reliant on the lower level communications functions to report any network outages", and the channel goes RETRYING.
Yes, this is true when the network fails. But one more thing to note is that the channel is not notified of a network failure if it is not using heartbeats. So the channel will stay running even if there is a network failure, as no heartbeats are sent.

So there is some mechanism through which, when a HB is sent out, the network notifies the channel of a failure.
Let me know if I am missing something here.
PeterPotkay
Posted: Thu May 27, 2004 6:38 am


Quote:

But anyway, assuming there is a HB ACK, I can think of only two scenarios where the HB ACK is not received: one when the receiver channel or queue manager is stopping or stopped, and the other when there is a network failure (as in yanking out the cable). I have seen the sender channel go into RETRYING in both cases.


But in both these cases, it is not the case that the network is fine AND the channel is fine BUT no HBACK was sent back. Splitting hairs here....
So I think the SENDER goes RETRYING only when it tries to send something (a HB or a real message). At that point it talks to the network layer, and ONLY then does it synchronously get an error from the network saying the request cannot be accepted, after which it starts retrying. (I just tested this and it is true: with the remote QM down, the channel stayed RUNNING until I sent a message or HBINT finally went by.)
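
A back-of-the-envelope Python sketch of what that test suggests: the sender only notices the outage at its next send attempt, which is either the next heartbeat or the next real message, whichever comes first. The function and its parameters are hypothetical, and the model assumes HBINT is greater than zero and that a heartbeat is only due once the channel has been idle for HBINT seconds.

```python
from typing import Optional


def seconds_until_sender_notices(hbint: int, idle_for: int,
                                 next_message_in: Optional[int] = None) -> int:
    """Estimate how long a sender stays RUNNING after the link fails.

    hbint           -- negotiated heartbeat interval in seconds (assumed > 0)
    idle_for        -- seconds the channel has already been idle
    next_message_in -- seconds until a real message arrives, if any
    """
    # The next heartbeat is due once the channel has been idle for hbint.
    next_heartbeat_in = max(hbint - idle_for, 0)
    if next_message_in is None:
        return next_heartbeat_in
    # A real message also forces a send, so the earlier event wins.
    return min(next_heartbeat_in, next_message_in)


print(seconds_until_sender_notices(hbint=300, idle_for=100))
print(seconds_until_sender_notices(hbint=300, idle_for=100, next_message_in=30))
```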

Contrast this to a RCVR, which will happily sit forever waiting. It is not constantly "pinging" the socket to see if it is really live. But if it knows that a HB should arrive in x seconds, and it doesn't, then it knows there is a problem, and can go INACTIVE.
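
The receiver side of that contrast can be sketched with plain Python sockets. This is only an analogy for a blocking recv call with a deadline, not how the MCA is actually written; the helper name and the 0.2 second timeout are made up for the example.

```python
import socket


def wait_for_data(sock: socket.socket, timeout_seconds: float) -> str:
    """Block on recv and report 'data', 'closed' or 'timeout'."""
    sock.settimeout(timeout_seconds)
    try:
        chunk = sock.recv(1024)
    except socket.timeout:
        # Nothing arrived in time: the receiver can assume an outage.
        return "timeout"
    # An empty read means the partner closed the connection cleanly.
    return "data" if chunk else "closed"


sender_end, receiver_end = socket.socketpair()

sender_end.sendall(b"HB")                  # a "heartbeat" arrives in time
first = wait_for_data(receiver_end, 0.2)

second = wait_for_data(receiver_end, 0.2)  # silence until the deadline

sender_end.close()                         # partner goes away cleanly
third = wait_for_data(receiver_end, 0.2)
receiver_end.close()

print(first, second, third)
```

Without the settimeout call, the second recv would block indefinitely, which is the "happily sit forever waiting" behaviour described above.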


What I think we are really wondering about is this: the channel is RUNNING, both QMs are up, the SNDR sends a HB, the channel is still up, the network is still up, but for some reason the HB ACK never comes back, if there really is such a thing. What then? The only documentation I have found that describes this scenario EXACTLY is M16, and it says the channel goes INACTIVE. But will it ever happen? Probably not. Odds are that if the HB ACK can't make it back, it's because the channel or the network really is down, and then the SNDR will go RETRYING the next time it tries to send something, as described above.




Quote:

But one more thing to note is that the channel is not notified of a network failure if it is not using heartbeats. So the channel will stay running even if there is a network failure, as no heartbeats are sent.

Unless a real message is attempted. Or KeepAlive expires!

I think we pretty much agree on everything. But the fact is we don't have access to the MCA code, so we will never know what really happens when an HB ACK goes missing, but EVERYTHING else is running and OK.