How do cluster channels detect a network error?
JYama
Posted: Sun Nov 04, 2007 6:31 pm    Post subject: How do cluster channels detect a network error?

Master

Joined: 27 Mar 2002
Posts: 281

I'm using WMQv6.021 on AIX in an MQ cluster environment.

My question is how an MQ cluster detects a network error and switches the route when one of the queue managers participating in the cluster stops unexpectedly.
For example, how long does a cluster channel need to detect a network error? Is it governed by the heartbeat or something else?

In my environment, with heartbeat=10 [secs], it took about 30 seconds for the failed/stopped QMgr to be removed from the cluster.
I want to shorten this time if possible.
Any ideas?

Here is my environment:

Three QMgrs are clustered: QMgr1 and QMgr2 are full repositories and QMgr0 is a partial repository.
MQ applications MQPUT/MQGET to/from AQ, which is an alias queue targeting the remote queue Q1.
Therefore messages are round-robined between Q1 on QMgr1 and Q1 on QMgr2.

APL (MQPUT/GET to AQ) --> QMgr0 AQ (alias queue, tgtQ=Q1)
+
+--- QMgr1(Q1)
+
+--- QMgr2(Q1)
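
The round-robin distribution described above can be pictured with a toy model. This is only a sketch in Python, not MQ code: the `route` function and the `unavailable` set are hypothetical names made up for illustration, and the real cluster workload algorithm considers more than a simple rotation.

```python
from itertools import cycle


def route(messages, destinations, unavailable=frozenset()):
    """Assign each message to the next usable destination in turn.

    Toy model only: 'destinations' and 'unavailable' are illustrative
    stand-ins for what the cluster knows about its queue managers.
    """
    candidates = [d for d in destinations if d not in unavailable]
    if not candidates:
        raise RuntimeError("no available destination")
    rotation = cycle(candidates)
    return [(message, next(rotation)) for message in messages]


# Both queue managers are believed to be healthy: messages alternate.
healthy = route(["m1", "m2", "m3", "m4"], ["QMgr1", "QMgr2"])

# Once QMgr2 is known to be down, everything goes to QMgr1.
degraded = route(["m5", "m6"], ["QMgr1", "QMgr2"], unavailable={"QMgr2"})

print(healthy)
print(degraded)
```

The point of the sketch is that the rotation only skips QMgr2 once it has been marked unavailable. Until then, a stopped QMgr2 still receives its share of the messages.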

One interesting thing is that QMgr0 seemed to keep routing and sending messages to Q1 on QMgr2 even though QMgr2 was not working (node down).
As a result, I lost 4 messages during 30 secs of processing time.
I thought the MQ cluster would immediately detect that QMgr2 was down and remove it from the destinations. Is that wrong?
(I'm using non-persistent msgs, BTW.)

I have no idea how to cope with this.
Michael Dag
Posted: Sun Nov 04, 2007 11:23 pm

Jedi Knight

Joined: 13 Jun 2002
Posts: 2607
Location: The Netherlands (Amsterdam)

that's why clustering is not a high availability solution!

when the channel goes into retrying mode, messages destined for QMgr2 are still put to the SCTQ (SYSTEM.CLUSTER.TRANSMIT.QUEUE), so they are 'stuck' on the SCTQ.

when you stop the channel to QMgr2 or suspend QMgr2 and the route becomes unavailable, there is a 'last minute' mechanism that checks whether stuck messages can be delivered elsewhere...
_________________
Michael



JYama
Posted: Sun Nov 04, 2007 11:42 pm

Master

Joined: 27 Mar 2002
Posts: 281

Thank you very much for the useful information, Michael.

Stuck messages are OK.
My problem is that multiple messages were lost and there are no stuck messages.

Are there 'external' parameters that affect the behavior of MQ clustering?
Michael Dag
Posted: Sun Nov 04, 2007 11:50 pm

Jedi Knight

Joined: 13 Jun 2002
Posts: 2607
Location: The Netherlands (Amsterdam)

JYama wrote:
Thank you very much for the useful information, Michael.

Stuck messages are OK.
My problem is that multiple messages were lost and there are no stuck messages.

Are there 'external' parameters that affect the behavior of MQ clustering?


there are no lost messages... most likely they are 'in' the retrying channel...
_________________
Michael



JYama
Posted: Sun Nov 04, 2007 11:56 pm

Master

Joined: 27 Mar 2002
Posts: 281

Michael Dag wrote:
there are no lost messages... most likely they are 'in' the retrying channel...

Thanks for your update.
Before I contact IBM support, I'd like to clarify the 'route' of clustered messages.
Is it correct that inbound messages targeted at QMgr2 always pass through QMgr1, like QMgr0 -> QMgr1 -> QMgr2?
Additionally, if QMgr2 is not working, would the messages be 'stuck' on the SCTQ on QMgr1?
JYama
Posted: Mon Nov 05, 2007 2:22 am

Master

Joined: 27 Mar 2002
Posts: 281

BTW, when does the status of a 'clussdr' channel change from RUNNING to RETRYING if one of the target QMgrs suddenly stops?
Who decides whether a target QMgr is running or not?

In my case, multiple messages were lost during the 'routing' process.
One interesting thing was that the 'clussdr' channel status was 'RUNNING' even though its target QMgr was NOT running. When does the status change to RETRYING?

I have a lot of questions about MQ clustering now...
Vitor
Posted: Mon Nov 05, 2007 2:45 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

JYama wrote:

Is it correct that inbound messages targeted at QMgr2 always pass through QMgr1, like QMgr0 -> QMgr1 -> QMgr2?
Additionally, if QMgr2 is not working, would the messages be 'stuck' on the SCTQ on QMgr1?


No, the cluster will auto-define channels between the source and target queue managers.
_________________
Honesty is the best policy.
Insanity is the best defence.
Michael Dag
Posted: Mon Nov 05, 2007 2:47 am

Jedi Knight

Joined: 13 Jun 2002
Posts: 2607
Location: The Netherlands (Amsterdam)

JYama wrote:
BTW, when does the status of a 'clussdr' channel change from RUNNING to RETRYING if one of the target QMgrs suddenly stops?
Who decides whether a target QMgr is running or not?

JYama wrote:
In my case, multiple messages were lost during the 'routing' process.


JYama wrote:

One interesting thing was that the 'clussdr' channel status was 'RUNNING' even though its target QMgr was NOT running. When does the status change to RETRYING?

I have a lot of questions about MQ Clustering now...

_________________
Michael



Vitor
Posted: Mon Nov 05, 2007 2:51 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

JYama wrote:
BTW, when does the status of a 'clussdr' channel change from RUNNING to RETRYING if one of the target QMgrs suddenly stops?
Who decides whether a target QMgr is running or not?


At the same time as for a non-clustered queue manager - when the sender MCA fails to receive a response!

JYama wrote:

In my case, multiple messages were lost during the 'routing' process.


It's the fate of non-persistent messages to be lost when things go a bit funny.

JYama wrote:

One interesting thing was that the 'clussdr' channel status was 'RUNNING' even though its target QMgr was NOT running. When does the status change to RETRYING?


When the various intervals expire.

JYama wrote:

I have a lot of questions about MQ Clustering now...


Then the Clustering manual will be your friend!
_________________
Honesty is the best policy.
Insanity is the best defence.
Michael Dag
Posted: Mon Nov 05, 2007 2:51 am

Jedi Knight

Joined: 13 Jun 2002
Posts: 2607
Location: The Netherlands (Amsterdam)

Pressed submit too soon...

This is not a clustering course; take one or get a manual, where it is explained in great detail...


JYama wrote:
BTW, when does the status of a 'clussdr' channel change from RUNNING to RETRYING if one of the target QMgrs suddenly stops?
Who decides whether a target QMgr is running or not?

you were probably looking at the clussdr channel to the repository?
JYama wrote:
In my case, multiple messages were lost during the 'routing' process.

stop saying messages were lost... MQ does not lose messages unless you did something to make it lose messages, like resetting a channel...


JYama wrote:

One interesting thing was that the 'clussdr' channel status was 'RUNNING' even though its target QMgr was NOT running. When does the status change to RETRYING?

I have a lot of questions about MQ Clustering now...


cluster channels are no different than other channels; like I said, you were probably looking at the defined cluster channels and not the auto-defined ones, like Vitor mentioned.
_________________
Michael



JYama
Posted: Mon Nov 05, 2007 3:06 am

Master

Joined: 27 Mar 2002
Posts: 281

Quote:
you were probably looking at the clussdr channel to the repository?

Yes, that's right.
Doesn't it indicate the status of the target?

Quote:
stop saying messages were lost... MQ does not lose messages unless you did something to make it lose messages, like resetting a channel...

What I did was try to shut down one of the target QMgrs.
So I guess at least one message should be lost, which is OK, but in my case 4 msgs were gone... this is my problem.
Vitor
Posted: Mon Nov 05, 2007 3:17 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

JYama wrote:
So I guess at least one message should be lost, which is OK, but in my case 4 msgs were gone... this is my problem.


How did you come to this number of 4? As you say at least one message should be lost so why do you think 4 is a problem?

And if lost messages are a problem, use persistent messages!
_________________
Honesty is the best policy.
Insanity is the best defence.
fjb_saper
Posted: Mon Nov 05, 2007 4:36 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

Vitor wrote:
And if lost messages are a problem, use persistent messages!

_________________
MQ & Broker admin
PeterPotkay
Posted: Mon Nov 05, 2007 7:24 am

Poobah

Joined: 15 May 2001
Posts: 7722

The 4 messages that went to QM2 - were they told to go to QM2? Were they specifically addressed to QM2, or is it possible that the putting app used the BIND_ON_OPEN option?

What is the NPMSPEED of the CLUSRCVR channel on QM2? If it's set to FAST and you are sending non-persistent messages, it's quite possible that several messages were sent down a channel that was no longer 100% good, and those messages were discarded.

It's gonna take some time for a SNDR to realize the RCVR is having issues. But if your channel speed is NORMAL and/or you are using persistent messages, I don't think you should lose any messages, assuming the putting app doesn't say that the messages should go to the down QM.

Try your tests again with persistent messages and let us know the results.

The heartbeat will help identify a downed channel, but it only comes into play if there are no messages flowing. Also, realize that HB values set below 60 seconds act a little differently than you would think (or want!). Lookie here:
http://www.mqseries.net/phpBB2/viewtopic.php?t=15619&highlight=heartbeats+seconds
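
The rule of thumb above (non-persistent messages on a FAST channel are the ones at risk) can be written down as a tiny check. This is a sketch in Python of the reasoning in this post only, not an MQ API: the function name `at_risk_of_discard` and its parameters are hypothetical, and it does not model every way a message can be lost.

```python
def at_risk_of_discard(persistent: bool, npmspeed: str) -> bool:
    """Return True if, by the rule of thumb in this post, a message may be
    discarded when it is sent down a channel that is no longer healthy.

    Illustrative only: 'npmspeed' mirrors the channel's NPMSPEED setting
    (FAST or NORMAL); nothing here talks to a queue manager.
    """
    speed = npmspeed.upper()
    if speed not in ("FAST", "NORMAL"):
        raise ValueError("npmspeed must be FAST or NORMAL")
    # Only the combination non-persistent + FAST is flagged as at risk.
    return (not persistent) and speed == "FAST"


for persistent in (False, True):
    for speed in ("FAST", "NORMAL"):
        print(persistent, speed, at_risk_of_discard(persistent, speed))
```

Under these assumptions only one of the four combinations is flagged, which is why the suggested test is to rerun with persistent messages.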
_________________
Peter Potkay
Keep Calm and MQ On
JYama
Posted: Mon Nov 05, 2007 7:42 am

Master

Joined: 27 Mar 2002
Posts: 281

Vitor wrote:
How did you come to this number of 4? As you say at least one message should be lost so why do you think 4 is a problem?
And if lost messages are a problem, use persistent messages!

You're right. Using persistent msgs is the only solution for this.

'4 msgs lost' means that I couldn't find them in the SCTQ, DLQ, etc.
Actually, I have an application that kept sending one msg every 5 secs to the front QMgr0, which contains the alias queue.
What happened was that it took approximately 20 to 30 secs for the 'failed route' to be removed from routing.
Msgs #1, #2, #3 and #4 (20 secs in total) were gone, and #5 was successfully routed because, I guess, the MQ cluster had detected that the target QMgr was not running...

What I'd like to know is why the clussdr channel indicated 'RUNNING' even though the target QMgr was NOT running, why the status didn't change to 'RETRYING' after sending ONE message to the stopped target QMgr, why it took 20 to 30 secs for the status of the clussdr to change to 'RETRYING', and what this 'long' interval is.
How can I shorten this interval?

Again, I agree with you that I should use persistent msgs to avoid losing msgs, but I'd like to make clear what's going on in my environment.
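
The arithmetic behind the figure of 4 can be sketched as follows. This is a rough Python model, assuming the first message is sent at the moment the target stops and that every message sent before the failure is detected goes down the dead route; the function name and parameters are hypothetical.

```python
import math


def messages_sent_before_detection(send_interval_secs, detection_delay_secs):
    """Count messages sent at t = 0, interval, 2*interval, ... that fall
    strictly before the moment the failure is detected.

    Simplified model: assumes a send at t = 0 (the instant of failure)
    and a fixed send interval.
    """
    if send_interval_secs <= 0:
        raise ValueError("send interval must be positive")
    if detection_delay_secs < 0:
        raise ValueError("detection delay must not be negative")
    return math.ceil(detection_delay_secs / send_interval_secs)


# One message every 5 secs, failure noticed after about 20 secs.
print(messages_sent_before_detection(5, 20))  # 4
```

With a 5 sec interval and a 20 sec delay the model gives sends at t = 0, 5, 10 and 15, which matches msgs #1 to #4 being gone and #5 being routed. Shortening the detection delay, or sending less often, reduces the count in the same proportion.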