How do cluster channels detect a network error?
PeterPotkay
Posted: Mon Nov 05, 2007 8:13 am

Poobah

Joined: 15 May 2001
Posts: 7722

JYama wrote:

What I'd like to know is why the clussdr channel showed 'RUNNING' even though the target QMgr was NOT running, why the status didn't change to 'RETRYING' after ONE message was sent to the stopped QMgr, why it took 20 to 30 secs for the clussdr status to change to 'RETRYING', and what determines this 'long' interval.
How can I shorten this interval?


I agree with you. If you send 1 message down a channel that is not valid, you would think it should go into retrying and thus not accept more messages. Maybe not immediately, but certainly within 5 or 10 seconds it should know enough.

Read this doc; it will help. There is a Japanese version too:
http://www-1.ibm.com/support/docview.wss?rs=203&uid=swg24006699&loc=en_US&cs=utf-8&lang=en

But I don't see that it tells us exactly how fast a channel will go into retrying when it realizes there is a problem.
_________________
Peter Potkay
Keep Calm and MQ On
fjb_saper
Posted: Mon Nov 05, 2007 2:41 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20757
Location: LI,NY

Peter, I think that goes back to the retry interval and retry count of the cluster channel. If the retry count is 10 and the retry interval is 1 second, you potentially have 10 seconds until the channel notifies the qmgr that it is in retry mode...

A lot of messages can get to the cluster xmitq in 10 seconds.

Note: do not confuse retry count and retry interval with short retry and long retry. Short and long retry only apply after the retry count has been hit.
_________________
MQ & Broker admin
PeterPotkay
Posted: Mon Nov 05, 2007 3:02 pm

Poobah

Joined: 15 May 2001
Posts: 7722

FJ, those 2 parms you mention (retry interval and retry count) apply only to the RCVR side of a channel and come into play only when the RCVR-type MCA cannot put to the destination q. It waits retry interval ms before trying to put the message again, and makes up to retry count attempts. It then puts the message to the DLQ, discards it, or stops the channel, depending on the scenario.

Those 2 parms don't factor into JYama's problem here, which is how fast the SNDR side will realize there is a problem.
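
To make the receiver-side retry concrete, here is a rough Python sketch of the loop described above. It is a simplified model, not MQ code: the function name, the `put_to_queue` callback, and the exact rule for choosing between DLQ, discard, and channel stop are assumptions made for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReceiverRetryResult:
    attempts: int      # total put attempts, including the first one
    disposition: str   # "queue", "dlq", "discarded", or "channel_stopped"


def receive_with_retry(
    put_to_queue: Callable[[], bool],   # hypothetical: True if the put worked
    retry_count: int,
    retry_interval_ms: int,
    persistent: bool,
    npmspeed_fast: bool,
    dlq_available: bool,
    sleep: Callable[[float], None] = time.sleep,
) -> ReceiverRetryResult:
    """Model a RCVR-type MCA that cannot put to the destination queue."""
    attempts = 1
    if put_to_queue():
        return ReceiverRetryResult(attempts, "queue")

    # Wait the retry interval, then try again, up to retry_count times.
    for _ in range(retry_count):
        sleep(retry_interval_ms / 1000.0)
        attempts += 1
        if put_to_queue():
            return ReceiverRetryResult(attempts, "queue")

    # Assumed ordering of the fallbacks: DLQ first, then discard for
    # fast non-persistent messages, otherwise stop the channel.
    if dlq_available:
        return ReceiverRetryResult(attempts, "dlq")
    if npmspeed_fast and not persistent:
        return ReceiverRetryResult(attempts, "discarded")
    return ReceiverRetryResult(attempts, "channel_stopped")
```

Note that nothing in this loop runs on the sending side, which is why these two parms cannot explain how long a sender takes to notice a dead partner.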
_________________
Peter Potkay
Keep Calm and MQ On
PeterPotkay
Posted: Mon Nov 05, 2007 3:28 pm

Poobah

Joined: 15 May 2001
Posts: 7722

JYama,
You never answered the question about the NPMSPEED attribute of the CLUSRCVR channels on QM2. If it's set to FAST and the messages are non-persistent, then you are seeing expected behaviour.

If the messages are persistent -or- the message speed of the channel is set to Normal, then the 1st message down the channel that is no longer able to talk to QM2 should throw the channel into retry*. All future messages should get routed to QM1. Any uncommitted messages in the channel's batch, including the one that made the channel retry, should get rolled back and would be eligible to go to another QM in the cluster, unless those messages are specifically addressed to QM2.

* A Heartbeat attempt will also do this.
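
As a rough illustration of the behaviour I'd expect, here is a small Python model of what happens to a batch when the send fails. It is only a sketch: `Message`, `BatchOutcome`, `fail_batch`, and the `bound_to_qm2` flag are made-up names, and the real MCA logic is more involved.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Message:
    msg_id: str
    persistent: bool
    bound_to_qm2: bool = False  # hypothetical: addressed to QM2 specifically


@dataclass
class BatchOutcome:
    channel_status: str
    reroutable: List[str] = field(default_factory=list)
    waiting_for_qm2: List[str] = field(default_factory=list)
    not_protected: List[str] = field(default_factory=list)


def fail_batch(batch: List[Message], npmspeed_fast: bool) -> BatchOutcome:
    """Classify the messages of a batch whose send to QM2 just failed."""
    outcome = BatchOutcome(channel_status="RETRYING")
    for msg in batch:
        if npmspeed_fast and not msg.persistent:
            # Fast non-persistent messages travel outside the batch,
            # so there is nothing to roll back for them.
            outcome.not_protected.append(msg.msg_id)
        elif msg.bound_to_qm2:
            # Rolled back, but must wait for QM2 to return.
            outcome.waiting_for_qm2.append(msg.msg_id)
        else:
            # Rolled back and eligible for another QM in the cluster.
            outcome.reroutable.append(msg.msg_id)
    return outcome
```

With NPMSPEED set to Normal, every message lands in `reroutable` or `waiting_for_qm2`; only the FAST plus non-persistent combination leaves messages unprotected.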
_________________
Peter Potkay
Keep Calm and MQ On
JYama
Posted: Mon Nov 05, 2007 4:15 pm

Master

Joined: 27 Mar 2002
Posts: 281

PeterPotkay wrote:
JYama,
You never answered the question about the NPMSPEED attribute of the CLUSRCVR channels on QM2. If it's set to FAST and the messages are non-persistent, then you are seeing expected behaviour.

If the messages are persistent -or- the message speed of the channel is set to Normal, then the 1st message down the channel that is no longer able to talk to QM2 should throw the channel into retry*. All future messages should get routed to QM1. Any uncommitted messages in the channel's batch, including the one that made the channel retry, should get rolled back and would be eligible to go to another QM in the cluster, unless those messages are specifically addressed to QM2.

* A Heartbeat attempt will also do this.

What you're saying is exactly what I expected!
But that is NOT what happens, and this is my problem.

BTW, regarding NPMSPEED, since I want msgs to be 're-routed', I changed the value to NORMAL. Also msgs are non-per.

I think what I've been discussing can be summarized into two questions:
1. Why were msgs routed to an 'invalid' route (or channel) NOT re-routed or rolled back?
2. Why did it take such a long time to recognize the invalid route, even though the first msg had already been exchanged?

Any ideas?


PeterPotkay wrote:
Read this doc it will help. There is a Japanese version too:
http://www-1.ibm.com/support/docview.wss?rs=203&uid=swg24006699&loc=en_US&cs=utf-8&lang=en
Thank you for your kindness.
PeterPotkay
Posted: Mon Nov 05, 2007 4:24 pm

Poobah

Joined: 15 May 2001
Posts: 7722

JYama wrote:

BTW, regarding NPMSPEED, since I want msgs to be 're-routed', I changed the value to NORMAL. Also msgs are non-per.

Did you make this change before or after you tested?
_________________
Peter Potkay
Keep Calm and MQ On
JYama
Posted: Mon Nov 05, 2007 4:29 pm

Master

Joined: 27 Mar 2002
Posts: 281

PeterPotkay wrote:
JYama wrote:

BTW, regarding NPMSPEED, since I want msgs to be 're-routed', I changed the value to NORMAL. Also msgs are non-per.

Did you make this change before or after you tested?

NPMSPEED=NORMAL is my initial setting, so this value was fixed BEFORE the test.
'Changed' means that I changed it on purpose from the default value 'FAST' to 'NORMAL'.
Sorry for the confusion.
fjb_saper
Posted: Mon Nov 05, 2007 8:48 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20757
Location: LI,NY

PeterPotkay wrote:
FJ, those 2 parms you mention (retry interval and retry count) apply only to the RCVR side of a channel and come into play only when the RCVR-type MCA cannot put to the destination q. It waits retry interval ms before trying to put the message again, and makes up to retry count attempts. It then puts the message to the DLQ, discards it, or stops the channel, depending on the scenario.

Those 2 parms don't factor into JYama's problem here, which is how fast the SNDR side will realize there is a problem.

Thanks for setting me straight. I missed the fact that this is only valid for the cluster receiver, receiver and requester channel types.
_________________
MQ & Broker admin
PeterPotkay
Posted: Tue Nov 06, 2007 8:42 am

Poobah

Joined: 15 May 2001
Posts: 7722

JYama, based on all the info so far, I think you have a case for opening a ticket with IBM support. You shouldn't lose 4 messages. The channel should go into retrying mode as soon as the network layer reports back to the SNDR MCA that the connection is no longer valid. And if there are any uncommitted messages in that channel batch, they should get rolled back and be eligible to be put to another QM in the cluster.
_________________
Peter Potkay
Keep Calm and MQ On
Nigelg
Posted: Tue Nov 06, 2007 12:54 pm

Grand Master

Joined: 02 Aug 2004
Posts: 1046

If the msgs are small it is possible to lose more than 1 msg, since the buffer will not actually be sent down the wire until it is full.
I disagree that a PMR is needed; msgs will not be 'lost' - more accurately, discarded by the system since the user did not specify that they were important enough to keep - if the channel attributes, particularly NPMSPEED, are properly set. Note that for cluster channels the attributes have to be set on the CLUSRCVR, and that changed attributes do not take effect until after a running channel is restarted.
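
One general TCP point may help here, whichever buffer is meant: a send that returns successfully only shows that the local stack accepted the bytes, not that the partner received them. The Python sketch below shows this with plain sockets on the loopback interface. It is not MQ code, the function name is made up, and it makes no claim about which buffer the MCA uses internally.

```python
import socket


def send_to_silent_peer(payload: bytes) -> int:
    """Send to a peer that never reads; return the bytes the local stack took."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 lets the OS pick a free port on the loopback interface.
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)

        client.settimeout(5.0)
        client.connect(listener.getsockname())

        # The listener never calls accept() or recv(), yet this send still
        # succeeds because the data only has to reach a local buffer.
        return client.send(payload)
    finally:
        client.close()
        listener.close()


if __name__ == "__main__":
    accepted = send_to_silent_peer(b"x" * 2048)
    print(f"local stack accepted {accepted} bytes with no reader on the other side")
```

When the remote node is halted abruptly there is nobody left to refuse the connection, so the sender learns about the failure only through a timeout, a heartbeat, or a later error on the socket.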
_________________
MQSeries.net helps those who help themselves..
PeterPotkay
Posted: Tue Nov 06, 2007 1:59 pm

Poobah

Joined: 15 May 2001
Posts: 7722

Nigelg wrote:
If the msgs are small it is possible to lose more than 1 msg, since the buffer will not actually be sent down the wire until it is full.

What buffer?

Nigelg wrote:
I disagree that a PMR is needed; msgs will not be 'lost' - more accurately, discarded by the system since the user did not specify that they were important enough to keep - if the channel attributes, particularly NPMSPEED, are properly set.

If the channel speed is Normal, and the messages didn't expire or get committed to the destination QM and there are 4 messages "missing", isn't that a problem?

Is waiting 30 seconds before the channel starts retrying unexpected?
_________________
Peter Potkay
Keep Calm and MQ On
JYama
Posted: Tue Nov 06, 2007 4:46 pm

Master

Joined: 27 Mar 2002
Posts: 281

Nigelg wrote:
If the msgs are small it is possible to lose more than 1 msg, since the buffer will not actually be sent down the wire until it is full.

What are you talking about? msg length? batch size or something??
Could you elaborate on that, please?
My test msgs were about 2KB each, BTW.

Nigelg wrote:
-msgs will not be 'lost' -

Right, it is exactly the behavior that I expected.
The problem is I found multiple msgs lost in my environment.

Nigelg wrote:
if the channel attributes, particularly NPMSPEED, are properly set. Note that for cluster channels the attributes have to be set on the CLUSRCVR, and that changed attributes do not take effect until after a running channel is restarted.

Regarding NPMSPEED attribute and channel restart, NPMSPEED=NORMAL is an initial attribute of my cluster channels, so I've never changed it since I configured my MQ Cluster environment.
What is the point you want to emphasize?
bruce2359
Posted: Tue Nov 06, 2007 7:00 pm

Guest




Quote:
The problem is I found multiple msgs lost in my environment


For NPMSPEED(FAST) and non-persistent messages that can't be delivered to the destination queue or the dlq (possible reasons: queue full, msg too big for queue, queue put-inhibited), you have directed the message channel agent to erase, eradicate, purge, destroy, delete, vaporize the message(s).

MQ does NOT lose messages.
MQ does NOT lose messages.
MQ does NOT lose messages.
Repeat as necessary.
JYama
Posted: Tue Nov 06, 2007 10:06 pm

Master

Joined: 27 Mar 2002
Posts: 281

One possible situation in which a msg would be lost in my case is that the message had already arrived in QMgr2 just before QMgr2's node was shut down. (In my test, I executed 'halt -q'.) In this case, I believe this non-per msg would be lost even if NPMSPEED=NORMAL.
What I can't understand is that the MQ cluster seemed to keep routing incoming msgs to QMgr2, which should not have been chosen as a valid route because the node containing QMgr2 was not available.
Quote:
APL (MQPUT/GET to AQ) --> QMgr0 AQ (alt queue, tgtQ=Q1)
                            +
                            +--- QMgr1 (Q1)
                            +
                            +--- QMgr2 (Q1)
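
To show what I mean, here is a toy Python model of the routing. It is only a sketch under strong assumptions: plain round-robin between QM1 and QM2, a made-up `detect_after` parameter for the number of msgs routed before the channel to QM2 is marked RETRYING, and none of the real cluster workload algorithm.

```python
from typing import Dict, List


def route_round_robin(num_messages: int, detect_after: int) -> List[str]:
    """Return the destination chosen for each message, in order."""
    status: Dict[str, str] = {"QM1": "RUNNING", "QM2": "RUNNING"}
    order = ["QM1", "QM2"]
    routed: List[str] = []
    next_index = 0

    for n in range(num_messages):
        # The failure of QM2 is only noticed after `detect_after` messages.
        if n >= detect_after:
            status["QM2"] = "RETRYING"

        # Take the next destination in turn, skipping RETRYING channels.
        for offset in range(len(order)):
            candidate = order[(next_index + offset) % len(order)]
            if status[candidate] != "RETRYING":
                routed.append(candidate)
                next_index = (next_index + offset + 1) % len(order)
                break

    return routed


if __name__ == "__main__":
    # QM2 is already down, but its channel still shows RUNNING for 4 msgs.
    print(route_round_robin(num_messages=6, detect_after=4))
```

In this model every msg routed to QM2 before the status changes goes to a node that is already gone, which matches what I see during the 20 to 30 secs.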


Last edited by JYama on Tue Nov 06, 2007 10:29 pm; edited 2 times in total
Nigelg
Posted: Tue Nov 06, 2007 10:08 pm

Grand Master

Joined: 02 Aug 2004
Posts: 1046

Quote:
What is the point you want to emphasize?


WMQ does not lose msgs.

This is incompatible with your statements in this post. Easily the most likely resolution of this syllogism is that your statements are mistaken, and that the conditions in which the channel is running are not as you state.
_________________
MQSeries.net helps those who help themselves..