MQSeries.net :: View topic - Auto CLUSSDR channels not starting up!


Auto CLUSSDR channels not starting up!


mqdev

Posted: Fri Feb 18, 2005 9:05 am Post subject: Auto CLUSSDR channels not starting up!

Centurion

Joined: 21 Jan 2003
Posts: 136

We are having this problem intermittently in our cluster:

Our env is: AIX 5.2, MQ 5.3 CSD05
The Prod cluster contains 1200+ qmgrs

One of the apps has two partial repository QMs (let's say A and B) that talk to each other in req-rsp mode: A sends a req to B, which processes it and sends a rsp back to A. What we are seeing is that sometimes (only sometimes) the req makes it to B and B replies, but the message just sits on the cluster xmit queue on B. I verified using amqrfdm that it is waiting on the CLUSSDR channel from B to A. Once this chl is manually started, the msg makes it to A and everything works.

I found that TRIGINT on the cluster xmit Q on B is the default (a series of 9's, about 11 days when actually calculated), so I changed it to 1000 ms so that a trigger msg to start the CLUSSDR chl is generated every 1 sec after the FIRST trigger msg. None of this has worked: the CLUSSDR chl to A still had to be started manually, and only then did the message flow to A!
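As a quick check of the "about 11 days" figure, here is the arithmetic, assuming the "series of 9's" is 999999999 and that TRIGINT is expressed in milliseconds (the variable names are only illustrative):

```python
# Assumption: the "series of 9's" default is 999999999 milliseconds.
TRIGINT_DEFAULT_MS = 999_999_999
MS_PER_DAY = 24 * 60 * 60 * 1000

# Convert the default trigger interval from milliseconds to days.
trigint_days = TRIGINT_DEFAULT_MS / MS_PER_DAY
print(f"default TRIGINT = {trigint_days:.2f} days")
```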

My questions are:

1. Do I need to restart the QM if I change TRIGINT parameter for the new value of TRIGINT to take effect?

2. How exactly does the TRIGINT parm work? Since the TRIGTYPE on the cluster xmit queue is FIRST, the CLUSSDR chl is started immediately after the first message lands on the cluster xmit queue. Let's assume that there are a ton of messages on the cluster xmit queue. Once the CLUSSDR chl has started after the FIRST message and TRIGINT has elapsed, would another CLUSSDR chl start up? Or is the triggering smart enough to know that a CLUSSDR chl has already been started, and hence wouldn't put a trigger msg on the initQ at the end of the TRIGINT interval (if there are still msgs on the cluster xmit q)?

3. I have noticed that the message disappears from the cluster xmit q on B after a few seconds, BUT the message isn't making it to A (I verified this by GET-disabling the response queue on A: B's rsp message does not arrive there). The DLQs on both A and B are empty. It looks to me like QM B is silently discarding the message on the cluster xmit q after some time... this looks very odd to me. Can someone throw some light on what's going on here?

In any case, if I manually start the CLUSSDR chl to A from B, it works from then on!

Nigelg

Posted: Mon Feb 21, 2005 12:43 pm Post subject:

Grand Master

Joined: 02 Aug 2004
Posts: 1046

1 & 2. Forget about the TRIGINT (and TRIGTYPE, TRIGDATA, TRIGDPTH, TRIGGER) attributes when looking at cluster channels starting automatically. The qmgr uses a completely different mechanism to trigger start cluster channels, and it does not need any user configuration at all.

There have been several bugs in the area of auto-starting cluster channels.
One of them was fixed later than CSD05: the cluster qmgr status is incorrect in the cluster cache, so the trigger start mechanism wrongly thinks that the channel is already running (or starting, or something).

3. This problem, of losing the first msg, is probably the qmgr discarding a non-persistent msg where NPMSPEED is set to FAST, while B thinks the channel is RUNNING but A has already ended it. This is correct operation. WMQ is much more careless with non-persistent msgs than most people think!
Try changing the NPMSPEED on the CLUSRCVR to NORMAL, and see if the same problem happens again.

mqdev

Posted: Tue Feb 22, 2005 9:41 am Post subject:

Centurion

Joined: 21 Jan 2003
Posts: 136

Thanks Nigel for your reply!
So do you suggest we upgrade to a higher CSD to fix this prob? Is there a patch and/or work around that we could apply to get around this? Upgrading to a higher CSD level is a major project considering the size of our cluster - I would rather go with a work around (which I can apply locally as I see the problem) if one is available.

thanks
-mqdev

Nigelg

Posted: Wed Feb 23, 2005 2:38 am Post subject:

Grand Master

Joined: 02 Aug 2004
Posts: 1046

It really depends on the reason that the channel is not starting, or rather the variation. There is only one reason why a cluster channel does not start: the qmgr thinks that the channel is either already running or shortly will be (i.e. the channel state is RUNNING, RETRYING, STARTING, BINDING or INITIALIZING), or that trigger starting does not apply (channel state STOPPED).
There have been several APARs which have fixed various bits of this.
IC32753, 5.3 GA, added a refresh of the cluster cache so that the current state of the channel is checked.
IC34948, 5.3 CSD05, amended IC32753 to check the right flag in the cluster cache.
IY50188, 5.3 CSD06, did the same as IC43948 for the case when msgs were put in syncpoint.
IY66826, 5.3 CSD10, removes a dirty update (outside lock) of the cluster channel state when the channel is started, and the state is updated under lock in the channel process itself. This may result in several start channel msgs being passed to the channel initiator, but this is harmless because such msgs are ignored if the channel is already running. Note that it is not yet proved that this fix works.
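The rule stated at the top of this post can be written down as a small predicate. This is only a sketch of the description given here, not the qmgr's actual code; the function and constant names are made up for illustration, and the state passed in is whatever the cluster cache currently holds for the channel.

```python
# States in which, per the description above, the qmgr believes the channel
# is already running or shortly will be, so no start is attempted.
ALREADY_ACTIVE = frozenset(
    {"RUNNING", "RETRYING", "STARTING", "BINDING", "INITIALIZING"}
)
# State in which trigger starting does not apply at all.
NOT_TRIGGERABLE = frozenset({"STOPPED"})


def auto_start_expected(cached_state: str) -> bool:
    """Return True if the qmgr would be expected to auto-start the channel.

    `cached_state` is the channel state as recorded in the cluster cache,
    which may be stale (that is the bug discussed in this thread).
    """
    state = cached_state.strip().upper()
    return state not in ALREADY_ACTIVE and state not in NOT_TRIGGERABLE
```

If the cache wrongly holds one of the "already active" states for a channel that is not actually running, this predicate returns False, which matches the symptom of msgs sitting on the cluster xmitq.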

If you are putting msgs in syncpoint, I recommend you upgrade to CSD06, or get an interim fix for IY50188.

Otherwise, I recommend you write a script which checks the depth of the cluster xmitq. If the depth continually increases, browse the queue for unique CorrelIds of the msgs, and start the channels named in the CorrelIds if they are not RUNNING.
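A minimal sketch of the decision part of such a script follows. It assumes the CorrelIds have already been browsed off the cluster xmitq as 24-byte values, that the channel name in them is ASCII padded with blanks or NULs, and that the channel states have been collected separately. All function names are hypothetical, and the browsing and the submission of the commands to runmqsc are deliberately left out.

```python
from typing import Iterable, List, Mapping


def channel_from_correlid(correl_id: bytes) -> str:
    """Decode a CorrelId into a channel name.

    Assumption: the name is ASCII, padded with blanks or NULs to 24 bytes.
    """
    return correl_id.decode("ascii", errors="replace").rstrip(" \x00")


def channels_to_start(
    correl_ids: Iterable[bytes], status_by_channel: Mapping[str, str]
) -> List[str]:
    """Return the unique channel names from the CorrelIds that are not RUNNING.

    Channels with no known status are treated as not running.
    """
    wanted = sorted({channel_from_correlid(c) for c in correl_ids})
    return [
        ch
        for ch in wanted
        if ch and status_by_channel.get(ch, "INACTIVE") != "RUNNING"
    ]


def start_commands(channels: Iterable[str]) -> List[str]:
    """Build one MQSC START CHANNEL command per channel (name quoted)."""
    return [f"START CHANNEL('{ch}')" for ch in channels]
```

The output of `start_commands` is plain MQSC text, so it could be reviewed by hand before being fed to the qmgr.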

Michael Dag

Posted: Wed Feb 23, 2005 4:08 am Post subject:

Jedi Knight

Joined: 13 Jun 2002
Posts: 2602
Location: The Netherlands (Amsterdam)

Nigelg wrote:

... the depth of the cluster xmitq. If the depth continually increases, browse the queue for unique CorrelIds of the msgs, and start the channels named in the CorrelIds if they are not RUNNING.

is this cluster black magic?
_________________
Michael


Nigelg

Posted: Wed Feb 23, 2005 5:26 am Post subject:

Grand Master

Joined: 02 Aug 2004
Posts: 1046

The CorrelId of the msg in the cluster xmitq is the name of the cluster channel that the msg is to be sent down. This is not secret, but it is not actually published anywhere in the manuals. The MCA running the cluster channel reads from the cluster xmitq by CorrelId.

jefflowrey

Posted: Wed Feb 23, 2005 5:58 am Post subject:

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

Nigelg wrote:

This is not secret, but it is not actually published anywhere in the manuals.

So... it's posted on a public notice.. in the basement... behind a door marked "Danger: Wild Animals"?

If it's not in a publicly available document... then it's secret.
_________________
I am *not* the model of the modern major general.

Nigelg

Posted: Wed Feb 23, 2005 6:52 am Post subject:

Grand Master

Joined: 02 Aug 2004
Posts: 1046

It's in a publicly available document now....

I meant it is not IBM Confidential, IBM's classification of information which must not be disclosed.

jefflowrey

Posted: Wed Feb 23, 2005 7:17 am Post subject:

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

Nigelg wrote:

It's in a publicly available document now....

_________________
I am *not* the model of the modern major general.

mqdev

Posted: Wed Feb 23, 2005 7:47 am Post subject:

Centurion

Joined: 21 Jan 2003
Posts: 136

That bit about CorrelIds was a valuable piece of info!

Thanks for your time!

mqdev

Posted: Wed Feb 23, 2005 7:53 am Post subject:

Centurion

Joined: 21 Jan 2003
Posts: 136

Nigelg wrote:

I do want to point out that I am not seeing this happening - there is no chl from B to A at all [ie dis chs(*) does not list any chl from B to A]
Could my prob be different then?

-mqdev

PeterPotkay

Posted: Wed Feb 23, 2005 7:56 am Post subject:

Poobah

Joined: 15 May 2001
Posts: 7716

This info is published on slide 30 of the Administrating MQ Clusters session (M29) from the IBM Tech Conferences.

Quote:

CorrelID in MQMD added on transmission queue will contain the name of the channel that the message should be sent down.

_________________
Peter Potkay
Keep Calm and MQ On

Nigelg

Posted: Wed Feb 23, 2005 8:24 am Post subject:

Grand Master

Joined: 02 Aug 2004
Posts: 1046

DIS CHS(*) looks in the existing channel table, so it is not the right command to find out what state the qmgr thinks the cluster channel is in; that state is kept in the cluster cache. Use DIS CLUSQMGR(*) ALL instead, and look at the STATUS attribute.
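For anyone scripting this check, here is a rough sketch of pulling the STATUS per channel out of captured DIS CLUSQMGR(*) ALL output. It assumes the usual ATTRIBUTE(value) layout, that each record begins with CLUSQMGR(...), and that CHANNEL appears before STATUS within a record. The function name is made up, and how the output text is captured is left to the reader.

```python
import re
from typing import Dict, Optional

# Only the three attributes of interest; values with nested parentheses
# (for example a connection name with a port) are deliberately not matched.
_ATTR = re.compile(r"\b(CLUSQMGR|CHANNEL|STATUS)\(([^()]*)\)")


def cluster_channel_status(mqsc_output: str) -> Dict[str, str]:
    """Map cluster channel name -> STATUS from DIS CLUSQMGR(*) ALL text."""
    result: Dict[str, str] = {}
    channel: Optional[str] = None
    for attr, value in _ATTR.findall(mqsc_output):
        value = value.strip()
        if attr == "CLUSQMGR":
            channel = None  # a new record starts here
        elif attr == "CHANNEL":
            channel = value
        elif attr == "STATUS" and channel is not None:
            result[channel] = value
    return result
```

A channel that is missing from DIS CHS(*) but shows a STATUS other than INACTIVE here would point at the stale cluster cache state described earlier in the thread.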
