MQSeries.net Forum Index » Clustering » SOLVED: Cluster transmit queue problems for a certain QM

SOLVED: Cluster transmit queue problems for a certain QM
gs
Posted: Mon Sep 24, 2007 6:37 am    Post subject: SOLVED: Cluster transmit queue problems for a certain QM

Master

Joined: 31 May 2007
Posts: 254
Location: Sweden

Hi!

We're having a problem with a certain queue manager in an SSL-enabled cluster.
Messages are received on QM1 and passed on to QM2. The repository QMs are QM3 and QM4.
The messages stay on the SYSTEM.CLUSTER.TRANSMIT.QUEUE on QM1 until I manually start the cluster channel (for QM2) on QM1.
Apart from this the cluster is fully functional. The problematic QM2 was re-installed just recently. There is nothing related to this in any logs.

Any ideas on how to troubleshoot this?

QM1 specs (W2k3):
Name: WebSphere MQ
Version: 530.5 CSD05
CMVC level: p530-05-L030926
BuildType: IKAP - (Production)

QM2 specs (SuSE Linux):
Name: WebSphere MQ
Version: 530.11 CSD11
CMVC level: p530-11-L050802
BuildType: IKAP - (Production)


Last edited by gs on Fri Sep 28, 2007 12:17 am; edited 1 time in total
Vitor
Posted: Mon Sep 24, 2007 6:53 am    Post subject: Re: Cluster transmit queue problems for a certain QM

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

gs wrote:
QM1 specs (W2k3):
Name: WebSphere MQ
Version: 530.5 CSD05
CMVC level: p530-05-L030926
BuildType: IKAP - (Production)


Considered applying some maintenance? You could have hit some kind of compatibility problem with the levels being so far apart.

In any event, that's very old as CSDs go.
_________________
Honesty is the best policy.
Insanity is the best defence.
Mr Butcher
Posted: Mon Sep 24, 2007 7:32 am

Padawan

Joined: 23 May 2005
Posts: 1716

If QM2 was re-installed, it gets a new queue manager ID (QMID), which is what the cluster uses to identify it (even if you created the new queue manager with the old name).

Did you remove the old QMID? Make sure every queue manager in the cluster knows the "new" queue manager, not the old one.
_________________
Regards, Butcher
PeterPotkay
Posted: Mon Sep 24, 2007 5:09 pm    Post subject: Re: Cluster transmit queue problems for a certain QM

Poobah

Joined: 15 May 2001
Posts: 7722

gs wrote:
I manually start the cluster channel (for QM2) on QM1.


Do you have a manually defined CLUSSDR channel on QM1 pointing at QM2? Is that what you are starting? If yes, that's wrong in and of itself. If the Full Repositories are QM3 and QM4, then all QMs in the cluster should have only one manually defined CLUSSDR channel, and it should point to either QM3 or QM4.

I suspect Mr Butcher is correct. Your cluster (or at least QM1) does not appear to know about the new QM2. It may still know about the old QM2. Look up QMIDs in the cluster manual.

Make sure QM2's CLUSRCVR is 100% correct.
Make sure QM2's CLUSSDR (its one and only CLUSSDR!) is pointing at QM3 or QM4.
_________________
Peter Potkay
Keep Calm and MQ On
gs
Posted: Mon Sep 24, 2007 11:06 pm

Master

Joined: 31 May 2007
Posts: 254
Location: Sweden

I forgot to mention it, but I checked the QMID on all machines involved prior to the post. QM2 was also suspended from the cluster before it was re-installed.

Yes, I manually start the channel, but it's a cluster channel (propagated from the repository), not a manually defined channel. The only manually defined channels are as they are supposed to be: sender/receiver from QM2 to the repository.

Can an erroneous QMID affect the cluster in such a way that automatic triggering stops working? Shouldn't the communication be completely dysfunctional in that case?
Vitor
Posted: Mon Sep 24, 2007 11:11 pm

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

If it was me, I'd certainly be inclined to remove QM2 from the cluster (carefully following the steps in the manual) and re-add it once I was sure all reference to it was expunged. My esteemed fellow posters have hit on a likely explanation, though I'd still think about some maintenance in the medium term!
_________________
Honesty is the best policy.
Insanity is the best defence.
fjb_saper
Posted: Tue Sep 25, 2007 2:25 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

Use
Code:
reset cluster(mycluster) action(forceremove) qmid(qmid of bad qmgr) queues(yes)

And do it on a full repository. This should expunge the qmgr that you removed from the cluster and avoid any problems with its namesake.

Basically, what might happen here is that the cluster tries to deliver the message to the qmgr with the removed QMID, and the channels don't match, etc.

Enjoy
_________________
MQ & Broker admin
gs
Posted: Tue Sep 25, 2007 5:11 am

Master

Joined: 31 May 2007
Posts: 254
Location: Sweden

Thanks for all your input. I removed the QM completely from the cluster and made sure all definitions were gone. Then I recreated the cluster channels, put the queues back into the cluster and fired up all channels. So far so good.

It could be that the last person doing the re-install didn't remove QM2 from the cluster properly, causing duplicate or gibberish info about QM2 in the repositories. Unfortunately (?) I don't have any more stuck SCTQ messages to investigate, but I prefer having the setup work.

Quick closing question: how does the XQH in SCTQ messages refer to QMNAME, QMID and the correct cluster channel?

Any recommendations for further reading on how things like this work technically, i.e. inside MQ?
fjb_saper
Posted: Tue Sep 25, 2007 10:29 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

It doesn't! It just has a destination qmgr name. The cluster manager process on the qmgr starts the channel according to the information in the repository about that qmgr.

This is why it is so important that there be only one qmgr in the cluster with a given name. This is why I was asking you to get rid of the information about the qmgr with the wrong QMID...

Enjoy
_________________
MQ & Broker admin
gs
Posted: Wed Sep 26, 2007 4:52 am

Master

Joined: 31 May 2007
Posts: 254
Location: Sweden

Unfortunately, the problem still persists, but at least this gives me some more information. The message on the SCTQ contains QM2 correctly and the correlation ID is the correct channel name (in hex).
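
To check which channel a stuck SCTQ message is bound to, the hex correlation ID can be turned back into text. Below is a minimal Python sketch, assuming the CorrelId holds the channel name padded with blanks or NULs up to the 24-byte MQ field length. The function name and the sample value are made up for illustration and are not taken from a real message.

```python
def channel_from_correl_id(correl_hex: str) -> str:
    """Decode a hex-dumped CorrelId into the channel name it carries."""
    raw = bytes.fromhex(correl_hex)
    # Assumption: the name is padded on the right with blanks or NULs.
    return raw.rstrip(b" \x00").decode("ascii")


# Illustrative value only: 'TO.K003.QM2' blank-padded to 24 bytes, then hex-encoded.
sample = "TO.K003.QM2".ljust(24).encode("ascii").hex().upper()

print(sample)
print(channel_from_correl_id(sample))  # TO.K003.QM2
```

If the decoded name differs from the channel listed by DIS CLUSQMGR for the target queue manager, the message is probably bound to a stale cluster record.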

I dumped all the cluster queue manager records using amqrfdm on QM1, QM3 & QM4, which listed the QM with duplicate entries: both the new and the old QMID. HOWEVER, all old entries are listed as 'Deleted' and doing a reset doesn't change this.

Code:
Qm(QM2                                          ) Deleted Seq(1154592274)
 Channel(TO.K003.QM2   ) Stopped      ChlSeq(603649)
 @DD6C     Clusters @DC50   
 Desc($RCSfile$ $Revision: 25351 $                                    )
 UUID(QM2_2005-07-04_16.05.20                      )
  Flags(CLUSSDR Auto Joined Refresh )
 Flags(1) MsgId(414D5120564343303134202020202020F53AED462016FB46)
 Prev(0       ) nQmgr(1014110 ) nUUID(0       ) nCh(1014110 ) Ascii(D414    )
 Cluster(K003                                         ) Deleted Seq(1154592274)
  @DC50     Next(0       )
  Exp(10/04/2007 08:03:53 PM) Upd(09/12/2007 09:28:24 AM)
  Flags(CLUSSDR Auto Joined )


Do notice that there's an expiration date (Oct 04) that hasn't occurred yet. Is this a problem? Doing a DIS CLUSQMGR(QM2) only lists the new QMID.
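
When there are many records, the dump can also be scanned by script. A rough Python sketch follows, assuming the amqrfdm layout shown above (a `Qm(...)` line starts each record, followed by `UUID(...)` and `Exp(...)` lines) and assuming the `UUID` field is the same value DIS CLUSQMGR reports as QMID. The amqrfdm output format is not documented, so the patterns may need adjusting on other MQ levels. The sample is a trimmed copy of the record above, and the function names are hypothetical.

```python
import re
from datetime import datetime

# Trimmed copy of the amqrfdm record quoted in this post.
DUMP = """\
Qm(QM2                                          ) Deleted Seq(1154592274)
 Channel(TO.K003.QM2   ) Stopped      ChlSeq(603649)
 UUID(QM2_2005-07-04_16.05.20                      )
 Prev(0       ) nQmgr(1014110 ) nUUID(0       ) nCh(1014110 ) Ascii(D414    )
 Cluster(K003                                         ) Deleted Seq(1154592274)
  Exp(10/04/2007 08:03:53 PM) Upd(09/12/2007 09:28:24 AM)
"""


def parse_records(text):
    """Return one dict per Qm(...) record found in an amqrfdm-style dump."""
    records = []
    for line in text.splitlines():
        start = re.match(r"Qm\((.*?)\s*\)(.*)", line)
        if start:
            records.append({
                "qm": start.group(1),
                "deleted": "Deleted" in start.group(2),
                "uuid": None,
                "expires": None,
            })
            continue
        if not records:
            continue  # ignore anything before the first record
        uuid = re.match(r"\s*UUID\((.*?)\s*\)", line)
        if uuid:
            records[-1]["uuid"] = uuid.group(1)
        exp = re.match(r"\s*Exp\((.*?)\)", line)
        if exp:
            # Month/day/year order is assumed from the "Oct 04" remark in the post.
            records[-1]["expires"] = datetime.strptime(
                exp.group(1), "%m/%d/%Y %I:%M:%S %p"
            )
    return records


def lingering(records, now):
    """Deleted records whose expiry time has not been reached yet."""
    return [r for r in records if r["deleted"] and r["expires"] and r["expires"] > now]


for rec in lingering(parse_records(DUMP), datetime(2007, 9, 26)):
    print(rec["qm"], rec["uuid"], "expires", rec["expires"])
```

Running this against the dumps from QM1, QM3 and QM4 would show which queue managers still carry the old record and until when.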

In case this IS a problem, can I force QM1, QM3 & QM4 to expire the information immediately?
jefflowrey
Posted: Wed Sep 26, 2007 4:55 am

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

fjb_saper wrote:
Use
Code:
reset cluster(mycluster) action(forceremove) qmid(qmid of bad qmgr) queues(yes)

And do it on a full repository. This should expunge the qmgr that you removed from the cluster and avoid any problems with its namesake.

Basically, what might happen here is that the cluster tries to deliver the message to the qmgr with the removed QMID, and the channels don't match, etc.

Enjoy

_________________
I am *not* the model of the modern major general.
gs
Posted: Wed Sep 26, 2007 6:16 am

Master

Joined: 31 May 2007
Posts: 254
Location: Sweden

I've already tried this, but without success. I guess the command only marks the entry as 'Deleted' rather than actually deleting it.
fjb_saper
Posted: Wed Sep 26, 2007 10:26 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

If need be, move the cluster briefly to a single FR (full repository). Do the cluster reset on the remaining FR.
On the second FR, which is now a PR (partial repository), do:
Code:
refresh cluster(mycluster) repos(yes)


Then check that the reference to the QMID you see as deleted is gone, and make it a full repository again.

You might also have to repeat the refresh cluster repos on any PR that still has a reference to the old QMID.
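
The order of these steps matters, so it may help to see them laid out. The Python sketch below only prints a plan; it does not talk to MQ. Several things in it are assumptions: keeping QM3 and demoting QM4 is an arbitrary choice, the cluster name K003 and the old QMID are taken from the amqrfdm dump earlier in the thread, and the two `ALTER QMGR REPOS` commands are not quoted anywhere in this thread (they assume the repositories are defined with REPOS rather than a REPOSNL namelist). The function name is hypothetical.

```python
def single_fr_cleanup_plan(cluster, keep_fr, demote_fr, qmgr_name, old_qmid):
    """Return (queue manager, MQSC command) pairs in the order to run them."""
    return [
        # 1. Temporarily turn the second full repository into a partial one.
        (demote_fr, "ALTER QMGR REPOS(' ')"),
        # 2. Force the stale QMID out on the remaining full repository.
        (keep_fr,
         f"RESET CLUSTER({cluster}) ACTION(FORCEREMOVE) "
         f"QMID('{old_qmid}') QUEUES(YES)"),
        # 3. Make the demoted queue manager rebuild its cluster information.
        (demote_fr, f"REFRESH CLUSTER({cluster}) REPOS(YES)"),
        # 4. Check which QMID it now holds for the re-installed queue manager.
        (demote_fr, f"DISPLAY CLUSQMGR({qmgr_name}) QMID"),
        # 5. Promote it back to a full repository.
        (demote_fr, f"ALTER QMGR REPOS({cluster})"),
    ]


plan = single_fr_cleanup_plan(
    "K003", "QM3", "QM4", "QM2", "QM2_2005-07-04_16.05.20"
)
for step, (qmgr, command) in enumerate(plan, start=1):
    print(f"{step}. on {qmgr}: {command}")
```

Step 3 would then be repeated on any other partial repository that still shows the old QMID.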

Enjoy
_________________
MQ & Broker admin
gs
Posted: Fri Sep 28, 2007 12:14 am

Master

Joined: 31 May 2007
Posts: 254
Location: Sweden

The problem is now solved. This is what I did:

There are two identical instances of QM1, each of which runs a message broker. The problem appeared in exactly the same way on both brokers.

Three days ago I restarted the MQ & Broker services on broker1 and waited; everything went fine after that. Yesterday I restarted ONLY the broker on the second machine (broker2). Today I checked the logs and the SCTQ and, amazingly enough, the problem was solved.

So why, oh WHY, can the broker affect this? Shouldn't this strictly be an MQ error?

Thanks once again for all your help!
fjb_saper
Posted: Fri Sep 28, 2007 2:17 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

You do remember that starting the broker will start its qmgr if it isn't already running...
_________________
MQ & Broker admin