gs
Posted: Mon Sep 24, 2007 6:37 am Post subject: SOLVED: Cluster transmit queue problems for a certain QM
 Master
Joined: 31 May 2007 Posts: 254 Location: Sweden
Hi!
We're having a problem with a certain queue manager in an SSL-enabled cluster.
Messages are received on QM1 and passed on to QM2. The repository queue managers are QM3 and QM4.
The messages stay on the SYSTEM.CLUSTER.TRANSMIT.QUEUE (SCTQ) on QM1 until I manually start the cluster channel to QM2 on QM1.
Apart from this the cluster is fully functional, and the problematic QM2 was re-installed just recently. Nothing related to this appears in any of the logs.
Any ideas on how to troubleshoot this?
QM1 specs (W2k3):
Name: WebSphere MQ
Version: 530.5 CSD05
CMVC level: p530-05-L030926
BuildType: IKAP - (Production)
QM2 specs (SuSE Linux):
Name: WebSphere MQ
Version: 530.11 CSD11
CMVC level: p530-11-L050802
BuildType: IKAP - (Production)
Last edited by gs on Fri Sep 28, 2007 12:17 am; edited 1 time in total

Vitor
Posted: Mon Sep 24, 2007 6:53 am Post subject: Re: Cluster transmit queue problems for a certain QM
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
gs wrote:
QM1 specs (W2k3):
Name: WebSphere MQ
Version: 530.5 CSD05
CMVC level: p530-05-L030926
BuildType: IKAP - (Production)
Have you considered applying some maintenance? You could have hit some kind of compatibility problem with the two levels being so far apart (CSD05 vs. CSD11).
In any event, CSD05 is very old as CSDs go.
_________________
Honesty is the best policy.
Insanity is the best defence.

Mr Butcher
Posted: Mon Sep 24, 2007 7:32 am
 Padawan
Joined: 23 May 2005 Posts: 1716
If QM2 was re-installed, it got a new queue manager ID (QMID), and the QMID is what the cluster uses to identify it (even if you created the new queue manager with the old name).
Did you remove the old QMID from the cluster? Make sure every queue manager in the cluster knows the "new" QM2, not the old one.
_________________
Regards, Butcher
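To make the QMID point concrete: a QMID appears to combine the queue manager name with its creation timestamp, as in the old QM2 entry `QM2_2005-07-04_16.05.20` in the amqrfdm dump later in this thread. The Python sketch below assumes that `<name>_<YYYY-MM-DD>_<HH.MM.SS>` layout; `parse_qmid`, `is_reinstalled_namesake` and the second QMID are hypothetical, for illustration only. It shows why a re-created QM2 is a different cluster member from the old one even though the name is the same.

```python
from datetime import datetime


def parse_qmid(qmid: str) -> tuple[str, datetime]:
    """Split a QMID into (queue manager name, creation time).

    Assumes the <name>_<YYYY-MM-DD>_<HH.MM.SS> layout seen in the thread.
    rsplit with maxsplit=2 keeps any underscores inside the name itself.
    """
    name, date_part, time_part = qmid.rsplit("_", 2)
    created = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H.%M.%S")
    return name, created


def is_reinstalled_namesake(qmid_a: str, qmid_b: str) -> bool:
    # Same name but a different QMID: a new queue manager reusing an old name.
    return qmid_a != qmid_b and parse_qmid(qmid_a)[0] == parse_qmid(qmid_b)[0]


old_qmid = "QM2_2005-07-04_16.05.20"  # old entry from the amqrfdm dump
new_qmid = "QM2_2007-09-20_10.00.00"  # hypothetical QMID after the re-install

if __name__ == "__main__":
    for q in (old_qmid, new_qmid):
        name, created = parse_qmid(q)
        print(f"{q}: name={name}, created={created}")
    print("same name, different queue manager:",
          is_reinstalled_namesake(old_qmid, new_qmid))
```

If both QMIDs for QM2 still show up in a repository, the cluster is tracking two distinct queue managers that happen to share a name.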

PeterPotkay
Posted: Mon Sep 24, 2007 5:09 pm Post subject: Re: Cluster transmit queue problems for a certain QM
 Poobah
Joined: 15 May 2001 Posts: 7722
gs wrote:
i manually start the cluster channel (for QM2) on QM1.
Do you have a manually defined CLUSSDR channel on QM1 pointing at QM2? Is that what you are starting? If so, that in itself is wrong. If the full repositories are QM3 and QM4, then every QM in the cluster should have only one manually defined CLUSSDR channel, and it should point to either QM3 or QM4.
I suspect Mr Butcher is correct. Your cluster (or at least QM1) does not appear to know about the new QM2; it may still know about the old QM2. Look up QMIDs in the cluster manual.
Make sure QM2's CLUSRCVR is 100% correct.
Make sure QM2's CLUSSDR (its one and only CLUSSDR!) points at QM3 or QM4.
_________________
Peter Potkay
Keep Calm and MQ On
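Peter's rule of thumb can be written as a quick check. This is a hypothetical Python sketch: the function name and the input format (a list of the queue managers that each manually defined CLUSSDR points at) are assumptions. It only covers partial repositories such as QM1 and QM2. The full repositories, QM3 and QM4, instead need manually defined CLUSSDRs to each other.

```python
def check_partial_repository(manual_clussdr_targets: list[str],
                             full_repositories: set[str]) -> list[str]:
    """Return a list of problems with a partial repository's manual CLUSSDRs.

    manual_clussdr_targets: the queue manager each manually defined
    CLUSSDR channel points at (hypothetical input format).
    """
    problems = []
    # Rule 1: exactly one manually defined CLUSSDR.
    if len(manual_clussdr_targets) != 1:
        problems.append(
            f"expected exactly 1 manual CLUSSDR, found {len(manual_clussdr_targets)}")
    # Rule 2: it must point at a full repository.
    for target in manual_clussdr_targets:
        if target not in full_repositories:
            problems.append(f"manual CLUSSDR points at {target}, not a full repository")
    return problems


FULL_REPOS = {"QM3", "QM4"}

if __name__ == "__main__":
    print(check_partial_repository(["QM3"], FULL_REPOS))  # []
    # A manual CLUSSDR from QM1 straight to QM2 breaks both rules:
    print(check_partial_repository(["QM3", "QM2"], FULL_REPOS))
```

All other cluster-sender channels, such as the one from QM1 to QM2, should be created automatically from the repository information rather than defined by hand.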

gs
Posted: Mon Sep 24, 2007 11:06 pm
 Master
Joined: 31 May 2007 Posts: 254 Location: Sweden
I forgot to mention it, but I checked the QMID on all machines involved before posting. Before QM2 was re-installed, it was also suspended from the cluster.
Yes, I start the channel manually, but it's a cluster channel (propagated from the repository), not a manually defined one. The only defined channels are the ones there should be: sender/receiver between QM2 and the repository.
Can an erroneous QMID affect the cluster in such a way that automatic triggering stops working? Wouldn't communication then be completely dysfunctional?

Vitor
Posted: Mon Sep 24, 2007 11:11 pm
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
If it were me, I'd certainly be inclined to remove QM2 from the cluster (carefully following the steps in the manual) and re-add it once I was sure every reference to it had been expunged. My esteemed fellow posters have hit on a likely explanation, though I'd still think about some maintenance in the medium term!

fjb_saper
Posted: Tue Sep 25, 2007 2:25 am
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
Use
Code:
reset cluster(mycluster) action(forceremove) qmid('qmid of bad qmgr') queues(yes)
And do it on a full repository. This should expunge the qmgr that you removed from the cluster and avoid any problems with its namesake.
Basically, what might be happening here is that the cluster tries to deliver the message to the queue manager with the removed QMID, and the channels don't match, etc.
Enjoy
_________________
MQ & Broker admin

gs
Posted: Tue Sep 25, 2007 5:11 am
 Master
Joined: 31 May 2007 Posts: 254 Location: Sweden
Thanks for all your input. I removed the QM from the cluster completely and made sure all its definitions were gone. Then I recreated the cluster channels, put the queues back into the cluster and started all the channels. So far so good.
It could be that whoever did the re-install didn't remove QM2 from the cluster properly, leaving duplicate or garbled information about QM2 in the repositories. Unfortunately (?) I don't have any more stuck SCTQ messages to investigate, but I prefer having the setup work.
Quick closing question: how does the transmission queue header (XQH) in SCTQ messages refer to QMNAME, QMID and the correct cluster channel?
Any recommendations for further reading on how things like this work technically, i.e. MQ internals?

fjb_saper
Posted: Tue Sep 25, 2007 10:29 am
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
It doesn't! It just has a destination qmgr name. The cluster manager process on the qmgr starts the channel according to the information in the repository about that qmgr.
This is why it is so important that there be only one qmgr with a given name in the cluster, and why I asked you to get rid of the information about the qmgr with the wrong QMID...
Enjoy
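A toy Python model of the explanation above, not MQ's actual repository-manager code: the transmission queue header only names the destination queue manager, so the sending side has to pick a channel from the repository records for that name. The record layout, the function and the second QMID are hypothetical. The old QMID and the channel name TO.K003.QM2 come from the amqrfdm dump further down. If a stale record for the old QM2 is still live, the name becomes ambiguous.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterQmgrRecord:
    """One cached cluster queue manager record (toy model)."""
    name: str
    qmid: str
    channel: str
    deleted: bool = False


def resolve_channel(records: list[ClusterQmgrRecord], dest_qmgr: str) -> str:
    """Pick the cluster-sender channel for a destination queue manager name.

    Only the destination name is known, so the lookup goes through every
    non-deleted repository record carrying that name.
    """
    live = [r for r in records if r.name == dest_qmgr and not r.deleted]
    if not live:
        raise LookupError(f"no live repository record for {dest_qmgr}")
    if len({r.qmid for r in live}) > 1:
        raise RuntimeError(f"{dest_qmgr} is known under several QMIDs")
    return live[0].channel


old = ClusterQmgrRecord("QM2", "QM2_2005-07-04_16.05.20", "TO.K003.QM2", deleted=True)
new = ClusterQmgrRecord("QM2", "QM2_2007-09-20_10.00.00", "TO.K003.QM2")  # hypothetical QMID

if __name__ == "__main__":
    print(resolve_channel([old, new], "QM2"))  # TO.K003.QM2
```

In this model, marking the old record as deleted is enough to make routing unambiguous again. The real product keeps more state than this, so treat it as an illustration of the naming rule only.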

gs
Posted: Wed Sep 26, 2007 4:52 am
 Master
Joined: 31 May 2007 Posts: 254 Location: Sweden
Unfortunately, the problem persists, but at least this gives me some more information. The message on the SCTQ correctly contains QM2 as the destination, and the correlation ID is the correct channel name (in hex).
I dumped the cluster queue manager records using amqrfdm on QM1, QM3 and QM4, which listed QM2 with duplicate entries: both the new and the old QMID. HOWEVER, all the old entries are listed as 'Deleted', and doing a reset doesn't change this.
Code:
Qm(QM2 ) Deleted Seq(1154592274)
Channel(TO.K003.QM2 ) Stopped ChlSeq(603649)
@DD6C Clusters @DC50
Desc($RCSfile$ $Revision: 25351 $ )
UUID(QM2_2005-07-04_16.05.20 )
Flags(CLUSSDR Auto Joined Refresh )
Flags(1) MsgId(414D5120564343303134202020202020F53AED462016FB46)
Prev(0 ) nQmgr(1014110 ) nUUID(0 ) nCh(1014110 ) Ascii(D414 )
Cluster(K003 ) Deleted Seq(1154592274)
@DC50 Next(0 )
Exp(10/04/2007 08:03:53 PM) Upd(09/12/2007 09:28:24 AM)
Flags(CLUSSDR Auto Joined )
Notice that there's an expiry date (Oct 04) that hasn't been reached yet; is this a problem? DIS CLUSQMGR(QM2) only lists the new QMID.
If this IS a problem, can I force QM1, QM3 and QM4 to expire the information immediately?
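The correlation ID and the MsgId mentioned above are hex-encoded bytes. Below is a small Python sketch for decoding them. The helper names are hypothetical, and the byte layouts are assumptions. The CorrelId is treated as an ASCII channel name padded with blanks or NULs. The MsgId is treated as the usual queue-manager-generated form: "AMQ ", a 12-byte queue manager name field, then 8 unique bytes. The sample CorrelId is built from the channel name in the dump rather than copied from a real message.

```python
def decode_correlid(hex_id: str) -> str:
    """Turn a hex CorrelId that carries a channel name back into text.

    Assumes an ASCII name padded with blanks and/or NULs.
    """
    return bytes.fromhex(hex_id).decode("ascii").rstrip(" \x00")


def msgid_qmgr_field(hex_id: str) -> str:
    """Return the queue manager name field of an 'AMQ '-style MsgId.

    Assumed layout: 'AMQ ' + 12-byte blank-padded queue manager name
    + 8 bytes of unique data, 24 bytes in total.
    """
    raw = bytes.fromhex(hex_id)
    if len(raw) != 24 or raw[:4] != b"AMQ ":
        raise ValueError("not an 'AMQ '-style 24-byte MsgId")
    return raw[4:16].decode("ascii").rstrip()


MSGID = "414D5120564343303134202020202020F53AED462016FB46"  # MsgId field from the dump

if __name__ == "__main__":
    print(msgid_qmgr_field(MSGID))  # VCC014
    # Sample CorrelId built from the channel name, padded to 24 bytes.
    sample = ("TO.K003.QM2".ljust(20) + "\0" * 4).encode("ascii").hex()
    print(decode_correlid(sample))  # TO.K003.QM2
```

Decoding the IDs this way helps you see which channel and which originating queue manager a stuck SCTQ message refers to, without a message browser.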

jefflowrey
Posted: Wed Sep 26, 2007 4:55 am
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
fjb_saper wrote:
Use
Code:
reset cluster(mycluster) action(forceremove) qmid('qmid of bad qmgr') queues(yes)
And do it on a full repository. This should expunge the qmgr that you removed from the cluster and avoid any problems with its namesake.
Basically, what might be happening here is that the cluster tries to deliver the message to the queue manager with the removed QMID, and the channels don't match, etc.
Enjoy
_________________
I am *not* the model of the modern major general.

gs
Posted: Wed Sep 26, 2007 6:16 am
 Master
Joined: 31 May 2007 Posts: 254 Location: Sweden
I've already tried this, without success. I guess the command only marks the entry as 'Deleted' rather than actually deleting it.

fjb_saper
Posted: Wed Sep 26, 2007 10:26 am
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
If need be, briefly run the cluster with a single full repository (FR): demote the second FR to a partial repository (PR), and do the cluster reset on the remaining FULL FR.
On the second FR, which is now a PR, run:
Code:
refresh cluster(mycluster) repos(yes)
Then check that the reference to the QMID you see as deleted is gone, and make it a full repository again.
You might also have to repeat the refresh cluster with repos(yes) on any PR that still has a reference to the old QMID.
Enjoy
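The procedure above, written out as the MQSC commands to feed to runmqsc on each queue manager. This Python sketch only builds the command list; it does not talk to MQ. The function name is hypothetical. QM3 and QM4 as full repositories, K003 as the cluster name and the old QMID all come from earlier in the thread, and the choice of which FR to keep is arbitrary. The demotion comes first because REFRESH CLUSTER with REPOS(YES) cannot be issued on a queue manager that is itself a full repository.

```python
def purge_old_qmid_steps(cluster: str, keep_fr: str, demote_fr: str,
                         old_qmid: str,
                         stale_prs: tuple[str, ...] = ()) -> list[tuple[str, str]]:
    """Return (queue manager, MQSC command) pairs in execution order."""
    steps = [
        # 1. Run with a single FR: demote the second FR to a partial repository.
        (demote_fr, "ALTER QMGR REPOS(' ')"),
        # 2. Force-remove the old QMID on the remaining FR.
        (keep_fr, f"RESET CLUSTER({cluster}) ACTION(FORCEREMOVE) "
                  f"QMID('{old_qmid}') QUEUES(YES)"),
        # 3. On the demoted queue manager (now a PR), rebuild its repository cache.
        (demote_fr, f"REFRESH CLUSTER({cluster}) REPOS(YES)"),
        # 4. Check by eye that the old QMID no longer appears.
        (demote_fr, "DISPLAY CLUSQMGR(*) QMID"),
        # 5. Make it a full repository again.
        (demote_fr, f"ALTER QMGR REPOS({cluster})"),
    ]
    # 6. Repeat the refresh on any PR that still references the old QMID.
    steps += [(pr, f"REFRESH CLUSTER({cluster}) REPOS(YES)") for pr in stale_prs]
    return steps


if __name__ == "__main__":
    for qmgr, cmd in purge_old_qmid_steps("K003", "QM3", "QM4",
                                          "QM2_2005-07-04_16.05.20", ("QM1",)):
        print(f"{qmgr}: {cmd}")
```

The QMID is quoted because QMIDs are case-sensitive, and unquoted MQSC strings are folded to uppercase. REFRESH CLUSTER discards and rebuilds locally held cluster information, so schedule it for a quiet period.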

gs
Posted: Fri Sep 28, 2007 12:14 am
 Master
Joined: 31 May 2007 Posts: 254 Location: Sweden
The problem is now solved. This is what I did:
There are two identical instances of QM1, each running a message broker. The problem appeared in exactly the same way on both.
Three days ago I restarted the MQ and Broker services on broker1 and waited; everything worked fine after that. Yesterday I restarted ONLY the broker on the second machine (broker2). Today I checked the logs and the SCTQ and, amazingly, the problem was solved there too.
So why oh WHY can the broker affect this? Shouldn't this strictly be an MQ error?
Thanks once again for all your help!

fjb_saper
Posted: Fri Sep 28, 2007 2:17 am
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
You do remember that starting the broker will start its qmgr if it isn't already running...