Author |
Message
|
sysera |
Posted: Fri Jul 29, 2005 4:20 am Post subject: Cannot Put To Clustered Queue |
|
|
Acolyte
Joined: 20 May 2005 Posts: 53
|
Hey all,
I'm sure this has been covered before, and I've seen some threads that resemble the symptoms I am seeing, but none of them seem 100% clear to me. I'll post my new situation and any help anyone can provide would be great.
Here's the configuration:
APPSERVER01 - Partial Repository
APPSERVER02 - Partial Repository
WEBSERVER01 - Full Repository
WEBSERVER02 - Full Repository
All are SUSE Linux 8, running MQ 5.3 FP7.
APPSERVER01 had previously had some MQ application errors, internal errors, etc. Nothing seemed to fix it, not even restoring the queue manager definitions saved with SaveQMGR. So we ended up re-installing the MQ RPMs and all the errors went away.
Now there is an alias queue on APPSERVER01 and APPSERVER02 called MYQUEUE.OUT which resolves to a local queue on WEBSERVER01 and WEBSERVER02 called QL.WEBSERVER.OUT.
I can still put to the queue alias via APPSERVER02 successfully and all is well. I cannot put to the alias via APPSERVER01 however. It returns a dreaded 2082, MQRC_UNKNOWN_ALIAS_BASE_Q, error.
I'm under the impression that the cluster is remembering some old information about the previous queue manager of the same name that was replaced when trying to fix our application errors.
What is the best process to remove this queue manager from the cluster and re-introduce it successfully so all of the automatic channels it creates, etc work correctly?
Is there a way I can tell the full repository queue managers to "re-sync" with this queue manager and see what's up?
Thanks everyone. This forum has been a great deal of help to me when trying to learn this new application.
-Sys |
|
Mr Butcher |
Posted: Fri Jul 29, 2005 4:41 am Post subject: |
|
|
 Padawan
Joined: 23 May 2005 Posts: 1716
|
did you purge and re-create the appserver01 queuemanager? (this is not clear to me from your posting)
if not, it is still the same queuemanager with the same queuemanagerid and should be able to participate in the cluster. check on the full repositories what is known about appserver01. maybe a "refresh cluster" on appserver01 could help to bring the repositories up-to-date.
if the queuemanager was purged and recreated, it now has a different qmgr-id (which is used in the cluster). in that case you should remove the "old" appserver01 queuemanager from the full repositories first (reset cluster), then re-join the cluster from appserver01 (the manual says you should wait at least 10 seconds after the reset).
you should check first what is known in the full repository about appserver01, and also on appserver01 what it thinks about its status in the cluster.
_________________
Regards, Butcher |
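For anyone following along, the checks suggested above could look roughly like this in runmqsc. It is only a sketch: the thread does not show the actual commands used, and the attribute list is just one reasonable choice.

```
* On each full repository (WEBSERVER01, WEBSERVER02):
* show what the repository knows about APPSERVER01, including its QMID
DISPLAY CLUSQMGR(APPSERVER01) QMID STATUS QMTYPE

* On APPSERVER01 itself: show its own QMID and its view of the cluster
DISPLAY QMGR QMID
DISPLAY CLUSQMGR(*) QMID STATUS QMTYPE
```

If the QMID reported by DISPLAY QMGR on APPSERVER01 differs from the QMID the full repositories hold for that name, the repositories are still holding the old queue manager.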
|
sysera |
Posted: Fri Jul 29, 2005 4:52 am Post subject: |
|
|
Acolyte
Joined: 20 May 2005 Posts: 53
|
Mr Butcher wrote: |
did you purge and re-create the appserver01 queuemanager? (this is not clear to me from your posting)
if not, it is still the same queuemanager with the same queuemanagerid and should be able to participate in the cluster. check on the full repositories what is known about appserver01. maybe a "refresh cluster" on appserver01 could help to bring the repositories up-to-date.
if the queuemanager was purged and recreated, it now has a different qmgr-id (which is used in the cluster). in that case you should remove the "old" appserver01 queuemanager from the full repositories first (reset cluster), then re-join the cluster from appserver01 (the manual says you should wait at least 10 seconds after the reset).
you should check first what is known in the full repository about appserver01, and also on appserver01 what it thinks about its status in the cluster. |
The original queue manager was completely removed and restored from the backed-up definitions. I assume this would mean it's entirely new to the cluster, with a new ID etc.
I'm going to spend some (more) time today reading through the good old manual to see if I can make some sense of the entire process I need to follow. |
|
Mr Butcher |
Posted: Fri Jul 29, 2005 5:03 am Post subject: |
|
|
 Padawan
Joined: 23 May 2005 Posts: 1716
|
Quote: |
I assume this would mean it's entirely new to the cluster, a new ID etc. |
that's it. doing some rtfm is good in any case, but the procedure to follow is not as complicated as it sounds.
make the full repositories forget about appserver01 and its queues by using the "reset cluster" command
if required, make partial repositories forget about the old appserver01 by using "refresh cluster". i think this should not be required in your case because appserver01 did not communicate with appserver02
check that the old queuemanager is no longer known
re-join the cluster with your new queuemanager
this could have been a weekend's procedure in the days when clustering was new, today it's almost fun.
_________________
Regards, Butcher |
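The steps above can be sketched in MQSC roughly as follows. This is a sketch under assumptions, not the exact commands from the thread: MYCLUSTER is a made-up cluster name (the real one is never given) and <old-qmid> is a placeholder for the stale QMID that DISPLAY CLUSQMGR reports. RESET CLUSTER has to be issued on a full repository queue manager.

```
* On a full repository (WEBSERVER01 or WEBSERVER02):
* list every entry for APPSERVER01 together with its QMID
DISPLAY CLUSQMGR(APPSERVER01) QMID

* Force out the old entry and its queues
* (MYCLUSTER and <old-qmid> are placeholders, not real values)
RESET CLUSTER(MYCLUSTER) ACTION(FORCEREMOVE) QMID('<old-qmid>') QUEUES(YES)

* Check that the old queue manager is no longer known
DISPLAY CLUSQMGR(*) QMID

* On the rebuilt APPSERVER01:
* discard locally cached cluster information and re-join
REFRESH CLUSTER(MYCLUSTER) REPOS(YES)
```

Re-joining assumes APPSERVER01 still has its cluster-receiver channel and a manually defined cluster-sender channel to one of the full repositories. Check the MQSC reference for your MQ level before running any of this on a live cluster.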
|
sysera |
Posted: Fri Jul 29, 2005 5:25 am Post subject: |
|
|
Acolyte
Joined: 20 May 2005 Posts: 53
|
Mr Butcher wrote: |
Quote: |
I assume this would mean it's entirely new to the cluster, a new ID etc. |
that's it. doing some rtfm is good in any case, but the procedure to follow is not as complicated as it sounds.
make the full repositories forget about appserver01 and its queues by using the "reset cluster" command
if required, make partial repositories forget about the old appserver01 by using "refresh cluster". i think this should not be required in your case because appserver01 did not communicate with appserver02
check that the old queuemanager is no longer known
re-join the cluster with your new queuemanager
this could have been a weekend's procedure in the days when clustering was new, today it's almost fun. |
It seems I was able to make WEBSERVER02 drop APPSERVER01: it has no channels running to it and APPSERVER01 no longer shows up when I issue a dis clusqmgr(*). But APPSERVER01 still has a channel on WEBSERVER01, and it shows up several times when I issue the dis clusqmgr(*) command there. |
|
sysera |
Posted: Fri Jul 29, 2005 5:28 am Post subject: |
|
|
Acolyte
Joined: 20 May 2005 Posts: 53
|
Also,
It appears the SYSTEM.CLUSTER.TRANSMIT.QUEUE on WEBSERVER01 also has a curdepth of 11. Is it possible these definitions aren't leaving this full repository because there is data in this queue to be sent to the queue manager APPSERVER01 that it cannot resolve? |
|
Mr Butcher |
Posted: Fri Jul 29, 2005 5:52 am Post subject: |
|
|
 Padawan
Joined: 23 May 2005 Posts: 1716
|
Quote: |
It seems I was able to make WEBSERVER02 drop APPSERVER01: it has no channels running to it and APPSERVER01 no longer shows up when I issue a dis clusqmgr(*). But APPSERVER01 still has a channel on WEBSERVER01, and it shows up several times when I issue the dis clusqmgr(*) command there. |
the qmid is shown in that case too, and you should be able to see whether these are all old or new entries. you can use the reset cluster together with the qmid to get rid of the old qmgr.
try to stop the channels to appserver01, or shut down appserver01 if possible during these actions. check if the channels are indoubt; if so, resolve that first.
when all channels are inactive, try to remove it from webserver01. it should work.
Quote: |
It appears the SYSTEM.CLUSTER.TRANSMIT.QUEUE on WEBSERVER01 also has a curdepth of 11. Is it possible these definitions aren't leaving this full repository because there is data in this queue to be sent to the queue manager APPSERVER01 that it cannot resolve? |
hard to say. these could be either cluster related messages or application messages.
try to get rid of the old appserver01, then check what is left over in the queue (i don't know if the reset cluster will also check the cluster transmit queue and remove cluster related messages from there).
if old cluster information for the old appserver01 reaches the new appserver01, you will get an error message and the cluster data will not be accepted.
application messages of course should be delivered...
good luck
_________________
Regards, Butcher |
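A rough MQSC sketch of the channel checks described above, run on WEBSERVER01. The channel name TO.APPSERVER01 is hypothetical (the thread does not name the cluster channels), and the right RESOLVE action depends on whether the other end actually received the in-doubt batch, so that line is left commented out.

```
* Is the channel to APPSERVER01 running, and is it in doubt?
* (TO.APPSERVER01 is a hypothetical channel name)
DISPLAY CHSTATUS(TO.APPSERVER01) STATUS INDOUBT CURLUWID

* Stop the channel before trying the forced removal
STOP CHANNEL(TO.APPSERVER01)

* Only if INDOUBT(YES): resolve the batch, choosing COMMIT or BACKOUT
* RESOLVE CHANNEL(TO.APPSERVER01) ACTION(BACKOUT)

* See how many messages are still waiting on the cluster transmit queue
DISPLAY QLOCAL(SYSTEM.CLUSTER.TRANSMIT.QUEUE) CURDEPTH
```

Browsing the messages left on SYSTEM.CLUSTER.TRANSMIT.QUEUE would then show whether they are cluster control messages or application data.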
|
sysera |
Posted: Fri Jul 29, 2005 6:45 am Post subject: |
|
|
Acolyte
Joined: 20 May 2005 Posts: 53
|
I've been able to remove the queue manager from one repository it seems, but there is still one that is hanging on in the WEBSERVER01 repository. It doesn't want to disappear regardless if I forceremove it by QMID or QMNAME. Not sure why. |
|
sysera |
Posted: Fri Jul 29, 2005 6:52 am Post subject: |
|
|
Acolyte
Joined: 20 May 2005 Posts: 53
|
At this point I'm almost pondering recreating the cluster.
This is a production cluster that hasn't entered production yet, no data on it, etc. I have the original queue definitions to re-create it with, so maybe that would simplify the process.
I would just restore the original APPSERVER01 queue manager, but the machine hadn't entered the backup schedule yet. Argh.  |
|
jefflowrey |
Posted: Fri Jul 29, 2005 7:04 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
sysera wrote: |
I've been able to remove the queue manager from one repository it seems, but there is still one that is hanging on in the WEBSERVER01 repository. It doesn't want to disappear regardless if I forceremove it by QMID or QMNAME. Not sure why. |
Take WEBSERVER01 out of the cluster, refresh the cluster, and add it again.
_________________
I am *not* the model of the modern major general. |
|
sysera |
Posted: Fri Jul 29, 2005 7:07 am Post subject: |
|
|
Acolyte
Joined: 20 May 2005 Posts: 53
|
jefflowrey wrote: |
sysera wrote: |
I've been able to remove the queue manager from one repository it seems, but there is still one that is hanging on in the WEBSERVER01 repository. It doesn't want to disappear regardless if I forceremove it by QMID or QMNAME. Not sure why. |
Take WEBSERVER01 out of the cluster, refresh the cluster, and add it again. |
Hmm. I shall give that a try. Thanks.  |
|
sysera |
Posted: Fri Jul 29, 2005 7:34 am Post subject: |
|
|
Acolyte
Joined: 20 May 2005 Posts: 53
|
sysera wrote: |
jefflowrey wrote: |
sysera wrote: |
I've been able to remove the queue manager from one repository it seems, but there is still one that is hanging on in the WEBSERVER01 repository. It doesn't want to disappear regardless if I forceremove it by QMID or QMNAME. Not sure why. |
Take WEBSERVER01 out of the cluster, refresh the cluster, and add it again. |
Hmm. I shall give that a try. Thanks.  |
This seems to have done the trick as far as removing that instance of the APPSERVER01 queue manager. Now if I can just get it to re-join.  |
|
sysera |
Posted: Fri Jul 29, 2005 9:51 am Post subject: |
|
|
Acolyte
Joined: 20 May 2005 Posts: 53
|
sysera wrote: |
sysera wrote: |
jefflowrey wrote: |
sysera wrote: |
I've been able to remove the queue manager from one repository it seems, but there is still one that is hanging on in the WEBSERVER01 repository. It doesn't want to disappear regardless if I forceremove it by QMID or QMNAME. Not sure why. |
Take WEBSERVER01 out of the cluster, refresh the cluster, and add it again. |
Hmm. I shall give that a try. Thanks.  |
This seems to have done the trick as far as removing that instance of the APPSERVER01 queue manager. Now if I can just get it to re-join.  |
I gave it some time to see if it would re-join the rest of the machines, or at least the full repositories in the cluster, but so far there is no sign of it. I'm not sure what else I can do to make it join up successfully. Going by the build script we used when we created it, I don't think anything needs to be redefined. Any suggestions? Thanks guys. |
|
jefflowrey |
Posted: Fri Jul 29, 2005 9:54 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
Which QM is not in the cluster any more?
_________________
I am *not* the model of the modern major general. |
|
sysera |
Posted: Fri Jul 29, 2005 9:57 am Post subject: |
|
|
Acolyte
Joined: 20 May 2005 Posts: 53
|
jefflowrey wrote: |
Which QM is not in the cluster any more? |
APPSERVER01 has been removed from the cluster.
It's now showing up again in WEBSERVER01, but not in WEBSERVER02. However, I still can't put to that cluster queue alias. I also tried adding a new alias and that doesn't work either.
I did run a refresh cluster(clustername) on the full repositories as well to see if that would help. |
|