mqdev |
Posted: Tue Mar 08, 2005 7:50 am Post subject: Messages building up on SYSTEM.CLUSTER.COMMAND.QUEUE |
Centurion
Joined: 21 Jan 2003 Posts: 136
Hello,
MQ env: AIX 5.2, MQv5.3CSD05. Cluster with 1200 QMgrs
Problem:
One of the QMs hosts a cluster queue that is heavily used by the rest of the QMs in the cluster. On that QM, the SYSTEM.CLUSTER.COMMAND.QUEUE is constantly building up. What is causing this buildup?
Any insight would be appreciated.
Thanks
-mqdev |
mqdev |
Posted: Tue Mar 08, 2005 8:50 am Post subject: fixed... |
I bounced the QM, and that seems to have fixed it. It looks like amqrrmfa, though running, was not doing its job. A restart fixed it. |
mqdev |
Posted: Fri Mar 11, 2005 12:41 pm Post subject: SYSTEM.CLUSTER.COMMAND.QUEUE building up again |
Hello,
I thought bouncing the QM fixed this problem - but apparently it hasn't.
The queue continues to build up, and I have found that only the amqrrmfa (Repository Manager) process has opened this queue for input (the IPPROCS attribute when I display QSTATUS). No process has opened this queue for output (OPPROCS is always zero). The messages themselves are non-persistent and come in two sizes: either 76 bytes or 272 bytes.
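The depth and open counts described above can be displayed with the MQSC command `DISPLAY QSTATUS(SYSTEM.CLUSTER.COMMAND.QUEUE) TYPE(QUEUE) CURDEPTH IPPROCS OPPROCS`. As a rough sketch (in Python, which nobody in this thread said they use), the helper below pulls `KEYWORD(value)` pairs out of runmqsc-style output so the numbers can be recorded over time. The function names and the `no_reader_backlog` flag are illustrative assumptions, and the parser assumes the output describes a single queue.

```python
import re

# Matches KEYWORD(value) pairs as printed by runmqsc, e.g. "CURDEPTH(42)".
_ATTR = re.compile(r"([A-Z][A-Z0-9]*)\(([^)]*)\)")


def parse_mqsc_attributes(text):
    """Return a dict mapping attribute name to its (stripped) string value.

    Lines that start with an AMQ message identifier are skipped.
    If an attribute appears more than once, the last value wins.
    """
    attrs = {}
    for line in text.splitlines():
        if line.lstrip().startswith("AMQ"):
            continue
        for name, value in _ATTR.findall(line):
            attrs[name] = value.strip()
    return attrs


def queue_health(text):
    """Summarise depth and open counts from DISPLAY QSTATUS output."""
    attrs = parse_mqsc_attributes(text)
    depth = int(attrs["CURDEPTH"])
    readers = int(attrs["IPPROCS"])
    writers = int(attrs["OPPROCS"])
    return {
        "depth": depth,
        "readers": readers,
        "writers": writers,
        # Messages are waiting but no process has the queue open for input.
        "no_reader_backlog": depth > 0 and readers == 0,
    }
```

Feeding it the output captured every few minutes would show whether the depth only grows while IPPROCS is 0, which is one way to tell that the repository manager has stopped reading the queue.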
Does anyone know what's happening here?
Thanks
-mqdev |
sebastianhirt |
Posted: Mon Mar 14, 2005 12:18 am Post subject: |
Yatiri
Joined: 07 Jun 2004 Posts: 620 Location: Germany
We had the same problem some months ago and ended up rebuilding the cluster, which was a pain in the neck.
I would assume that one of the cluster commands on the cluster command queue is faulty.
At your own risk (I have never tried this before, so no guarantees).
Assuming that you have at least 2 Full Repositories and the sick QM is one of them.
1. Make the sick one a non-full-repository QM.
2. Clear the SYSTEM.CLUSTER.COMMAND.QUEUE
3. Clear the SYSTEM.CLUSTER.REPOSITORY.QUEUE
4. Do a Cluster Refresh
5. Make the (now hopefully healthy) QM a Full Repos again
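The five steps above map roughly onto MQSC commands. The sketch below (Python, only generating the command text) is an illustration under assumptions, not a tested procedure: the function name is made up, it assumes the QM is a full repository for a single cluster through the `REPOS` attribute rather than `REPOSNL`, and `REFRESH CLUSTER` is shown without a `REPOS` option because the steps do not say which was meant. Note also that `CLEAR QLOCAL` is rejected while any process still has the queue open.

```python
def repository_reset_commands(cluster_name):
    """Build the MQSC text for the five steps, in order.

    Nothing is executed here; the caller decides whether and how to run it.
    """
    return [
        # 1. Stop being a full repository for the cluster.
        "ALTER QMGR REPOS(' ')",
        # 2. and 3. Empty the two cluster system queues.
        "CLEAR QLOCAL(SYSTEM.CLUSTER.COMMAND.QUEUE)",
        "CLEAR QLOCAL(SYSTEM.CLUSTER.REPOSITORY.QUEUE)",
        # 4. Refresh the locally held cluster information.
        "REFRESH CLUSTER({0})".format(cluster_name),
        # 5. Become a full repository again.
        "ALTER QMGR REPOS({0})".format(cluster_name),
    ]
```

Printing `"\n".join(repository_reset_commands("DEMO.CLUSTER"))` (the cluster name is hypothetical) gives text that could be reviewed first and then fed to runmqsc on the affected QM.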
Hope this helps. If you have already solved it, let me know how you did it. This is definitely a really interesting scenario.
cheers
Sebastian |
mqdev |
Posted: Mon Mar 14, 2005 7:35 am Post subject: An update on the problem |
The amqrrmfa process is dying upon QM restart. I tried starting it from the command line with the exact same args that it has when the QM starts, and it dies again. IPPROCS and OPPROCS both show 0 when I do a "dis ql(SYSTEM.CLUSTER.COMMAND.QUEUE)". However, the messages keep piling up on the queue unabated.
Sebastian -
This QM is not a FR but hosts a Cluster Queue which is heavily used by all the other Queue Managers. We have already deleted and recreated this QM once and are still hitting the same problem. Any insight would be appreciated.
Thanks
-mqdev |
sebastianhirt |
Posted: Mon Mar 14, 2005 8:10 am Post subject: |
Hi,
OK, let's go through it and see what we find. Maybe we can figure out what's wrong.
Are there any core dumps?
Is there anything in the log files?
If yes, what do they say?
cheers
Sebastian |
mqdev |
Posted: Mon Mar 14, 2005 8:22 am Post subject: changes |
Sebastian,
I cannot touch this QM now as we are under an official moratorium (any changes to the QM have to be preapproved, which as you can imagine is a fairly complicated process: we need to explain to the bosses what, why, why now, etc.). I would rather wait till the official ban is over in about 2 weeks. I will keep this forum updated in the interim. I am also working with IBM on this problem.
Thanks
-mqdev |
sebastianhirt |
Posted: Mon Mar 14, 2005 8:27 am Post subject: |
Yes I can imagine that this is a pain.
Anyway, I wish you good luck then!
cheers
Sebastian |
guest |
Posted: Fri Mar 18, 2005 3:39 pm Post subject: |
Acolyte
Joined: 11 Aug 2003 Posts: 52
I have done this before and it worked like a charm for me.
If you look at your error logs, amqrrmfa will try to restart after 600 seconds, up to 10 times. Also look for any common MQSeries MQ API return codes reported with the amqrrmfa crash error message; if there are any, try to fix the problem based on the return code.
If there is no clue and no return code, then once you know that it has crashed, go ahead and remove the first message on the queue (amqrrmfa is crashing because it couldn't handle that message). Use one of the available MQSeries utilities to remove just the first one of the lot.
Look at your logs to see whether amqrrmfa stays up after it is restarted after 600 seconds. If not, repeat the above procedure as necessary.
Note that this just recovers you from the problem of amqrrmfa crashing; there may be other cluster issues that you experience. amqrrmfa crashing is itself a sign that your cluster is broken in some respect, unless some intruding program put messages on the COMMAND.QUEUE and caused this issue.
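Removing just the first message, as suggested above, could look like the following sketch. It is written against any object with a `get()` method that returns the next message as bytes; with the third-party pymqi library (an assumption, nothing in this thread mentions it) that would be a `pymqi.Queue` opened on SYSTEM.CLUSTER.COMMAND.QUEUE. The function name and the backup file are hypothetical. The message is saved before being dropped so that it can still be inspected or handed to IBM support.

```python
MQRC_NO_MSG_AVAILABLE = 2033  # MQI reason code returned for an empty queue


def remove_first_message(queue, backup_path):
    """Destructively get one message and append it to backup_path.

    Returns the message bytes, or None if the queue was empty.
    The get is not under syncpoint in this sketch, so once get() returns
    the message is gone from the queue.
    """
    try:
        message = queue.get()
    except Exception as exc:
        # Assumption: MQI failures arrive as exceptions with a `reason`
        # attribute, as pymqi's MQMIError does.
        if getattr(exc, "reason", None) == MQRC_NO_MSG_AVAILABLE:
            return None
        raise
    with open(backup_path, "ab") as backup:
        # 4-byte length prefix so several saved messages can be split later.
        backup.write(len(message).to_bytes(4, "big"))
        backup.write(message)
    return message
```

Calling it once per amqrrmfa crash, rather than in a loop, matches the advice to remove only the first message and then check whether the process stays up.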
Good luck.
--R |