Author |
Message
|
KIT_INC |
Posted: Mon Feb 04, 2019 9:16 pm Post subject: SYSTEM.CLUSTER.COMMAND.QUEUE filled up |
|
|
Knight
Joined: 25 Aug 2006 Posts: 589
|
I am running MQ 7.5. I have a cluster issue, and I found that one of the PR QMgrs has 27K messages on SYSTEM.CLUSTER.COMMAND.QUEUE.
The info center says:
"Each queue manager in a cluster has a local queue called SYSTEM.CLUSTER.COMMAND.QUEUE which is used to transfer messages to the full repository. The message contains any new or changed information about the queue manager, or any requests for information about other queue managers. SYSTEM.CLUSTER.COMMAND.QUEUE is normally empty."
I am trying to understand what this means. SYSTEM.CLUSTER.COMMAND.QUEUE is a local queue (not an xmitq). How can a local queue be used to transfer messages to the full repository?
Does it mean that it contains messages to be transmitted to the FR?
Our cluster has 1000+ qmgrs. I am wondering what could have caused 27K messages to be there. Unfortunately, our production support has taken the QMgr out of the cluster and removed all the messages, so I cannot see what the 27K messages were. |
|
hughson |
Posted: Tue Feb 05, 2019 1:20 am Post subject: |
|
|
 Padawan
Joined: 09 May 2013 Posts: 1959 Location: Bay of Plenty, New Zealand
|
The SYSTEM.CLUSTER.COMMAND.QUEUE contains messages with information for the repository task, a process within the queue manager. If your cluster command queue had 27K messages on it, that suggests the repository task for that queue manager was not running.
The messages on this queue were not to be transmitted to the FR, they were from the FRs destined for this queue manager's repository task.
There should be errors in the AMQERR01.LOG showing the reason why the repository task had failed.
Cheers,
Morag _________________ Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software |
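One way to check whether anything is reading the command queue is to ask the queue manager for the queue's status. The sketch below is only an illustration, assuming Python 3 and that `runmqsc` is on the PATH of a user authorised to administer the queue manager; the helper names `parse_mqsc_attributes` and `command_queue_status` are made up for this example. It runs the MQSC command `DISPLAY QSTATUS(SYSTEM.CLUSTER.COMMAND.QUEUE) TYPE(QUEUE) CURDEPTH IPPROCS` and pulls the KEYWORD(value) pairs out of the reply.

```python
import re
import subprocess

COMMAND_QUEUE = "SYSTEM.CLUSTER.COMMAND.QUEUE"


def parse_mqsc_attributes(output: str) -> dict:
    """Collect KEYWORD(value) pairs from runmqsc display output."""
    pairs = re.findall(r"([A-Z]+)\(([^)]*)\)", output)
    return {keyword: value.strip() for keyword, value in pairs}


def command_queue_status(qmgr: str) -> dict:
    """Return current depth and open-for-input handle count of the command queue.

    Hypothetical helper: feeds one MQSC command to runmqsc on standard input.
    """
    mqsc = f"DISPLAY QSTATUS({COMMAND_QUEUE}) TYPE(QUEUE) CURDEPTH IPPROCS\n"
    result = subprocess.run(
        ["runmqsc", qmgr],
        input=mqsc,
        capture_output=True,
        text=True,
        check=True,  # raise if runmqsc reports a failure
    )
    attrs = parse_mqsc_attributes(result.stdout)
    return {
        "depth": int(attrs["CURDEPTH"]),
        "input_handles": int(attrs["IPPROCS"]),
    }
```

A high CURDEPTH together with IPPROCS(0) would be consistent with the repository task not having the queue open. It is a hint, not proof, so the error logs still need checking.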
|
KIT_INC |
Posted: Tue Feb 05, 2019 7:01 am Post subject: |
|
|
Knight
Joined: 25 Aug 2006 Posts: 589
|
Quote: |
The messages on this queue were not to be transmitted to the FR, they were from the FRs destined for this queue manager's repository task. |
Thanks, I'll do my homework on the error logs. But what you said changed my thinking on the issue.
Let me explain.
The issue started when the FR ran out of disk space. The channels from all PRs to the FR went into retrying. It is our bad that no one reacted until a few hours (maybe longer) later. Space was added and the FR restarted. I believe someone did reset and refresh cluster. The reason was that they got reports from some PRs that they could not see some cluster resources. You know, sometimes there is a panic reaction to recover from a production issue.
When I read the Info Center statement that the messages are for the FR, I thought the backlog was the result of the CLUSSDR channel to the FR being in retry for a few hours.
But now your explanation is that these are messages from the FR for the PR QMgr to process (by the repository task). Why is the FR generating so many messages for a PR ? |
|
bruce2359 |
Posted: Tue Feb 05, 2019 7:59 am Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
KIT_INC wrote: |
Why is the FR generating so many messages for a PR ? |
These are outbound cluster admin messages destined for the other FR and for interested PRs (and potentially all PRs), not just one PR. These admin messages attempt to keep the cluster qmgrs' repositories in synchronization.
Insufficient disk space potentially impacts all queues. In this instance, perhaps the FR could not put new or updated repository definitions to the SCRQ (SYSTEM.CLUSTER.REPOSITORY.QUEUE) or outbound admin messages to the SCTQ (SYSTEM.CLUSTER.TRANSMIT.QUEUE).
As Morag suggests, please review each of your qmgrs' error logs for related errors. Automation should be monitoring and alerting for disk and other resource constraints well in advance of their exhaustion. _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
KIT_INC |
Posted: Tue Feb 05, 2019 11:11 am Post subject: |
|
|
Knight
Joined: 25 Aug 2006 Posts: 589
|
Quote: |
These are outbound cluster admin messages destined to the other FR and to all interested PRs, not just one PR. These admin messages attempt to keep the cluster qmgrs' repositories in synchronization. |
Our cluster has 1000+ qmgrs (QM1 and QM2 are FRs, and QM3 to QM1000 are PRs). I saw the 27K messages on one of the many PRs. I am sure a few more PRs also had a high number of messages in SYSTEM.CLUSTER.COMMAND.QUEUE at that time, but we did not spend the time to find them because we were focusing on fixing that one PR qmgr. It's probably too late to find those messages on other PR qmgrs now because they should already have been handled. But I'll try.
I looked at the AMQERR0x.LOG files. I only see messages about the channel to the FR retrying, which we know is due to the FR space issue. There is no error against the repository task. So it seems it was processing messages on SYSTEM.CLUSTER.COMMAND.QUEUE, just not fast enough.
Please correct me if I am wrong. The FR qmgr sends cluster admin messages to the SYSTEM.CLUSTER.COMMAND.QUEUE of each PR qmgr for cluster admin and repository synchronization. Does each PR get its own set of messages from the FR? If yes, why did this particular PR qmgr get so many messages?
I also noticed a ton of messages in the SYSTEM.CLUSTER.TRANSMIT.QUEUE targeted for the FR qmgr. Do messages sent by the FR to a PR's cluster command queue require a response from the PR back to the FR? |
|
bruce2359 |
Posted: Tue Feb 05, 2019 11:58 am Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
Cluster command queue depth should be zero or decreasing toward zero as the cluster command processor does its work.
An FR publishes messages to its partner FR, and to PRs, when cluster object attributes are defined, altered, or deleted.
Do you have automation of any kind that frequently creates, alters, refreshes, or deletes cluster objects? For example, do you automate the REFRESH CLUSTER command?
One simple object attribute change at a PR will result in two outbound admin msgs to the two FRs, which will result in both FRs sending admin msgs to each other and interested PRs. If you have lots of PRs and lots of updates, you will have lots of inbound admin msgs in the cluster command queue. |
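To put rough numbers on that fan-out, here is a back-of-envelope sketch in Python. It takes the description above literally (each change goes from the PR to every FR, each FR then tells its partner FR and every interested PR) and ignores anything MQ does to suppress duplicate or unneeded updates, so treat it as an upper-bound illustration rather than a model of MQ's actual behaviour. The function name and parameters are invented for this example.

```python
def estimate_admin_messages(updates: int, interested_prs: int, full_repos: int = 2) -> int:
    """Rough count of cluster admin messages caused by changes made at one PR.

    Assumption: every update is forwarded naively, with no de-duplication.
    """
    to_full_repos = full_repos                           # the PR tells each FR
    between_full_repos = full_repos * (full_repos - 1)   # each FR tells its partner FR(s)
    to_partial_repos = full_repos * interested_prs       # each FR tells each interested PR
    return updates * (to_full_repos + between_full_repos + to_partial_repos)


if __name__ == "__main__":
    # One attribute change that 10 PRs are interested in.
    print(estimate_admin_messages(updates=1, interested_prs=10))
```

Under these assumptions a single change with 10 interested PRs already produces 24 messages, and the total grows linearly with both the number of updates and the number of interested PRs.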
|
hughson |
Posted: Tue Feb 05, 2019 2:43 pm Post subject: |
|
|
 Padawan
Joined: 09 May 2013 Posts: 1959 Location: Bay of Plenty, New Zealand
|
KIT_INC wrote: |
I believe someone did reset and refresh cluster. |
This is likely the cause of the large numbers of messages being sent round the cluster. It appears that this one PR didn't process all the messages due to some problem, as yet unknown until you look in the AMQERR01.LOG for that PR, and the backlog is what you have seen on the command queue.
Here is a pertinent page in Knowledge Center that you should read:-
"Clustering: Using REFRESH CLUSTER best practices", especially the sub-section "Refreshing in a large cluster can affect performance and availability of the cluster".
Cheers,
Morag |
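When going through the error logs of a PR, a quick tally of which AMQ message identifiers appear can show at a glance whether the log holds anything besides channel retry messages. The following is a minimal sketch, assuming Python 3, that the caller passes in the queue manager's errors directory (on UNIX typically under /var/mqm/qmgrs/), and that identifiers look like `AMQ` followed by four digits. The function names are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: message identifiers have the form AMQ plus four digits.
AMQ_ID = re.compile(r"\bAMQ\d{4}")


def count_message_ids(log_text: str) -> Counter:
    """Count how often each AMQnnnn identifier occurs in a block of log text."""
    return Counter(AMQ_ID.findall(log_text))


def summarise_error_logs(errors_dir: str) -> Counter:
    """Tally identifiers across every AMQERR0*.LOG file in one errors directory."""
    totals = Counter()
    for log_file in sorted(Path(errors_dir).glob("AMQERR0*.LOG")):
        # errors="replace" keeps going if the log contains undecodable bytes
        totals += count_message_ids(log_file.read_text(errors="replace"))
    return totals
```

Calling `summarise_error_logs(...).most_common()` lists the identifiers by frequency; any identifier that is not a channel message is then worth reading in full in the log itself.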
|
bruce2359 |
Posted: Wed Feb 06, 2019 6:54 am Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
KIT_INC wrote: |
I believe someone did reset and refresh cluster. |
How'd I miss that? Logs should confirm this. Tar and feathers should be prepared. |
|
cicsprog |
Posted: Fri Feb 15, 2019 9:55 am Post subject: |
|
|
Partisan
Joined: 27 Jan 2002 Posts: 347
|
Read the warnings about REFRESH CLUSTER, especially for a site with 1000+ MQMs. Using it is like MQ admining like Rambo... don't. Browse your SYSTEM.CLUSTER.* queues. Message eye catchers will probably point to the misbehaving MQM(s).
If you have multiple clusters, you might want to consider splitting out your SCTQ. |
|
mvic |
Posted: Fri Feb 15, 2019 12:27 pm Post subject: |
|
|
 Jedi
Joined: 09 Mar 2004 Posts: 2080
|
Really the interesting point is, is the queue depth changing?
If it's growing and reducing, then your amqrrmfa (QM repository manager program) is processing its work. Probably no cause for concern.
If it's growing only, and never reducing, or is just staying the same, then probably your amqrrmfa is having some major trouble.
Look in qmgr error logs and system-wide errors dir (/var/mqm/errors on *ix). |
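The rule of thumb above can be written down as a small check over a series of depth readings taken some time apart. This is only a sketch in Python 3 with made-up names (`classify_depth_trend` and its labels); how the samples are collected is left open, and "stuck" means "worth investigating" rather than a definite diagnosis, since puts could simply be outpacing gets.

```python
from typing import Sequence


def classify_depth_trend(samples: Sequence[int]) -> str:
    """Classify successive CURDEPTH readings of the cluster command queue.

    Returns "empty" if every reading is zero, "processing" if the depth
    dropped at least once, and "stuck" if it only grew or stayed the same.
    """
    if len(samples) < 2:
        raise ValueError("need at least two depth samples")
    if all(depth == 0 for depth in samples):
        return "empty"  # the normal state for this queue
    dropped = any(later < earlier for earlier, later in zip(samples, samples[1:]))
    return "processing" if dropped else "stuck"
```

For example, `classify_depth_trend([27000, 26500, 26800])` returns `"processing"`, while `classify_depth_trend([500, 500, 500])` returns `"stuck"`.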
|