Author |
Message
|
hughson |
Posted: Thu Jun 17, 2021 9:12 pm Post subject: |
|
|
 Padawan
Joined: 09 May 2013 Posts: 1959 Location: Bay of Plenty, New Zealand
|
Bad wrote: |
"Does it still take 1 hour to fail over now?" Yes, it takes 1 hour or more to fail over now. |
What does the AMQERR01.LOG show, if anything, that the queue manager is doing during this hour? _________________ Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software |
Bad |
Posted: Mon Jun 21, 2021 5:30 am Post subject: |
|
|
Novice
Joined: 15 Jun 2021 Posts: 14
|
hughson wrote: |
Bad wrote: |
"Does it still take 1 hour to fail over now?" Yes, it takes 1 hour or more to fail over now. |
What does the AMQERR01.LOG show, if anything, that the queue manager is doing during this hour? |
Thank you for your reply.
I have nothing in AMQERR01 during the switchover.
However, when I switch with endmqm -s QMA, QMA quickly goes into "running as standby". It is QMB that takes 1 hour or more to go from the STARTING state to RUNNING, with one process (amqrrmfa) at 100% during this hour. |
bruce2359 |
Posted: Mon Jun 21, 2021 11:55 am Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
hughson wrote: |
What does the AMQERR01.LOG show, if anything, that the queue manager is doing during this hour? |
bad wrote: |
I have nothing in the AMQERR01 during the switchover. |
Please be a bit more precise. Do you mean that AMQERR01 is empty? Nothing whatsoever is logged after the fail? _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
hughson |
Posted: Mon Jun 21, 2021 1:31 pm Post subject: |
|
|
 Padawan
Joined: 09 May 2013 Posts: 1959 Location: Bay of Plenty, New Zealand
|
Bad wrote: |
hughson wrote: |
Bad wrote: |
"Does it still take 1 hour to fail over now?" Yes, it takes 1 hour or more to fail over now. |
What does the AMQERR01.LOG show, if anything, that the queue manager is doing during this hour? |
Thank you for your reply.
I have nothing in AMQERR01 during the switchover.
However, when I switch with endmqm -s QMA, QMA quickly goes into "running as standby". It is QMB that takes 1 hour or more to go from the STARTING state to RUNNING, with one process (amqrrmfa) at 100% during this hour. |
There will be various messages output to the AMQERR01.LOG in your qmgr directory as each of the processes that form the queue manager start up. Are you sure you are looking in the queue manager's error log?
amqrrmfa is the cluster repository manager. Could it be processing all the messages on the cluster command queue that you told us about?
Once it gets up and running is the cluster command queue finally empty?
If you switch over again does it take another hour?
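For anyone checking what the queue manager logged during the slow start, here is a minimal Python sketch that counts AMQERR01.LOG entries per program and message ID. It assumes the usual layout, where each entry carries a `Program(...)` field and an `AMQnnnn` message ID and ends with a line of dashes; the function name `summarize_error_log` is made up for illustration and is not an MQ tool.

```python
import re
from collections import Counter

# Assumption: each entry ends with a separator line such as
# "----- amqzxma0.c : 100 -----" (dashes, source file, line, dashes).
ENTRY_END = re.compile(r"^-----.*-----[ \t]*$", re.MULTILINE)
# Message IDs look like AMQ8003 or AMQ8003I (severity letter on newer levels).
MSG_ID = re.compile(r"\bAMQ\d{4}[A-Z]?\b")
PROGRAM = re.compile(r"Program\(([^)]*)\)")


def summarize_error_log(text):
    """Count log entries per (program, message ID) pair."""
    counts = Counter()
    for entry in ENTRY_END.split(text):
        msg = MSG_ID.search(entry)
        if msg is None:
            continue  # blank tail or text with no message ID
        prog = PROGRAM.search(entry)
        counts[(prog.group(1) if prog else "?", msg.group(0))] += 1
    return counts
```

Running it over the contents of the queue manager's own AMQERR01.LOG (not the system-wide one) and looking at the entries attributed to amqrrmfa would show whether the repository manager is reporting anything during the hour.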
Cheers,
Morag |
Bad |
Posted: Tue Jun 22, 2021 11:03 am Post subject: |
|
|
Novice
Joined: 15 Jun 2021 Posts: 14
|
Thanks for your reply.
"There will be various messages output to the AMQERR01.LOG in your qmgr directory as each of the processes that form the queue manager start up. Are you sure you are looking in the queue manager's error log?"
Yes, we looked with the MQSeries expert. There are a lot of log entries, but we did not find a severe or critical error in the file.
"amqrrmfa is the cluster repository manager. Could it be processing all the messages on the cluster command queue that you told us about?"
"Once it gets up and running is the cluster command queue finally empty?"
No, the SYSTEM.CLUSTER.COMMAND.QUEUE is not empty. For example, we made a switch last Tuesday and we are currently at 206,966 messages. It sometimes takes a week or more after a switch to empty (on some switches we went up to more than 1 million messages).
"If you switch over again does it take another hour?"
Yes, it takes another hour. |
Bad |
Posted: Tue Jun 22, 2021 11:07 am Post subject: |
|
|
Novice
Joined: 15 Jun 2021 Posts: 14
|
gbaddeley wrote: |
Can you bring your MQ expert into this chat? |
Sorry, I can't. |
fjb_saper |
Posted: Tue Jun 22, 2021 2:37 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Bad wrote: |
gbaddeley wrote: |
Can you bring your MQ expert into this chat? |
Sorry, I can't. |
Are you sure you don't have a poison message in the cluster command queue?
You did not specify the size of your cluster, but it seems to me that you have a huge number of messages there...  _________________ MQ & Broker admin |
hughson |
Posted: Tue Jun 22, 2021 3:00 pm Post subject: |
|
|
 Padawan
Joined: 09 May 2013 Posts: 1959 Location: Bay of Plenty, New Zealand
|
Bad wrote: |
"Once it gets up and running is the cluster command queue finally empty?"
No, the SYSTEM.CLUSTER.COMMAND.QUEUE is not empty. For example, we made a switch last Tuesday and we are currently at 206,966 messages. It sometimes takes a week or more after a switch to empty (on some switches we went up to more than 1 million messages). |
The issue with your cluster command queue would appear to be causing the issue with the startup time.
Your comment implies that the cluster command queue only gets lots of messages on it when you switch? Are you saying that the switch causes the large numbers of messages? Or do you have large numbers of messages on your cluster command queue regardless of when you switch?
Why do you have so many messages on your cluster command queue? Have you issued a refresh cluster or something like that? Perhaps more than once?
Is the depth increasing or decreasing? Normally the cluster command queue depth should be tending to zero. Having a million messages on there is not the normal expectation.
Can you tell us more about your cluster? How many queue managers? How many queues?
Do you have an idea of where all the messages on the cluster command queue are coming from? Are they all from one queue manager for example?
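To put a number on "increasing or decreasing", one option is to sample the queue depth a few times and compute the net change per hour. The following is a rough Python sketch under that assumption; the helper names `depth_trend` and `hours_to_drain` are hypothetical, and the samples would have to come from your own monitoring.

```python
from datetime import datetime, timedelta


def depth_trend(samples):
    """Net change in depth per hour between the first and last sample.

    samples: list of (datetime, depth) tuples, oldest first.
    Negative means the queue is draining, positive means it is filling.
    """
    (t0, d0), (t1, d1) = samples[0], samples[-1]
    hours = (t1 - t0).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("need at least two samples at different times")
    return (d1 - d0) / hours


def hours_to_drain(samples):
    """Estimated hours until empty at the observed rate, or None if not draining."""
    rate = depth_trend(samples)
    if rate >= 0:
        return None
    return samples[-1][1] / -rate
```

A healthy cluster command queue should give a negative trend and a short drain estimate; a positive trend means something is putting messages faster than the repository manager processes them.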
Cheers,
Morag |
Bad |
Posted: Sun Jun 27, 2021 6:32 am Post subject: |
|
|
Novice
Joined: 15 Jun 2021 Posts: 14
|
Thanks for your reply.
hughson wrote: |
Bad wrote: |
"Once it gets up and running is the cluster command queue finally empty?"
No, the SYSTEM.CLUSTER.COMMAND.QUEUE is not empty. For example, we made a switch last Tuesday and we are currently at 206,966 messages. It sometimes takes a week or more after a switch to empty (on some switches we went up to more than 1 million messages). |
The issue with your cluster command queue would appear to be causing the issue with the startup time.
Your comment implies that the cluster command queue only gets lots of messages on it when you switch? Are you saying that the switch causes the large numbers of messages? Or do you have large numbers of messages on your cluster command queue regardless of when you switch?
Why do you have so many messages on your cluster command queue? Have you issued a refresh cluster or something like that? Perhaps more than once?
Is the depth increasing or decreasing? Normally the cluster command queue depth should be tending to zero. Having a million messages on there is not the normal expectation.
Can you tell us more about your cluster? How many queue managers? How many queues?
Do you have an idea of where all the messages on the cluster command queue are coming from? Are they all from one queue manager for example?
Cheers,
Morag |
"Your comment implies that the cluster command queue only gets lots of messages on it when you switch?" Yes.
"Are you saying that the switch causes the large numbers of messages?" Yes.
"Is the depth increasing or decreasing?" The depth is increasing; we now have 1,843,000 messages on the SYSTEM.CLUSTER.COMMAND.QUEUE.
"Why do you have so many messages on your cluster command queue? Have you issued a refresh cluster or something like that? Perhaps more than once?" We have not done a refresh cluster for more than 2 months, to avoid generating more messages.
"How many queue managers?" 60
"How many queues?" 180
"Do you have an idea of where all the messages on the cluster command queue are coming from?" The messages come from 2 full repositories (FR). |
Bad |
Posted: Sun Jun 27, 2021 6:34 am Post subject: |
|
|
Novice
Joined: 15 Jun 2021 Posts: 14
|
fjb_saper wrote: |
Bad wrote: |
gbaddeley wrote: |
Can you bring your MQ expert into this chat? |
Sorry, I can't. |
Are you sure you don't have a poison message in the cluster command queue?
You did not specify the size of your cluster, but it seems to me that you have a huge number of messages there...  |
How could I locate this poison message in my queue? |
fjb_saper |
Posted: Sun Jun 27, 2021 11:37 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Set the max retry on the SYSTEM.CLUSTER.COMMAND.QUEUE and maybe set a BACKOUT queue and you should find the poison message on the backout queue. The processing of legitimate messages should also considerably speed up. |
Bad |
Posted: Mon Jun 28, 2021 1:29 am Post subject: |
|
|
Novice
Joined: 15 Jun 2021 Posts: 14
|
fjb_saper wrote: |
Set the max retry on the SYSTEM.CLUSTER.COMMAND.QUEUE and maybe set a BACKOUT queue and you should find the poison message on the backout queue. The processing of legitimate messages should also considerably speed up.  |
Thanks for your replies.
I looked at the first 5000 messages on the command queue, and none of them shows any retries.
On one of my full repositories (FR), I noticed a G8 "cache switch" message in first position, ahead of the RFQR message, on the SYSTEM.CLUSTER.REPOSITORY.QUEUE. Is this a poison message? |
bruce2359 |
Posted: Mon Jun 28, 2021 6:19 am Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
Have you or your expert opened a PMR with IBM? |
bruce2359 |
Posted: Mon Jun 28, 2021 8:34 am Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
Bad wrote: |
exerk wrote: |
And another question - which file system was full? |
It was the SYSTEM\!CLUSTER\!COMMAND\!QUEUE/ file, at more than 3 GB.
The cause was a partner who continued to send messages to another partner who had closed his database for 3 days ... |
A closed/unavailable database should not normally result in cluster administrative messages.
How many transactions per day comprise 3 days workload? What is the length of the transaction messages? What does the app in question do when it discovers that the db is not available to complete a transaction? Does the app put-inhibit an application cluster-queue for each app message, for example? |
gbaddeley |
Posted: Mon Jun 28, 2021 3:27 pm Post subject: |
|
|
 Jedi Knight
Joined: 25 Mar 2003 Posts: 2538 Location: Melbourne, Australia
|
bruce2359 wrote: |
Have you or your expert opened a PMR with IBM? |
Agree. You first need to investigate and resolve the issue with a ridiculously large number of messages on the SCCQ. Not only is the cluster command process unable to process them at a reasonable rate, but something is generating a massive flood of them as well.
Is IPPROCS on the queue equal to 1, with the handle held by the amqrrmfa process?
Does the first message have a non-zero BackoutCount? _________________ Glenn |
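The depth and reader count can be read in runmqsc with `DISPLAY QLOCAL(SYSTEM.CLUSTER.COMMAND.QUEUE) CURDEPTH IPPROCS`, and `DISPLAY QSTATUS(SYSTEM.CLUSTER.COMMAND.QUEUE) TYPE(HANDLE) ALL` lists which process holds the queue open. As a small sketch of scripting that check, the Python below parses the `KEYWORD(value)` pairs from such output; the function names are invented for this example, and it assumes you capture the runmqsc output as text yourself.

```python
import re

# runmqsc prints attributes as KEYWORD(value) pairs, several per line.
ATTR = re.compile(r"([A-Z]+)\(([^)]*)\)")


def parse_mqsc_attrs(output):
    """Return a dict of KEYWORD -> value from one DISPLAY response."""
    return {name: value.strip() for name, value in ATTR.findall(output)}


def check_command_queue(output):
    """Summarize depth and reader count for the cluster command queue."""
    attrs = parse_mqsc_attrs(output)
    depth = int(attrs["CURDEPTH"])
    readers = int(attrs["IPPROCS"])
    return {
        "depth": depth,
        "readers": readers,
        # Exactly one reader is what the question above is checking for.
        "single_reader": readers == 1,
    }
```

BackoutCount is a field of the message descriptor rather than a queue attribute, so it has to be checked by browsing the first message (for example with the amqsbcg sample) rather than with a DISPLAY command.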