Bad
Posted: Tue Jun 15, 2021 7:44 am Post subject: Switch QM between nominal / rescue
Novice
Joined: 15 Jun 2021 Posts: 14

Hello,
I would like your opinion on a current problem:
Since an incident on a server (full file system), which has now been resolved, the time it takes to switch a QM from its nominal server to its rescue server has increased from 5 minutes to 1 hour. Do you have any idea which parameters I should check?
Thank you

hughson
Posted: Tue Jun 15, 2021 3:03 pm
Padawan
Joined: 09 May 2013 Posts: 1959 Location: Bay of Plenty, New Zealand

By rescue server, do you mean you failed over the queue manager?
You say something increased from 5 minutes to 1 hour. Could you tell us what that was?
You are asking for some parameters to check, but I don't think you have told us enough for us to understand the problem you are suffering yet. Until we know that, we are unlikely to be able to provide advice.
Cheers,
Morag _________________ Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software

Bad
Posted: Wed Jun 16, 2021 1:13 am
Novice
Joined: 15 Jun 2021 Posts: 14

Hello,
Sorry for the brief explanation, and thank you for your feedback.
I have 2 servers:
server 1 -> 2 QMs
QM A (running)
QM BS (running as standby)
server 2 -> 2 QMs
QM AS (running as standby)
QM B (running)
When I run endmqm -s on QM A on server 1 to switch over to QM AS, its standby on server 2, the operation now takes 1 hour. Before the incident it took 5 minutes.
I hope this is a little clearer.
cordially
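To compare the "5 minutes before, 1 hour now" figures on a like-for-like basis, it can help to time every switchover the same way. The Python sketch below is one hypothetical way to do that: started on server 2 just before `endmqm -s QMA` is issued on server 1, it polls `dspmq -m QMA` and reports how many seconds pass until the instance on server 2 shows `STATUS(Running)`. The helper names (`dspmq_status`, `parse_status`, `seconds_until_running`) are made up for illustration, and the parsing assumes dspmq prints the status as `STATUS(...)`, for example `STATUS(Running as standby)` on the standby; check this against your MQ version.

```python
import re
import subprocess
import time
from typing import Callable, Optional

# Assumed dspmq output shape, e.g. "QMNAME(QMA)   STATUS(Running as standby)".
STATUS_RE = re.compile(r"STATUS\(([^)]*)\)")


def dspmq_status(qmgr: str) -> str:
    """Run `dspmq -m <qmgr>` on this server and return its raw text output."""
    result = subprocess.run(
        ["dspmq", "-m", qmgr], capture_output=True, text=True, check=False
    )
    return result.stdout


def parse_status(output: str) -> Optional[str]:
    """Extract the STATUS(...) value from dspmq output, or None if absent."""
    match = STATUS_RE.search(output)
    return match.group(1) if match else None


def seconds_until_running(
    qmgr: str,
    get_output: Callable[[str], str] = dspmq_status,
    interval: float = 5.0,
    timeout: float = 2 * 3600.0,
) -> float:
    """Poll until the local instance reports STATUS(Running).

    Returns the seconds elapsed since polling started; raises TimeoutError
    if the instance is still not Running once `timeout` seconds have passed.
    """
    start = time.monotonic()
    while True:
        status = parse_status(get_output(qmgr))
        elapsed = time.monotonic() - start
        if status == "Running":
            return elapsed
        if elapsed >= timeout:
            raise TimeoutError(f"{qmgr} not Running after {elapsed:.0f}s")
        time.sleep(interval)
```

Measuring from the moment endmqm is issued until the standby reports Running, rather than timing how long the endmqm command itself takes to return, avoids depending on whether endmqm waits for the queue manager to end, which depends on the endmqm options used.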

hughson
Posted: Wed Jun 16, 2021 4:07 am
Padawan
Joined: 09 May 2013 Posts: 1959 Location: Bay of Plenty, New Zealand

Just to confirm I am understanding your issue. Am I correct that when you have a full file system, you are seeing it take 1 hour to end your queue manager and fail over to the standby machine?
You also say that before the incident it takes 5 minutes. I assume you mean that if there is no issue then endmqm -s takes 5 minutes.
Cheers,
Morag

exerk
Posted: Wed Jun 16, 2021 12:23 pm
Jedi Council
Joined: 02 Nov 2006 Posts: 6339

And another question - which file system was full? _________________ It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys.

Bad
Posted: Wed Jun 16, 2021 12:25 pm
Novice
Joined: 15 Jun 2021 Posts: 14

Hello,
Thanks for your reply.
We had an incident on the server 3 months ago (full file system), which we solved.
Before this incident, endmqm -s took 5 minutes to switch the QM to the other server.
Since the incident (the file system is no longer full and everything is OK), endmqm -s takes about 1 hour or more, and I would like it to go back to taking 5 minutes to switch.
cordially

exerk
Posted: Wed Jun 16, 2021 12:27 pm
Jedi Council
Joined: 02 Nov 2006 Posts: 6339

Bad wrote: |
Hello
thanks for your return
We had an incident on the server 3 months ago (fs full) we solved it
Before this incident an endmqm -s took 5 min to switch the server QM
After the incident (no more fs full) everything is ok we see that the endmqm -s takes about 1 hour or more and I would like the endmqm -s to return to 5 min to switch.
cordially |
Again, cordially, which file system was full? Is the queue manager in a queue manager cluster? If not, I'll move this thread to a more appropriate forum.

Bad
Posted: Wed Jun 16, 2021 12:30 pm
Novice
Joined: 15 Jun 2021 Posts: 14

exerk wrote: |
And another question - which file system was full? |
It was the file SYSTEM!CLUSTER!COMMAND!QUEUE/, at more than 3 GB.
The cause was a partner that kept sending messages to another partner who had shut down its database for 3 days...

hughson
Posted: Wed Jun 16, 2021 2:45 pm
Padawan
Joined: 09 May 2013 Posts: 1959 Location: Bay of Plenty, New Zealand

During the incident, did your queue manager end abnormally? I.e. did it have a lot of log data to replay upon next start up?
Does it still take 1 hour to fail over now?
Cheers,
Morag

Bad
Posted: Thu Jun 17, 2021 6:18 am
Novice
Joined: 15 Jun 2021 Posts: 14

Thanks for your reply.
During the incident the queue manager did indeed report the error "AMQ9448: Repository manager failed due of errors. Retry in 10 minutes, queue manager will terminate in 7190 minutes". To solve it, we deleted the queue with "delete qlocal (SYSTEM.CLUSTER.COMMAND.QUEUE)" because it had filled the file system, then restarted the QM with strmqm -c QMA to rebuild it and then strmqm -x QMA to restart it.
->"WebSphere MQ queue manager 'QMA' starting.
The queue manager is associated with installation 'Installation2'.
473 log records accessed on queue manager 'QMA' during the log replay phase.
Log replay for queue manager 'QMA4' complete.
Transaction manager state recovered for queue manager 'QMA'.
Creating or replacing default objects for queue manager 'QMA'"
"did it have a lot of log data to replay upon next start up?" -> If that data was in the cluster command queue, it was never replayed, because we deleted and rebuilt the queue.
"Does it still take 1 hour to fail over now?" -> Yes, it now takes 1 hour or more to fail over.

exerk
Posted: Thu Jun 17, 2021 6:18 am
Jedi Council
Joined: 02 Nov 2006 Posts: 6339

Bad wrote: |
exerk wrote: |
And another question - which file system was full? |
It was the file SYSTEM\!CLUSTER\!COMMAND\!QUEUE/ more 3GB
the cause was a partner who continued to send messages to a partner who closed his database for 3 days ... |
What is your OS and MQ version please?

exerk
Posted: Thu Jun 17, 2021 6:22 am
Jedi Council
Joined: 02 Nov 2006 Posts: 6339

Bad wrote: |
...During the incident the manager we indeed had an error "AMQ9448: Repository manager failed due of errors. Retry in 10 minutes, queue
manager will terminate in 7190 minutes " to be able to solve it we deleted the "delete qlocal (SYSTEM.CLUSTER.COMMAND.QUEUE)"because it saturated the fs... |
Why, oh why, did you think this was a good idea? Why not expand the file system, or better still stop the queue manager first, then expand the file system?
And you have still to answer my question as to whether the queue manager is in a cluster of queue managers.

Bad
Posted: Thu Jun 17, 2021 1:10 pm
Novice
Joined: 15 Jun 2021 Posts: 14

exerk wrote: |
Bad wrote: |
...During the incident the manager we indeed had an error "AMQ9448: Repository manager failed due of errors. Retry in 10 minutes, queue
manager will terminate in 7190 minutes " to be able to solve it we deleted the "delete qlocal (SYSTEM.CLUSTER.COMMAND.QUEUE)"because it saturated the fs... |
Why, oh why, did you think this was a good idea? Why not expand the file system, or better still stop the queue manager first, then expand the file system?
And you have still to answer my question as to whether the queue manager is in a cluster of queue managers. |
I am only an operator; I only followed the instructions given by the expert. I do not have the level for this type of operation, but I am trying to learn and understand. Given the context and the failure it caused, it may have been the fastest option, I don't know.
Yes, the queue manager is in a cluster of queue managers. Sorry, I'm French and not bilingual, so I may misunderstand some sentences.

gbaddeley
Posted: Thu Jun 17, 2021 3:22 pm
Jedi Knight
Joined: 25 Mar 2003 Posts: 2538 Location: Melbourne, Australia

Bad wrote: |
I am only an operator, I only followed the instructions given by the expert. |
Can you bring your MQ expert into this chat?
If the cluster command queue was using 3GB of disk space, there must have been an issue for a long time (days? weeks?), with cluster command processing not happening.
Full file systems are very bad for MQ.
The cluster command queue depth is normally zero.
Do you have monitoring and alerting on the file systems and on queue depths? _________________ Glenn
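As a starting point for the monitoring Glenn asks about, here is a minimal Python sketch that checks two things: how full a file system is (via `shutil.disk_usage`) and the current depth of a queue, parsed from the text output of an MQSC command such as `DISPLAY QLOCAL(SYSTEM.CLUSTER.COMMAND.QUEUE) CURDEPTH` run through `runmqsc QMA`. The function names, thresholds and the assumption that the output contains `CURDEPTH(n)` are illustrative only; which file system to watch depends on where your queue manager data and logs actually live.

```python
import re
import shutil
from typing import List, Optional

# Matches the CURDEPTH(n) attribute in MQSC DISPLAY QLOCAL output.
CURDEPTH_RE = re.compile(r"CURDEPTH\((\d+)\)")


def fs_used_percent(path: str) -> float:
    """Approximate percentage of the file system holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def parse_curdepth(mqsc_output: str) -> Optional[int]:
    """Return the first CURDEPTH value found in MQSC output, or None."""
    match = CURDEPTH_RE.search(mqsc_output)
    return int(match.group(1)) if match else None


def check_mq_health(
    path: str, fs_limit_pct: float, mqsc_output: str, depth_limit: int
) -> List[str]:
    """Return human-readable alerts; an empty list means nothing to report."""
    alerts = []
    used = fs_used_percent(path)
    if used >= fs_limit_pct:
        alerts.append(f"file system for {path} is {used:.1f}% full")
    depth = parse_curdepth(mqsc_output)
    if depth is None:
        alerts.append("could not read CURDEPTH from MQSC output")
    elif depth > depth_limit:
        alerts.append(f"queue depth {depth} exceeds {depth_limit}")
    return alerts
```

Keeping the MQSC output as an input, rather than calling runmqsc inside the function, makes the check easy to test and lets it be driven by whatever scheduler or monitoring agent is already in place.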

bruce2359
Posted: Thu Jun 17, 2021 5:59 pm
Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.

gbaddeley wrote: |
Full file systems are very bad for MQ. |
If the active/standby see the same filesystem, the standby attempting to take ownership will likely encounter the same issue as the primary.
gbaddeley wrote: |
The cluster command queue depth is normally zero.
|
... or decrementing toward zero as cluster commands are processed. _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
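bruce2359's refinement means a single non-zero reading of the cluster command queue is not by itself a problem; what matters is whether the depth is draining. A rough heuristic, sketched here in Python under the hypothetical name `cluster_queue_draining`, takes CURDEPTH samples of SYSTEM.CLUSTER.COMMAND.QUEUE collected a few minutes apart (oldest first) and flags the queue only when it is neither empty nor lower than it was at the first sample. The sampling interval and the rule itself are assumptions for illustration, not an MQ-defined check.

```python
from typing import Sequence


def cluster_queue_draining(samples: Sequence[int]) -> bool:
    """Heuristic: True if the latest depth is zero or lower than the first sample.

    `samples` are CURDEPTH readings, oldest first, taken at regular intervals.
    A flat or growing non-zero depth suggests cluster commands are not being
    processed and is reported as False.
    """
    if not samples:
        raise ValueError("at least one depth sample is required")
    latest = samples[-1]
    return latest == 0 or latest < samples[0]
```

With only one non-zero sample the function returns False, because no downward trend can be seen yet; a real monitor would probably wait for a longer window of samples before raising an alert.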