MQSeries.net :: View topic - Switch QM between nominal / rescue


Bad

Posted: Tue Jun 15, 2021 7:44 am Post subject: Switch QM between nominal / rescue

Novice

Joined: 15 Jun 2021
Posts: 14

Hello,

I would like your opinion on a current problem:

Following an incident on a server (a full file system), which has since been resolved, the time needed to switch a QM from its nominal (primary) server to its rescue (standby) server has increased from 5 minutes to 1 hour. Do you have any idea which parameters to check?

Thank you

hughson

Posted: Tue Jun 15, 2021 3:03 pm Post subject:

Padawan

Joined: 09 May 2013
Posts: 1914
Location: Bay of Plenty, New Zealand

By rescue server, do you mean you failed over the queue manager?

You say something increased from 5 minutes to 1 hour. Could you tell us what that was?

You are asking for some parameters to check, but I don't think you have told us enough for us to understand the problem you are suffering yet. Until we know that, we are unlikely to be able to provide advice.

Cheers,
Morag
_________________
Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software

Bad

Posted: Wed Jun 16, 2021 1:13 am Post subject:

Novice

Joined: 15 Jun 2021
Posts: 14

Hello

Sorry for the brief explanation, and thank you for your feedback.

I have 2 servers:

server 1 -> 2 QMs
QM A (running)
QM BS (running as standby)

server 2 -> 2 QMs
QM AS (running as standby)
QM B (running)

When I run endmqm -s on QMA on server 1 to switch over to QM AS, its standby on server 2, the operation now takes 1 hour since the incident. Before the incident it took 5 minutes.
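To put a precise number on the switchover time rather than watching the clock, one option is to poll the queue manager's status until the standby reports that it is fully active. Below is a minimal Python sketch; `wait_for_output` is a hypothetical helper name. It assumes a multi-instance queue manager (both instances share the name QMA), that the MQ control commands are on the PATH, and that `dspmq` prints `STATUS(Running)` once the instance is active. A standby shows `STATUS(Running as standby)`, which does not match because of the closing parenthesis.

```python
import subprocess
import time


def wait_for_output(argv: list[str], needle: str,
                    interval: float = 5.0, timeout: float = 7200.0) -> float:
    """Run argv repeatedly until its stdout contains needle.

    Returns the elapsed wall-clock seconds, or raises TimeoutError.
    time.monotonic() is used so clock adjustments cannot skew the result.
    """
    start = time.monotonic()
    while True:
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
        elapsed = time.monotonic() - start
        if needle in result.stdout:
            return elapsed
        if elapsed >= timeout:
            raise TimeoutError(f"{needle!r} not seen after {elapsed:.0f}s")
        time.sleep(interval)
```

For example, on server 2, call `wait_for_output(["dspmq", "-m", "QMA"], "STATUS(Running)")` immediately after issuing `endmqm -s QMA` on server 1; the return value is the switchover time in seconds. Polling every few seconds keeps the check cheap at the cost of a few seconds of resolution.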

I hope this is a little clearer.

Regards

hughson

Posted: Wed Jun 16, 2021 4:07 am Post subject:

Padawan

Joined: 09 May 2013
Posts: 1914
Location: Bay of Plenty, New Zealand

Just to confirm I am understanding your issue. Am I correct that when you have a full file system, you are seeing it take 1 hour to end your queue manager and fail over to the standby machine?

You also say that before the incident it takes 5 minutes. I assume you mean that if there is no issue then endmqm -s takes 5 minutes.

Cheers,
Morag

exerk

Posted: Wed Jun 16, 2021 12:23 pm Post subject:

Jedi Council

Joined: 02 Nov 2006
Posts: 6339

And another question - which file system was full?
_________________
It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys.

Bad

Posted: Wed Jun 16, 2021 12:25 pm Post subject:

Novice

Joined: 15 Jun 2021
Posts: 14

Hello

Thanks for your reply.

We had an incident on the server 3 months ago (a full file system), which we resolved.
Before this incident, endmqm -s took 5 minutes to switch the QM to the other server.

Since the incident was resolved (the file system is no longer full), everything else works, but endmqm -s now takes about 1 hour or more. I would like the switchover to take 5 minutes again.

Regards

exerk

Posted: Wed Jun 16, 2021 12:27 pm Post subject:

Jedi Council

Joined: 02 Nov 2006
Posts: 6339

Bad wrote:

Hello

Thanks for your reply.

Again, cordially, which file system was full? Is the queue manager in a queue manager cluster, because if not I'll move this thread to a more appropriate forum.

Bad

Posted: Wed Jun 16, 2021 12:30 pm Post subject:

Novice

Joined: 15 Jun 2021
Posts: 14

exerk wrote:

And another question - which file system was full?

It was the queue file SYSTEM\!CLUSTER\!COMMAND\!QUEUE/, at more than 3GB.

The cause was a partner that kept sending messages to another partner who had shut down his database for 3 days...
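The path above is the queue's directory under the queue manager's data directory; the backslashes are most likely shell escapes for `!`. To keep an eye on that directory's size between incidents, here is a Python sketch. `queue_dir` and `dir_size_bytes` are hypothetical helper names, and both the default data root `/var/mqm/qmgrs` and the rule that each `.` in a queue name becomes `!` on disk are assumptions about a default UNIX installation (MQ transforms some other characters as well, which this does not handle).

```python
import os


def queue_dir(qmgr: str, queue: str, data_root: str = "/var/mqm/qmgrs") -> str:
    """Hypothetical helper: directory that holds a local queue's file.

    Assumes the default UNIX data root and only the '.' -> '!' name
    transformation; other transformed characters are not handled.
    """
    return os.path.join(data_root, qmgr, "queues", queue.replace(".", "!"))


def dir_size_bytes(path: str) -> int:
    """Total size in bytes of all regular files under path (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total
```

For example, `dir_size_bytes(queue_dir("QMA", "SYSTEM.CLUSTER.COMMAND.QUEUE")) / 2**30` gives the size in GiB, which could be logged on a schedule so that growth is noticed long before it reaches 3GB.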

hughson

Posted: Wed Jun 16, 2021 2:45 pm Post subject:

Padawan

Joined: 09 May 2013
Posts: 1914
Location: Bay of Plenty, New Zealand

During the incident, did your queue manager end abnormally? I.e. did it have a lot of log data to replay upon next start up?

Does it still take 1 hour to fail over now?

Cheers,
Morag

Bad

Posted: Thu Jun 17, 2021 6:18 am Post subject:

Novice

Joined: 15 Jun 2021
Posts: 14

Thanks for your reply.

During the incident, the queue manager did indeed report the error "AMQ9448: Repository manager failed due of errors. Retry in 10 minutes, queue manager will terminate in 7190 minutes". To resolve it, we deleted the queue ("delete qlocal (SYSTEM.CLUSTER.COMMAND.QUEUE)") because it had filled the file system, then ran strmqm -c QMA to rebuild it, and then strmqm -x QMA to restart the queue manager.

->"WebSphere MQ queue manager 'QMA' starting.
The queue manager is associated with installation 'Installation2'.
473 log records accessed on queue manager 'QMA' during the log replay phase.
Log replay for queue manager 'QMA4' complete.
Transaction manager state recovered for queue manager 'QMA'.
Creating or replacing default objects for queue manager 'QMA'"
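The startup messages quoted above already report how much log data was replayed. To compare that figure across restarts, here is a small Python sketch that extracts it from captured `strmqm` output; `log_replay_counts` is a hypothetical helper name, and the regular expression is written against the message wording pasted in this thread, so it is an assumption for other MQ versions or message languages.

```python
import re

# Pattern written against the strmqm message quoted in this thread; other
# MQ versions or message languages may word it differently (assumption).
_REPLAY_RE = re.compile(
    r"(\d+) log records accessed on queue manager '([^']+)' "
    r"during the log replay phase"
)


def log_replay_counts(output: str) -> dict[str, int]:
    """Map queue manager name to the number of log records replayed at startup."""
    return {qm: int(count) for count, qm in _REPLAY_RE.findall(output)}
```

Applied to the output above, it reports 473 records for QMA. A count that stays large from one restart to the next would be consistent with a lot of log data being replayed at startup.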

"Did it have a lot of log data to replay upon next start up?" -> If that data was in the system cluster command queue, it was never replayed, because we deleted and rebuilt the queue.

"Does it still take 1 hour to fail over now?" -> Yes, failover now takes 1 hour or more.

exerk

Posted: Thu Jun 17, 2021 6:18 am Post subject:

Jedi Council

Joined: 02 Nov 2006
Posts: 6339

Bad wrote:

exerk wrote:

And another question - which file system was full?

It was the queue file SYSTEM\!CLUSTER\!COMMAND\!QUEUE/, at more than 3GB.

The cause was a partner that kept sending messages to another partner who had shut down his database for 3 days...

What are your OS and MQ versions, please?

exerk

Posted: Thu Jun 17, 2021 6:22 am Post subject:

Jedi Council

Joined: 02 Nov 2006
Posts: 6339

Bad wrote:

...During the incident, the queue manager did indeed report the error "AMQ9448: Repository manager failed due of errors. Retry in 10 minutes, queue manager will terminate in 7190 minutes". To resolve it, we deleted the queue ("delete qlocal (SYSTEM.CLUSTER.COMMAND.QUEUE)") because it had filled the file system...

Why, oh why, did you think this was a good idea? Why not expand the file system, or better still stop the queue manager first, then expand the file system?

And you have still to answer my question as to whether the queue manager is in a cluster of queue managers.

Bad

Posted: Thu Jun 17, 2021 1:10 pm Post subject:

Novice

Joined: 15 Jun 2021
Posts: 14

I am only an operator; I just followed the instructions given by the expert. I don't have the expertise for this type of operation, but I am trying to learn and understand. Given the context and the failure it caused, it may have been the fastest option; I don't know.

Yes, the queue manager is in a cluster of queue managers. Sorry, I'm French and not bilingual, so I may miss some sentences.

gbaddeley

Posted: Thu Jun 17, 2021 3:22 pm Post subject:

Jedi

Joined: 25 Mar 2003
Posts: 2495
Location: Melbourne, Australia

Bad wrote:

I am only an operator; I just followed the instructions given by the expert.

Can you bring your MQ expert into this chat?

If the cluster command queue was using 3GB of disk space, there must have been an issue for a long time (days? weeks?), with cluster command processing not happening.

Full file systems are very bad for MQ.
The cluster command queue depth is normally zero.
Do you have monitoring and alerting on the file systems and on queue depths?
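Both kinds of monitoring asked about here can start small. Below is a Python sketch that assumes the queue depth text is obtained by piping `DISPLAY QLOCAL(SYSTEM.CLUSTER.COMMAND.QUEUE) CURDEPTH` into `runmqsc QMA` and passing the captured output in. The helper names and the threshold defaults are hypothetical placeholders, not recommended values.

```python
import re
import shutil

_CURDEPTH_RE = re.compile(r"CURDEPTH\((\d+)\)")


def fs_used_percent(path: str) -> float:
    """Percentage of the file system containing path that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def curdepth(display_output: str) -> int:
    """Extract n from CURDEPTH(n) in runmqsc DISPLAY QLOCAL output."""
    match = _CURDEPTH_RE.search(display_output)
    if match is None:
        raise ValueError("no CURDEPTH(...) attribute in output")
    return int(match.group(1))


def alerts(data_path: str, display_output: str,
           fs_limit: float = 80.0, depth_limit: int = 0) -> list[str]:
    """Return alert messages; an empty list means nothing crossed a threshold."""
    found = []
    used = fs_used_percent(data_path)
    if used > fs_limit:
        found.append(f"{data_path}: file system {used:.0f}% full")
    depth = curdepth(display_output)
    if depth > depth_limit:
        found.append(f"queue depth is {depth}")
    return found
```

Checking the file system that holds the queue manager's data and logs, rather than only the root file system, is the point of passing `data_path` explicitly.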
_________________
Glenn

bruce2359

Posted: Thu Jun 17, 2021 5:59 pm Post subject:

Poobah

Joined: 05 Jan 2008
Posts: 9399
Location: US: west coast, almost. Otherwise, enroute.

gbaddeley wrote:

Full file systems are very bad for MQ.

If the active/standby see the same filesystem, the standby attempting to take ownership will likely encounter the same issue as the primary.

gbaddeley wrote:

The cluster command queue depth is normally zero.

... or decrementing toward zero as cluster commands are processed.
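Following this refinement, a single non-zero reading is not alarming by itself; what matters is whether the depth falls over time. A minimal sketch of that trend check, with a hypothetical function name, assuming CURDEPTH samples collected at regular intervals, oldest first, over whatever window is chosen:

```python
def is_draining(samples: list[int]) -> bool:
    """Return True if a series of CURDEPTH samples (oldest first) looks healthy.

    Healthy here means the queue ends empty, or the latest depth is below the
    earliest one in the window. A non-zero depth that has not fallen across
    the whole window suggests cluster commands are not being processed.
    """
    if not samples:
        raise ValueError("need at least one sample")
    return samples[-1] == 0 or samples[-1] < samples[0]
```

Comparing only the ends of the window tolerates short bursts in the middle; a longer window makes the check less sensitive to such bursts but slower to raise an alert.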
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
