
MQSeries.net Forum Index » Clustering » Switch QM between nominal / rescue

Switch QM between nominal / rescue
Bad
Posted: Tue Jun 15, 2021 7:44 am    Post subject: Switch QM between nominal / rescue

Novice

Joined: 15 Jun 2021
Posts: 14

Hello,

I would like your opinion on a current problem:

Following an incident on a server (a full file system), which has since been resolved, the time to switch a QM from its nominal server to its rescue server has increased from 5 minutes to 1 hour. Do you have any idea which parameters to check?

Thank you

hughson
Posted: Tue Jun 15, 2021 3:03 pm

Padawan

Joined: 09 May 2013
Posts: 1914
Location: Bay of Plenty, New Zealand

By rescue server, do you mean you failed over the queue manager?

You say something increased from 5 minutes to 1 hour. Could you tell us what that was?

You are asking for some parameters to check, but I don't think you have told us enough for us to understand the problem you are suffering yet. Until we know that, we are unlikely to be able to provide advice.

Cheers,
Morag
_________________
Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software

Bad
Posted: Wed Jun 16, 2021 1:13 am

Novice

Joined: 15 Jun 2021
Posts: 14

Hello

Sorry for the brief explanation, and thank you for your feedback.

I have 2 servers:

server 1 -> 2QM
QM A (running)
QM BS (running as standby)

server 2 -> 2QM
QM AS (running as standby)
QM B (running)

When I run endmqm -s on QM A on server 1 to switch to QM AS, its standby on server 2, the operation now takes 1 hour following the incident (before the incident it took 5 minutes).

I hope this is a little clearer.

cordially
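
One way to put numbers on a switchover like the one described above is to record which server holds the active instance over time. The sketch below is a minimal, hypothetical Python helper and not something taken from this thread: it parses text in the shape that `dspmq -x` prints for a multi-instance queue manager. The sample text and the host names `server1` and `server2` are illustrative assumptions.

```python
import re

# Illustrative sample in the shape `dspmq -x` prints; the host names are made up.
SAMPLE = """\
QMNAME(QMA)      STATUS(Running)
    INSTANCE(server1) MODE(Active)
    INSTANCE(server2) MODE(Standby)
"""


def parse_instances(dspmq_output):
    """Map each queue manager name to a {host: mode} dictionary."""
    result = {}
    current = None
    for line in dspmq_output.splitlines():
        qm = re.search(r"QMNAME\(([^)]*)\)", line)
        if qm:
            current = qm.group(1)
            result[current] = {}
            continue
        inst = re.search(r"INSTANCE\(([^)]*)\)\s+MODE\(([^)]*)\)", line)
        if inst and current is not None:
            result[current][inst.group(1)] = inst.group(2)
    return result


def active_host(dspmq_output, qmgr):
    """Return the host whose instance is Active, or None if there is none."""
    for host, mode in parse_instances(dspmq_output).get(qmgr, {}).items():
        if mode == "Active":
            return host
    return None


print(active_host(SAMPLE, "QMA"))
```

Feeding this the output of `dspmq -x -m QMA` at intervals after issuing `endmqm -s QMA`, and logging the time at which `active_host` changes, could show whether most of the hour is spent ending the old instance or starting the standby.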

hughson
Posted: Wed Jun 16, 2021 4:07 am

Padawan

Joined: 09 May 2013
Posts: 1914
Location: Bay of Plenty, New Zealand

Just to confirm I am understanding your issue. Am I correct that when you have a full file system, you are seeing it take 1 hour to end your queue manager and fail over to the standby machine?

You also say that before the incident it takes 5 minutes. I assume you mean that if there is no issue then endmqm -s takes 5 minutes.

Cheers,
Morag

exerk
Posted: Wed Jun 16, 2021 12:23 pm

Jedi Council

Joined: 02 Nov 2006
Posts: 6339

And another question - which file system was full?
_________________
It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys.

Bad
Posted: Wed Jun 16, 2021 12:25 pm

Novice

Joined: 15 Jun 2021
Posts: 14

Hello

Thanks for your reply.

We had an incident on the server 3 months ago (a full file system), which we resolved.
Before this incident, an endmqm -s took 5 minutes to switch the QM to the other server.

Since the incident (the file system is no longer full) everything else is OK, but endmqm -s now takes about 1 hour or more, and I would like it to return to 5 minutes.

cordially

exerk
Posted: Wed Jun 16, 2021 12:27 pm

Jedi Council

Joined: 02 Nov 2006
Posts: 6339

Bad wrote:
Hello

Thanks for your reply.

We had an incident on the server 3 months ago (a full file system), which we resolved.
Before this incident, an endmqm -s took 5 minutes to switch the QM to the other server.

Since the incident (the file system is no longer full) everything else is OK, but endmqm -s now takes about 1 hour or more, and I would like it to return to 5 minutes.

cordially

Again, cordially, which file system was full? Is the queue manager in a queue manager cluster, because if not I'll move this thread to a more appropriate forum.

Bad
Posted: Wed Jun 16, 2021 12:30 pm

Novice

Joined: 15 Jun 2021
Posts: 14

exerk wrote:
And another question - which file system was full?

It was the file SYSTEM\!CLUSTER\!COMMAND\!QUEUE/, at more than 3 GB.

The cause was one partner continuing to send messages to another partner whose database was shut down for 3 days...

hughson
Posted: Wed Jun 16, 2021 2:45 pm

Padawan

Joined: 09 May 2013
Posts: 1914
Location: Bay of Plenty, New Zealand

During the incident, did your queue manager end abnormally? I.e. did it have a lot of log data to replay upon next start up?

Does it still take 1 hour to fail over now?

Cheers,
Morag

Bad
Posted: Thu Jun 17, 2021 6:18 am

Novice

Joined: 15 Jun 2021
Posts: 14

Thanks for your reply.

During the incident we did indeed get the error "AMQ9448: Repository manager failed due of errors. Retry in 10 minutes, queue manager will terminate in 7190 minutes". To resolve it, we deleted the queue with "delete qlocal (SYSTEM.CLUSTER.COMMAND.QUEUE)" because it was saturating the file system, then ran strmqm -c QMA to rebuild it, followed by strmqm -x QMA to restart the queue manager.


->"WebSphere MQ queue manager 'QMA' starting.
The queue manager is associated with installation 'Installation2'.
473 log records accessed on queue manager 'QMA' during the log replay phase.
Log replay for queue manager 'QMA4' complete.
Transaction manager state recovered for queue manager 'QMA'.
Creating or replacing default objects for queue manager 'QMA'"
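
The start-up output quoted above reports how many log records were read during log replay. The following is a small, hypothetical Python sketch, assuming the message wording stays exactly as in that output, that extracts the count so it can be compared from one restart to the next. The function name and the sample constant are illustrative.

```python
import re

# Two lines copied from the start-up output quoted in this post.
SAMPLE = """\
WebSphere MQ queue manager 'QMA' starting.
473 log records accessed on queue manager 'QMA' during the log replay phase.
"""

# Assumes the wording of the replay message is exactly as quoted above.
REPLAY_RE = re.compile(
    r"(\d+) log records accessed on queue manager '([^']+)' "
    r"during the log replay phase"
)


def replay_counts(startup_output):
    """Return (queue manager name, log records replayed) pairs."""
    return [(name, int(count)) for count, name in REPLAY_RE.findall(startup_output)]


print(replay_counts(SAMPLE))
```

Keeping these counts alongside the measured switchover time would make it easier to see whether a slow failover coincides with a large amount of log replay.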

"did it have a lot of log data to replay upon next start up?" -> If that data was in the SYSTEM.CLUSTER.COMMAND.QUEUE, it was never replayed, as we deleted and rebuilt the queue.

"Does it still take 1 hour to fail over now?" -> Yes, it now takes 1 hour or more to fail over.

exerk
Posted: Thu Jun 17, 2021 6:18 am

Jedi Council

Joined: 02 Nov 2006
Posts: 6339

Bad wrote:
exerk wrote:
And another question - which file system was full?

It was the file SYSTEM\!CLUSTER\!COMMAND\!QUEUE/, at more than 3 GB.

The cause was one partner continuing to send messages to another partner whose database was shut down for 3 days...

What is your OS and MQ version please?

exerk
Posted: Thu Jun 17, 2021 6:22 am

Jedi Council

Joined: 02 Nov 2006
Posts: 6339

Bad wrote:
...During the incident we did indeed get the error "AMQ9448: Repository manager failed due of errors. Retry in 10 minutes, queue manager will terminate in 7190 minutes". To resolve it, we deleted the queue with "delete qlocal (SYSTEM.CLUSTER.COMMAND.QUEUE)" because it was saturating the file system...

Why, oh why, did you think this was a good idea? Why not expand the file system, or better still stop the queue manager first, then expand the file system?

And you have still to answer my question as to whether the queue manager is in a cluster of queue managers.

Bad
Posted: Thu Jun 17, 2021 1:10 pm

Novice

Joined: 15 Jun 2021
Posts: 14

exerk wrote:
Bad wrote:
...During the incident we did indeed get the error "AMQ9448: Repository manager failed due of errors. Retry in 10 minutes, queue manager will terminate in 7190 minutes". To resolve it, we deleted the queue with "delete qlocal (SYSTEM.CLUSTER.COMMAND.QUEUE)" because it was saturating the file system...

Why, oh why, did you think this was a good idea? Why not expand the file system, or better still stop the queue manager first, then expand the file system?

And you have still to answer my question as to whether the queue manager is in a cluster of queue managers.


I am only an operator, I only followed the instructions given by the expert. I do not have the level for this type of operation, but I am trying to learn and understand. Considering the context and the failure caused, it may have been the fastest option; I don't know.

Yes, the queue manager is in a cluster of queue managers. Sorry, I'm French and not bilingual, so I may miss some sentences.

gbaddeley
Posted: Thu Jun 17, 2021 3:22 pm

Jedi

Joined: 25 Mar 2003
Posts: 2491
Location: Melbourne, Australia

Bad wrote:
I am only an operator, I only followed the instructions given by the expert.

Can you bring your MQ expert into this chat?

If the cluster command queue was using 3GB of disk space, there must have been an issue for a long time (days? weeks?), with cluster command processing not happening.

Full file systems are very bad for MQ.
The cluster command queue depth is normally zero.
Do you have monitoring and alerting on the file systems and on queue depths?
_________________
Glenn
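
As a rough illustration of the monitoring asked about above, here is a minimal Python sketch that checks file system usage and reads a queue depth out of MQSC output. It is a sketch under stated assumptions, not a recommended monitoring setup: the sample text imitates what `runmqsc` prints for `DISPLAY QLOCAL(SYSTEM.CLUSTER.COMMAND.QUEUE) CURDEPTH`, and the function names and both thresholds are hypothetical.

```python
import re
import shutil

# Illustrative text in the shape runmqsc prints for
# DISPLAY QLOCAL(SYSTEM.CLUSTER.COMMAND.QUEUE) CURDEPTH
SAMPLE_MQSC = """\
   QUEUE(SYSTEM.CLUSTER.COMMAND.QUEUE)     TYPE(QLOCAL)
   CURDEPTH(0)
"""


def fs_used_percent(path):
    """Percentage of the file system holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def parse_curdepth(mqsc_output):
    """Return the CURDEPTH value found in MQSC output, or None."""
    match = re.search(r"CURDEPTH\((\d+)\)", mqsc_output)
    return int(match.group(1)) if match else None


def check(fs_percent, depth, fs_limit=80.0, depth_limit=100):
    """Return a list of alert strings; both limits are hypothetical defaults."""
    alerts = []
    if fs_percent >= fs_limit:
        alerts.append(f"file system {fs_percent:.0f}% full")
    if depth is not None and depth >= depth_limit:
        alerts.append(f"queue depth {depth}")
    return alerts


print(check(fs_used_percent("."), parse_curdepth(SAMPLE_MQSC)))
```

The path passed to `fs_used_percent` depends on where the queue manager data and logs actually live, and sensible limits depend on the site. Since the cluster command queue depth is normally zero, even a low depth limit on that queue would have flagged the incident long before 3 GB accumulated.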

bruce2359
Posted: Thu Jun 17, 2021 5:59 pm

Poobah

Joined: 05 Jan 2008
Posts: 9392
Location: US: west coast, almost. Otherwise, enroute.

gbaddeley wrote:
Full file systems are very bad for MQ.

If the active/standby see the same filesystem, the standby attempting to take ownership will likely encounter the same issue as the primary.
gbaddeley wrote:
The cluster command queue depth is normally zero.

... or decrementing toward zero as cluster commands are processed.
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.