MQSeries.net :: View topic - MQ 9.1 RDQM DR failover query

MQ 9.1 RDQM DR failover query

mgsantos

Posted: Mon Feb 25, 2019 4:18 am Post subject:

Newbie

Joined: 22 Feb 2019
Posts: 5

@hughson My test simulated the queue manager processes dying unexpectedly. Below is the status on the primary machine before and after killing them, followed by the status on the secondary.

Note: I changed the server names and the queue manager name before posting.
Server: Primary | Status: Before killing processes

Code:

$ dspmq -o dr -o status -m MQ
QMNAME(MQ) STATUS(Running) DRROLE(Primary)
$ rdqmstatus -m MQ
Queue manager status: Running
CPU: 0.01%
Memory: 106MB
Queue manager file system: 3880MB used, 9.8GB allocated [39%]
DR role: Primary
DR status: Normal
DR type: Synchronous
DR port: 1482
DR local IP address: 10.201.64.36
DR remote IP address: 10.200.128.13
Command '/opt/mqm.91/bin/rdqmstatus' run with sudo.
$ /usr/sbin/drbdadm status
mq role:Primary
disk:UpToDate
mqserver2 role:Secondary
peer-disk:UpToDate

Server: Primary | Status: After killing processes

Code:

$ ps -ef | grep "/opt/mqm.91/bin/" | grep -v "grep" | awk '{print $2}'| xargs kill -9
$ ps -ef | grep amq
mqm 3212 117723 0 11:49 pts/3 00:00:00 grep --color=auto amq
$ dspmq -o dr -o status -m MQ
QMNAME(MQ) STATUS(Ended unexpectedly) DRROLE(Primary)
$ rdqmstatus -m MQ
Queue manager status: Ended unexpectedly
Queue manager file system: 3880MB used, 9.8GB allocated [39%]
DR role: Primary
DR status: Normal
DR type: Synchronous
DR port: 1482
DR local IP address: 10.201.64.36
DR remote IP address: 10.200.128.13
Command '/opt/mqm.91/bin/rdqmstatus' run with sudo.
$ /usr/sbin/drbdadm status
mq role:Primary
disk:UpToDate
mqserver2 role:Secondary
peer-disk:UpToDate

Server: Secondary | Status: Before/After killing processes on primary (same values)

Code:

$ dspmq -o dr -o status -m MQ
QMNAME(MQ) STATUS(Ended immediately) DRROLE(Secondary)
$ rdqmstatus -m MQ
Queue manager status: Ended immediately
DR role: Secondary
DR status: Normal
DR type: Synchronous
DR port: 1482
DR local IP address: 10.200.128.13
DR remote IP address: 10.201.64.36
Command '/opt/mqm.91/bin/rdqmstatus' run with sudo.
$ /usr/sbin/drbdadm status
mq role:Secondary
disk:UpToDate
mqserver1 role:Primary
peer-disk:UpToDate

$ rdqmdr -m MQ -p
AMQ3763E: Queue manager 'MQ' is already the DR primary on the remote node.
AMQ3769E: Failed to make queue manager 'MQ' the DR primary on this node.
Command '/opt/mqm.91/bin/rdqmdr' run with sudo.

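For anyone comparing the two nodes, the DR fields in the output above can be pulled out with a small helper. This is only a sketch: `dr_field` is a hypothetical function name (not part of IBM MQ), and it assumes the report keeps the `Name: value` layout shown in this post.

```shell
#!/bin/sh
# Sketch: extract one "Name: value" field from rdqmstatus-style output.
# dr_field is a hypothetical helper, not an IBM MQ command.
dr_field() {
    # $1 = field label, e.g. "DR role"; the report is read from stdin
    awk -F': *' -v key="$1" '$1 == key { print $2; exit }'
}

# Sample text copied from the secondary node's output in this post.
sample='Queue manager status: Ended immediately
DR role: Secondary
DR status: Normal
DR type: Synchronous'

role=$(printf '%s\n' "$sample" | dr_field "DR role")
status=$(printf '%s\n' "$sample" | dr_field "DR status")
echo "role=$role status=$status"
```

On a live node the same helper could presumably be fed directly, for example `rdqmstatus -m MQ | dr_field "DR role"`. In the test above both nodes report their roles unchanged after the kill, which matches the AMQ3763E message: the remote node is still the DR primary.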

fjb_saper

Posted: Mon Feb 25, 2019 5:51 am Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20772
Location: LI,NY

Does that mean that in order for DR to become primary you HAVE to switch prod to secondary first? That would mean you didn't lose all control of prod...

_________________
MQ & Broker admin

exerk

Posted: Mon Feb 25, 2019 6:03 am Post subject:

Jedi Council

Joined: 02 Nov 2006
Posts: 6339

The KC clearly states "...Following the loss of the primary queue manager at the main site, you make the secondary queue manager at the recovery site into the primary and start it...".

There is nothing I can see in the KC that implies, or states, that take-over is automatic, so are you killing the primary, ensuring all replication processes are also 'dead', and setting the secondary to primary?
_________________
It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys.

mgsantos

Posted: Mon Feb 25, 2019 7:04 am Post subject:

Newbie

Joined: 22 Feb 2019
Posts: 5

@exerk The exercise simulates losing the queue manager for whatever reason while the operating system stays up and running. I did not kill the replication processes; I can still see [drbd*] processes running. Are those the ones you mean?

My next plan is to test with a server shutdown (once I can get someone to do that for me) and see whether I can change the secondary queue manager into the primary.

fjb_saper

Posted: Mon Feb 25, 2019 7:32 am Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20772
Location: LI,NY

I guess he left the replication processes up and that's why he could not switch the secondary to primary...

_________________
MQ & Broker admin

exerk

Posted: Mon Feb 25, 2019 8:53 am Post subject:

Jedi Council

Joined: 02 Nov 2006
Posts: 6339

mgsantos wrote:

...I did not kill replication processes, I see [drbd*] processes running...

With that running, you will not be able to...

mgsantos wrote:

...see if I can change secondary queue manager into primary...

Please reread the last paragraph of my previous post.
_________________
It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys.

mgsantos

Posted: Mon Feb 25, 2019 11:29 am Post subject:

Newbie

Joined: 22 Feb 2019
Posts: 5

I am covering the following test scenarios:
1. Planned outage: stop the qmgr on the primary, switch it to secondary, switch the secondary to primary, start it.
2. Unplanned outage:
a. The MQ server is broken for whatever reason, but the server and operating system are up.
b. The server and OS are down.

I have done 1 and am now testing 2a. I know takeover is not automatic, and I understand that the problem now is the replication processes (the drbd* ones). However, I have no idea how to stop them manually. Any thoughts? Probably something with the drbdadm command.
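The planned outage in scenario 1 can be written down as a dry-run plan. This is a rough sketch, not a tested procedure: `planned_switchover_plan` and the `primary-node` / `dr-node` labels are hypothetical, and using `endmqm -w` for the stop is an assumption (the post only says "stop qmgr"). The function only prints the commands, since they have to be run on two different machines.

```shell
#!/bin/sh
# Dry-run sketch: print the planned DR switchover steps, node by node.
# planned_switchover_plan is a hypothetical helper; it executes nothing.
planned_switchover_plan() {
    qmgr=$1
    # On the current primary: stop the queue manager, then demote it.
    echo "primary-node: endmqm -w $qmgr"
    echo "primary-node: rdqmdr -m $qmgr -s"
    # On the recovery node: promote the queue manager, then start it.
    echo "dr-node: rdqmdr -m $qmgr -p"
    echo "dr-node: strmqm $qmgr"
}

planned_switchover_plan MQ
```

The order matters: as the AMQ3763E error earlier in the thread shows, `rdqmdr -m MQ -p` is refused while the other node still holds the primary role.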

mgsantos

Posted: Mon Feb 25, 2019 11:37 am Post subject:

Newbie

Joined: 22 Feb 2019
Posts: 5

Replying to myself:

Well, I guess that from an MQ admin perspective I just need to run the rdqmdr commands even if the qmgr is down, with no need to know anything about the drbdadm stuff... I will check whether I can do the switch on the second box with the first server down.

thanks for the help so far.

hughson

Posted: Tue Feb 26, 2019 1:05 am Post subject:

Padawan

Joined: 09 May 2013
Posts: 1977
Location: Bay of Plenty, New Zealand

I would suggest that if you are arbitrarily killing processes in the hope of simulating a queue manager failure, you simply have not killed the correct ones. If you are testing server failure, why not just take out the server, instead of only certain processes? That would be a more valid test in my view.

Cheers,
Morag
_________________
Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software

bruce2359

Posted: Tue Feb 26, 2019 4:50 am Post subject:

Poobah

Joined: 05 Jan 2008
Posts: 9489
Location: US: west coast, almost. Otherwise, enroute.

hughson wrote:

Do you do a similar 'kill a random o/s process' to further test your DR strategy?

Much simpler test: reach around the back of the server and pull the power cord.
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.

JosephGramig

Posted: Wed Nov 13, 2019 12:07 pm Post subject:

Grand Master

Joined: 09 Feb 2006
Posts: 1247
Location: Gold Coast of Florida, USA

You should consider configuring your Qmgrs as a RHEL service
