Author |
Message
|
mgsantos |
Posted: Mon Feb 25, 2019 4:18 am Post subject: |
|
|
Newbie
Joined: 22 Feb 2019 Posts: 5
|
@hughson I ran my test by simulating the queue manager processes dying unexpectedly. Below is the status from the primary machine before and after killing them, as well as from the secondary.
Note: I changed the server names and the queue manager name before posting this.
Server: Primary | Status: Before killing processes
Code: |
$ dspmq -o dr -o status -m MQ
QMNAME(MQ) STATUS(Running) DRROLE(Primary)
$ rdqmstatus -m MQ
Queue manager status: Running
CPU: 0.01%
Memory: 106MB
Queue manager file system: 3880MB used, 9.8GB allocated [39%]
DR role: Primary
DR status: Normal
DR type: Synchronous
DR port: 1482
DR local IP address: 10.201.64.36
DR remote IP address: 10.200.128.13
Command '/opt/mqm.91/bin/rdqmstatus' run with sudo.
$ /usr/sbin/drbdadm status
mq role:Primary
disk:UpToDate
mqserver2 role:Secondary
peer-disk:UpToDate |
Server: Primary | Status: After killing processes
Code: |
$ ps -ef | grep "/opt/mqm.91/bin/" | grep -v "grep" | awk '{print $2}'| xargs kill -9
$ ps -ef | grep amq
mqm 3212 117723 0 11:49 pts/3 00:00:00 grep --color=auto amq
$ dspmq -o dr -o status -m MQ
QMNAME(MQ) STATUS(Ended unexpectedly) DRROLE(Primary)
$ rdqmstatus -m MQ
Queue manager status: Ended unexpectedly
Queue manager file system: 3880MB used, 9.8GB allocated [39%]
DR role: Primary
DR status: Normal
DR type: Synchronous
DR port: 1482
DR local IP address: 10.201.64.36
DR remote IP address: 10.200.128.13
Command '/opt/mqm.91/bin/rdqmstatus' run with sudo.
$ /usr/sbin/drbdadm status
mq role:Primary
disk:UpToDate
mqserver2 role:Secondary
peer-disk:UpToDate |
Server: Secondary | Status: Before/After killing processes on primary (same values)
Code: |
$ dspmq -o dr -o status -m MQ
QMNAME(MQ) STATUS(Ended immediately) DRROLE(Secondary)
$ rdqmstatus -m MQ
Queue manager status: Ended immediately
DR role: Secondary
DR status: Normal
DR type: Synchronous
DR port: 1482
DR local IP address: 10.200.128.13
DR remote IP address: 10.201.64.36
Command '/opt/mqm.91/bin/rdqmstatus' run with sudo.
$ /usr/sbin/drbdadm status
mq role:Secondary
disk:UpToDate
mqserver1 role:Primary
peer-disk:UpToDate
$ rdqmdr -m MQ -p
AMQ3763E: Queue manager 'MQ' is already the DR primary on the remote node.
AMQ3769E: Failed to make queue manager 'MQ' the DR primary on this node.
Command '/opt/mqm.91/bin/rdqmdr' run with sudo. |
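When comparing the two nodes, it can help to pull the STATUS and DRROLE values out of the `dspmq` line instead of reading them by eye. The following is a minimal sketch, assuming the output keeps the `FIELD(value)` layout shown above; `dspmq_field` is a hypothetical helper, not part of IBM MQ, and the sample line is copied from the "after killing processes" output on the primary.

```shell
#!/bin/sh
# Hypothetical helper (not an IBM MQ command): print the value inside
# FIELD(...) from a single line of dspmq-style output.
#   $1 = field name, e.g. STATUS or DRROLE
#   $2 = one line of dspmq output
dspmq_field() {
    printf '%s\n' "$2" | sed -n "s/.*$1(\([^)]*\)).*/\1/p"
}

# Sample line taken from the primary after the processes were killed.
line='QMNAME(MQ) STATUS(Ended unexpectedly) DRROLE(Primary)'

status=$(dspmq_field STATUS "$line")
role=$(dspmq_field DRROLE "$line")
echo "status=$status role=$role"
```

On a live node the line would come from `dspmq -o dr -o status -m MQ` rather than from a hard-coded string.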
|
|
 |
fjb_saper |
Posted: Mon Feb 25, 2019 5:51 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Does that mean that in order for DR to become primary you HAVE to switch prod to secondary first? That would mean you didn't lose all control of prod... _________________ MQ & Broker admin |
|
|
 |
exerk |
Posted: Mon Feb 25, 2019 6:03 am Post subject: |
|
|
 Jedi Council
Joined: 02 Nov 2006 Posts: 6339
|
The KC clearly states "...Following the loss of the primary queue manager at the main site, you make the secondary queue manager at the recovery site into the primary and start it...".
There is nothing I can see in the KC that implies, or states, that take-over is automatic, so are you killing the primary, ensuring all replication processes are also 'dead', and setting the secondary to primary? _________________ It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys. |
|
|
 |
mgsantos |
Posted: Mon Feb 25, 2019 7:04 am Post subject: |
|
|
Newbie
Joined: 22 Feb 2019 Posts: 5
|
@exerk the exercise simulates the queue manager being lost for whatever reason while the operating system is still up and running fine. I did not kill the replication processes; I can see [drbd*] processes running. Are those the ones you mean?
My next plan is to test with a server shutdown (once I can get someone to do that for me) and see if I can change the secondary queue manager into the primary. |
|
|
 |
fjb_saper |
Posted: Mon Feb 25, 2019 7:32 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
I guess he left the replication processes up and that's why he could not switch the secondary to primary...  _________________ MQ & Broker admin |
|
|
 |
exerk |
Posted: Mon Feb 25, 2019 8:53 am Post subject: |
|
|
 Jedi Council
Joined: 02 Nov 2006 Posts: 6339
|
mgsantos wrote: |
...I did not kill replication processes, I see [drbd*] processes running... |
With that running, you will not be able to...
mgsantos wrote: |
...see if I can change secondary queue manager into primary... |
Please reread the last paragraph of my previous post. _________________ It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys. |
|
|
 |
mgsantos |
Posted: Mon Feb 25, 2019 11:29 am Post subject: |
|
|
Newbie
Joined: 22 Feb 2019 Posts: 5
|
I am covering the following test scenarios:
1. Planned outage: stop qmgr on primary, switch it to secondary, switch the secondary to primary, start it
2. Unplanned outage:
a. The queue manager is broken for whatever reason, but the server and operating system are up.
b. The server and OS are down.
I have done 1 and am now testing 2a. I know takeover is not automatic, and I understand that the problem now is the replication processes (the drbd* ones). However, I have no idea how to stop them manually. Any thoughts? Probably something with the drbdadm command. |
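Scenario 1 (the planned switch) can be sketched as a dry run. This is an illustration under stated assumptions, not a tested procedure: `run`, `demote_primary`, and `promote_secondary` are hypothetical helper names, the queue manager name `MQ` is the placeholder used in this thread, and the two functions are meant to be run on different nodes. By default the script only prints the commands; nothing is executed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Dry-run sketch of a planned DR switch for an RDQM DR queue manager.
# Prints the commands by default; set DRY_RUN=0 to actually execute them.
QM=${QM:-MQ}   # placeholder queue manager name from this thread

# Hypothetical wrapper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Intended for the node that is currently the DR primary.
demote_primary() {
    run endmqm -w "$QM"      # stop the queue manager and wait for it to end
    run rdqmdr -m "$QM" -s   # make this node the DR secondary
}

# Intended for the recovery node, after the old primary has been demoted.
promote_secondary() {
    run rdqmdr -m "$QM" -p   # make this node the DR primary
    run strmqm "$QM"         # start the queue manager here
}

# Both are called here only to show the full sequence in dry-run mode.
demote_primary
promote_secondary
```

The ordering matters: as the AMQ3763E message earlier in the thread shows, `rdqmdr -m MQ -p` is refused while the other node is still recorded as DR primary and reachable. How this applies when the primary site is completely down (scenario 2b) is not covered by this sketch.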
|
|
 |
mgsantos |
Posted: Mon Feb 25, 2019 11:37 am Post subject: |
|
|
Newbie
Joined: 22 Feb 2019 Posts: 5
|
Replying to myself:
Well, I guess that from an MQ admin perspective I just need to run the rdqmdr commands even if the qmgr is down; there is no need to know anything about drbdadm. I will validate whether I can do the switch on the second box when the server is down.
Thanks for the help so far. |
|
|
 |
hughson |
Posted: Tue Feb 26, 2019 1:05 am Post subject: |
|
|
 Padawan
Joined: 09 May 2013 Posts: 1959 Location: Bay of Plenty, New Zealand
|
I would suggest that if you are arbitrarily killing processes in the hope of simulating a queue manager failure, that you simply have not killed the correct ones. If you are testing server failure, why not just take out the server, instead of only certain processes? That would be a more valid test in my view.
Cheers,
Morag _________________ Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software |
|
|
 |
bruce2359 |
Posted: Tue Feb 26, 2019 4:50 am Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9471 Location: US: west coast, almost. Otherwise, enroute.
|
hughson wrote: |
I would suggest that if you are arbitrarily killing processes in the hope of simulating a queue manager failure, that you simply have not killed the correct ones. If you are testing server failure, why not just take out the server, instead of only certain processes? That would be a more valid test in my view.
Cheers,
Morag |
Do you do a similar 'kill a random o/s process' to further test your DR strategy?
Much simpler test: reach around the back of the server and pull the power cord. _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
|
 |
JosephGramig |
Posted: Wed Nov 13, 2019 12:07 pm Post subject: |
|
|
 Grand Master
Joined: 09 Feb 2006 Posts: 1244 Location: Gold Coast of Florida, USA
|
You should consider configuring your queue managers as a RHEL service. |
|
|
 |
|