mqdev
Posted: Tue Jun 04, 2019 12:42 pm Post subject: RDQM - floating IP - how to?
Centurion
Joined: 21 Jan 2003 Posts: 136
Hello,
I have 3 nodes in my drbd cluster with IPs as follows:
Node1 - inet 172.24.102.138 netmask 255.255.252.0
Node2 - inet 172.24.101.29 netmask 255.255.252.0
Node3 - inet 172.18.178.195 netmask 255.255.252.0
Based on the above IPs and netmasks, the Floating IP for an RDQM in this cluster would need to fall in the following ranges:
For node1 and node2: 172.24.100.1 (low) -- 172.24.103.254 (high)
For node3: 172.18.176.1 (low) -- 172.18.179.254 (high)
There is no overlap between these ranges - does that mean I cannot assign a Floating IP to an RDQM implemented on this drbd cluster?
Here is what I tried:
On node1 (IP = 172.24.102.138), I tried assigning Floating IP 172.18.179.200, which failed (Floating IP address '172.18.179.200' not in interface 'eth0' subnet.)
Next I tried assigning 172.24.103.200, and it worked (because 172.24.102.138 and 172.24.103.200 are in the same subnet). However, the RDQM is now dysfunctional - it cannot fail over to the 3rd node (172.18.178.195). In fact, it is not failing over at all...
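The subnet test that rdqmint applies can be reproduced outside MQ. A minimal sketch using Python's stdlib ipaddress module (the addresses are the ones from this post; the check shown is simply "is the candidate IP inside the interface's subnet", which is what the error message describes):

```python
import ipaddress

# Interface addresses as configured on the nodes (IP plus netmask).
node1 = ipaddress.ip_interface("172.24.102.138/255.255.252.0")  # net 172.24.100.0/22
node3 = ipaddress.ip_interface("172.18.178.195/255.255.252.0")  # net 172.18.176.0/22

def in_subnet(iface: ipaddress.IPv4Interface, candidate: str) -> bool:
    """True if the candidate floating IP falls inside the interface's subnet."""
    return ipaddress.ip_address(candidate) in iface.network

print(in_subnet(node1, "172.18.179.200"))  # False - rejected by rdqmint on node1
print(in_subnet(node1, "172.24.103.200"))  # True - accepted on node1
print(in_subnet(node3, "172.24.103.200"))  # False - but invalid from node3's side
```

This mirrors the no-overlap situation described above: no single address passes the test on all three nodes.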
Questions:
Do the drbd cluster node IPs need to be arranged in a certain way (so that the Floating IP fits the IP and netmask of all 3 nodes)?
I had success assigning a Floating IP (node IPs 172.24.101.27 / 172.24.103.199 / 172.24.103.79, Floating IP '172.24.103.1') when all nodes were in the same subnet and/or had overlapping IP ranges (as per IP & netmask)...
However, if the drbd nodes are not in the same subnet, it looks like a Floating IP is a moot point... thoughts?
hughson
Posted: Tue Jun 04, 2019 9:55 pm
Padawan
Joined: 09 May 2013 Posts: 1959 Location: Bay of Plenty, New Zealand
In order to use a floating IP address with an RDQM queue manager, all nodes must be in the same subnet.
In addition, all nodes must have the same name for the network interface, in your example 'eth0'.
If you don't intend to use a floating IP address, then it is OK for the nodes of an RDQM queue manager to be in different subnets.
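The same-subnet requirement above can be checked mechanically before creating the queue manager. A sketch (Python stdlib; the three addresses are taken from the original post):

```python
import ipaddress

# Interface address (IP/netmask) of each node in the DRBD/Pacemaker group.
nodes = [
    "172.24.102.138/255.255.252.0",
    "172.24.101.29/255.255.252.0",
    "172.18.178.195/255.255.252.0",
]

# If every interface resolves to the same network, a floating IP is possible.
networks = {ipaddress.ip_interface(n).network for n in nodes}
print(sorted(str(n) for n in networks))  # ['172.18.176.0/22', '172.24.100.0/22']
print(len(networks) == 1)  # False - a floating IP cannot work for this trio
```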
Cheers,
Morag
_________________
Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software
mqdev
Posted: Wed Jun 05, 2019 6:14 am Post subject: Thanks for the prompt and clear response - much appreciated!
Centurion
Joined: 21 Jan 2003 Posts: 136
Thanks for the prompt and clear response - much appreciated!
mqdev
Posted: Wed Jun 05, 2019 6:22 am
Centurion
Joined: 21 Jan 2003 Posts: 136
Morag - can you please give some troubleshooting steps for RDQM?
I did assign a Floating IP (172.24.103.200) to the QM, which was Primary on 172.24.102.138 - the rdqmint command itself completed successfully (no errors thrown). However, the QM is hosed from that point onwards - it won't fail over (I tried failing over to the node 172.18.178.195 and it did not work). It won't fail over to the other node, 172.24.101.29, either.
The failover command (rdqmadm -p -m <RDQM Name> -n <node name>) completes successfully (i.e. no errors thrown - it gives a message that the given node is set as Primary for the QM). However, the failover NEVER happens (I have noticed that, in general, it takes a few seconds for the failover to occur - but in this case, it just doesn't happen!). Nothing suspicious in /var/log/messages either... how can I troubleshoot this further to understand why the failover is not happening?
Thanks in advance for your time!
-mqdev
fjb_saper
Posted: Wed Jun 05, 2019 7:46 am
Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
As Morag said:
Do not assign a floating IP if:
- the network connection on the 3 servers is not assigned to the same interface (e.g. eth0)
- the 3 servers are not in the same subnet (look at the Boson subnet calculator for help), even if it is a vlan...
Hope this helps
_________________
MQ & Broker admin
mqdev
Posted: Wed Jun 05, 2019 10:28 am
Centurion
Joined: 21 Jan 2003 Posts: 136
I am looking for troubleshooting steps... my RDQM is currently hosed from the attempt to add the Floating IP, and I would like to resurrect it, if possible.
Going forward, yes - we will ensure the drbd nodes are in the same subnet so that we can use a Floating IP.
Hope that explains why I am asking the question...
hughson
Posted: Wed Jun 05, 2019 7:47 pm
Padawan
Joined: 09 May 2013 Posts: 1959 Location: Bay of Plenty, New Zealand
Have you removed the floating IP address already?
Code:
rdqmint -m <RDQM Name> -d
Cheers,
Morag
mqdev
Posted: Thu Jun 06, 2019 5:04 am
Centurion
Joined: 21 Jan 2003 Posts: 136
Yes (I removed the Floating IP from the RDQM) - still no joy.
The RDQM stays hosed... the RDQM itself is of no concern (this is a Dev env). The bigger payoff for us is learning how to troubleshoot this situation. Any information in this direction would be highly useful...
john_colgrave
Posted: Fri Jun 07, 2019 6:01 am Post subject: Troubleshooting RDQM
Newbie
Joined: 02 Jun 2014 Posts: 7
If you suspect a problem with any of the resources managed by Pacemaker (and for RDQM, failover is managed by Pacemaker), the first thing to do is to run "crm status" and study the output. If you post that output here we can take it from there.
mqdev
Posted: Mon Jun 10, 2019 12:35 pm
Centurion
Joined: 21 Jan 2003 Posts: 136
John/Morag: Please see below the output of the "crm status" command (I have masked the domain name as "bbbbbbbb.com" in the output below).
RDQM1 is the QM where I attempted to attach the Floating IP and then backed it out. I am now not able to move RDQM1 around using the command
rdqmadm -p -m RDQM1 -n lnc3234.bbbbbbbb.com
This command completes without errors, but RDQM1 is not failing over...
==========================================
root@lnc3234 ~
# crm status
Stack: corosync
Current DC: lncb90c.bbbbbbbbb.com (version 1.1.15.linbit-2.0+20160622+e174ec8.el7-e174ec8) - partition with quorum
Last updated: Mon Jun 10 16:29:27 2019 Last change: Mon Jun 10 15:56:15 2019 by root via crm_attribute on lnc3234.bbbbbbbbb.com
3 nodes and 18 resources configured
Online: [ lnc3234.bbbbbbbbb.com lnc3235.bbbbbbbbb.com lncb90c.bbbbbbbbb.com ]
Full list of resources:
Master/Slave Set: ms_drbd_rdqm1 [p_drbd_rdqm1]
Masters: [ lncb90c.bbbbbbbbb.com ]
Slaves: [ lnc3234.bbbbbbbbb.com lnc3235.bbbbbbbbb.com ]
p_fs_rdqm1 (ocf::heartbeat:Filesystem): Started lncb90c.bbbbbbbbb.com
p_rdqmx_rdqm1 (ocf::ibm:rdqmx): Started lncb90c.bbbbbbbbb.com
rdqm1 (ocf::ibm:rdqm): Started lncb90c.bbbbbbbbb.com
Master/Slave Set: ms_drbd_rdqm2 [p_drbd_rdqm2]
Masters: [ lnc3235.bbbbbbbbb.com ]
Slaves: [ lnc3234.bbbbbbbbb.com lncb90c.bbbbbbbbb.com ]
p_fs_rdqm2 (ocf::heartbeat:Filesystem): Started lnc3235.bbbbbbbbb.com
p_rdqmx_rdqm2 (ocf::ibm:rdqmx): Started lnc3235.bbbbbbbbb.com
rdqm2 (ocf::ibm:rdqm): Started lnc3235.bbbbbbbbb.com
Master/Slave Set: ms_drbd_qm0_ad_us_lnc3234 [p_drbd_qm0_ad_us_lnc3234]
Masters: [ lnc3234.bbbbbbbbb.com ]
Slaves: [ lnc3235.bbbbbbbbb.com lncb90c.bbbbbbbbb.com ]
p_fs_qm0_ad_us_lnc3234 (ocf::heartbeat:Filesystem): Started lnc3234.bbbbbbbbb.com
p_rdqmx_qm0_ad_us_lnc3234 (ocf::ibm:rdqmx): Started lnc3234.bbbbbbbbb.com
qm0_ad_us_lnc3234 (ocf::ibm:rdqm): Started lnc3234.bbbbbbbbb.com
Failed Actions:
* p_drbd_rdqm1_monitor_20000 on lnc3234.bbbbbbbbb.com 'master' (8): call=59, status=complete, exitreason='none',
last-rc-change='Tue Jun 4 16:07:27 2019', queued=0ms, exec=0ms
root@lnc3234 ~
#
==============================================================================
root@lnc3235 ~
# crm status
Stack: corosync
Current DC: lncb90c.bbbbbbbbb.com (version 1.1.15.linbit-2.0+20160622+e174ec8.el7-e174ec8) - partition with quorum
Last updated: Mon Jun 10 16:28:55 2019 Last change: Mon Jun 10 15:56:15 2019 by root via crm_attribute on lnc3234.bbbbbbbbb.com
3 nodes and 18 resources configured
Online: [ lnc3234.bbbbbbbbb.com lnc3235.bbbbbbbbb.com lncb90c.bbbbbbbbb.com ]
Full list of resources:
Master/Slave Set: ms_drbd_rdqm1 [p_drbd_rdqm1]
Masters: [ lncb90c.bbbbbbbbb.com ]
Slaves: [ lnc3234.bbbbbbbbb.com lnc3235.bbbbbbbbb.com ]
p_fs_rdqm1 (ocf::heartbeat:Filesystem): Started lncb90c.bbbbbbbbb.com
p_rdqmx_rdqm1 (ocf::ibm:rdqmx): Started lncb90c.bbbbbbbbb.com
rdqm1 (ocf::ibm:rdqm): Started lncb90c.bbbbbbbbb.com
Master/Slave Set: ms_drbd_rdqm2 [p_drbd_rdqm2]
Masters: [ lnc3235.bbbbbbbbb.com ]
Slaves: [ lnc3234.bbbbbbbbb.com lncb90c.bbbbbbbbb.com ]
p_fs_rdqm2 (ocf::heartbeat:Filesystem): Started lnc3235.bbbbbbbbb.com
p_rdqmx_rdqm2 (ocf::ibm:rdqmx): Started lnc3235.bbbbbbbbb.com
rdqm2 (ocf::ibm:rdqm): Started lnc3235.bbbbbbbbb.com
Master/Slave Set: ms_drbd_qm0_ad_us_lnc3234 [p_drbd_qm0_ad_us_lnc3234]
Masters: [ lnc3234.bbbbbbbbb.com ]
Slaves: [ lnc3235.bbbbbbbbb.com lncb90c.bbbbbbbbb.com ]
p_fs_qm0_ad_us_lnc3234 (ocf::heartbeat:Filesystem): Started lnc3234.bbbbbbbbb.com
p_rdqmx_qm0_ad_us_lnc3234 (ocf::ibm:rdqmx): Started lnc3234.bbbbbbbbb.com
qm0_ad_us_lnc3234 (ocf::ibm:rdqm): Started lnc3234.bbbbbbbbb.com
Failed Actions:
* p_drbd_rdqm1_monitor_20000 on lnc3234.bbbbbbbbb.com 'master' (8): call=59, status=complete, exitreason='none',
last-rc-change='Tue Jun 4 16:07:27 2019', queued=0ms, exec=0ms
root@lnc3235 ~
==============================================================================
[root@lncb90c ~]# crm status
Stack: corosync
Current DC: lncb90c.bbbbbbbbb.com (version 1.1.15.linbit-2.0+20160622+e174ec8.el7-e174ec8) - partition with quorum
Last updated: Mon Jun 10 16:28:34 2019 Last change: Mon Jun 10 15:56:15 2019 by root via crm_attribute on lnc3234.bbbbbbbbb.com
3 nodes and 18 resources configured
Online: [ lnc3234.bbbbbbbbb.com lnc3235.bbbbbbbbb.com lncb90c.bbbbbbbbb.com ]
Full list of resources:
Master/Slave Set: ms_drbd_rdqm1 [p_drbd_rdqm1]
Masters: [ lncb90c.bbbbbbbbb.com ]
Slaves: [ lnc3234.bbbbbbbbb.com lnc3235.bbbbbbbbb.com ]
p_fs_rdqm1 (ocf::heartbeat:Filesystem): Started lncb90c.bbbbbbbbb.com
p_rdqmx_rdqm1 (ocf::ibm:rdqmx): Started lncb90c.bbbbbbbbb.com
rdqm1 (ocf::ibm:rdqm): Started lncb90c.bbbbbbbbb.com
Master/Slave Set: ms_drbd_rdqm2 [p_drbd_rdqm2]
Masters: [ lnc3235.bbbbbbbbb.com ]
Slaves: [ lnc3234.bbbbbbbbb.com lncb90c.bbbbbbbbb.com ]
p_fs_rdqm2 (ocf::heartbeat:Filesystem): Started lnc3235.bbbbbbbbb.com
p_rdqmx_rdqm2 (ocf::ibm:rdqmx): Started lnc3235.bbbbbbbbb.com
rdqm2 (ocf::ibm:rdqm): Started lnc3235.bbbbbbbbb.com
Master/Slave Set: ms_drbd_qm0_ad_us_lnc3234 [p_drbd_qm0_ad_us_lnc3234]
Masters: [ lnc3234.bbbbbbbbb.com ]
Slaves: [ lnc3235.bbbbbbbbb.com lncb90c.bbbbbbbbb.com ]
p_fs_qm0_ad_us_lnc3234 (ocf::heartbeat:Filesystem): Started lnc3234.bbbbbbbbb.com
p_rdqmx_qm0_ad_us_lnc3234 (ocf::ibm:rdqmx): Started lnc3234.bbbbbbbbb.com
qm0_ad_us_lnc3234 (ocf::ibm:rdqm): Started lnc3234.bbbbbbbbb.com
Failed Actions:
* p_drbd_rdqm1_monitor_20000 on lnc3234.bbbbbbbbb.com 'master' (8): call=59, status=complete, exitreason='none',
last-rc-change='Tue Jun 4 16:07:27 2019', queued=0ms, exec=0ms
[root@lncb90c ~]#
=========================================================
Last edited by mqdev on Mon Jun 10, 2019 12:44 pm; edited 1 time in total
mqdev
Posted: Mon Jun 10, 2019 12:40 pm
Centurion
Joined: 21 Jan 2003 Posts: 136
The "( 8 ) :" in the output above is actually written without spaces....
Also, we are at MQ v9.1.2.0 on these nodes.
Last edited by mqdev on Mon Jun 10, 2019 12:44 pm; edited 1 time in total
hughson
Posted: Mon Jun 10, 2019 12:42 pm
Padawan
Joined: 09 May 2013 Posts: 1959 Location: Bay of Plenty, New Zealand
mqdev wrote:
in the above is actually "( 8 ) :" without spaces....
Edit your post and check "Disable Smilies in this post" which is just below the edit box.
mqdev
Posted: Mon Jun 10, 2019 12:45 pm
Centurion
Joined: 21 Jan 2003 Posts: 136
hughson wrote:
mqdev wrote:
in the above is actually "( 8 ) :" without spaces....
Edit your post and check "Disable Smilies in this post" which is just below the edit box.
Done... thank you.
mqdev
Posted: Mon Jun 10, 2019 12:51 pm
Centurion
Joined: 21 Jan 2003 Posts: 136
@John, @Morag,
The "crm status" command does indicate a failure with RDQM1 (which is the problematic RDQM). How can I find out exactly what the problem is?
The "last-rc-change='Tue Jun 4 16:07:27 2019'" is the time I attempted to add the Floating IP. It appears my action somehow hosed Pacemaker.
Secondly, why does rdqmadm end normally when failover is not achieved? This matters because we are scripting automation around these commands - if a given command runs successfully but in reality fails to achieve the intended result, our monitoring will go haywire...
root@lnc3235 ~
# dspmq
QMNAME(RDQM1) STATUS(Running elsewhere)
QMNAME(RDQM2) STATUS(Running)
QMNAME(QM0.AD.US.LNC3234) STATUS(Running elsewhere)
root@lnc3235 ~
root@lnc3235 ~
# rdqmadm -p -m RDQM1 -n lnc3235.bbbbbbbbb.com
The preferred replicated data node has been set to 'lnc3235.bbbbbbbbb.com' for
queue manager 'RDQM1'.
root@lnc3235 ~
# echo $?
0
root@lnc3235 ~
#
hughson
Posted: Mon Jun 10, 2019 1:22 pm
Padawan
Joined: 09 May 2013 Posts: 1959 Location: Bay of Plenty, New Zealand
mqdev wrote:
why is the rdqmadm ending normally when failover is not being achieved?
I'll let John, as the RDQM Architect in IBM Hursley, answer the crm status output question. I just wanted to add something about your rdqmadm command.
The rdqmadm command you are issuing sets a preferred node for the queue manager. The resulting move of the queue manager to that node is asynchronous to the command. I suspect that if you display the RDQM (rdqmstatus) you will see that the preferred node has been successfully set, which is why the command completed successfully.
In other words, the command is not "move my queue manager"; the command is "set the preference" - the queue manager will move to the preferred node... if it can. Does that make sense?
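Since the move is asynchronous, automation that wraps rdqmadm should poll for the outcome rather than trust the exit code. A sketch of that idea in Python (the helper names are hypothetical, and the "Node" and "Queue manager status" field names assumed in the parsed output are assumptions about the rdqmstatus format - verify against the actual output of your MQ version):

```python
import subprocess
import time

def qmgr_running_on(status_text: str, node: str) -> bool:
    """Parse 'key: value' lines from rdqmstatus-style output and report
    whether the queue manager is Running on the given node.
    (The field names are assumptions - adjust to your version's output.)"""
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return (fields.get("Node") == node
            and fields.get("Queue manager status") == "Running")

def wait_for_failover(qmgr: str, node: str,
                      timeout: float = 120, interval: float = 5) -> bool:
    """Poll rdqmstatus until the queue manager reports Running on the
    target node, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        out = subprocess.run(["rdqmstatus", "-m", qmgr],
                             capture_output=True, text=True).stdout
        if qmgr_running_on(out, node):
            return True
        time.sleep(interval)
    return False
```

Monitoring would then alert on wait_for_failover(...) returning False rather than on the rdqmadm exit code.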
Cheers,
Morag