rajesh00001
Posted: Fri Feb 28, 2020 10:28 am Post subject: DR status is Partitioned on Primary node in RDQM DR
Apprentice
Joined: 08 Sep 2009 Posts: 34
Hi Team,
We have an RDQM DR environment with asynchronous replication.
We did a DR test. During the test we put some messages on a queue, and after failing over to the backup data centre we lost those messages.
We followed the process described on the IBM Knowledge Center page below, but the issue remains:
https://www.ibm.com/support/knowledgecenter/SSFKSJ_9.1.0/com.ibm.mq.con.doc/q133050_.htm?view=embed
The queue manager is shut down on both servers. Below is the status from both data centres.
On the primary node the DR status is Partitioned.
On the secondary node the DR status is Remote unavailable.
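For reference, the process on that page (as I understand it) boils down to choosing which node's data to keep and demoting the other instance; roughly this (a sketch using the documented rdqmdr command, with TEST being the queue manager shown below):
Code:
# Decide which node holds the data to keep; here the primary (10.0.0.1) is kept.
# On the node whose data is to be DISCARDED, demote its instance to secondary:
sudo rdqmdr -m TEST -s

# On the node whose data is to be KEPT, make its instance the primary:
sudo rdqmdr -m TEST -p

# Replication should then resynchronise; verify with:
sudo rdqmstatus -m TEST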
[mqm@10.0.0.1] /var/mqm/errors $ sudo rdqmstatus -m TEST
Queue manager status: Ended normally
Queue manager file system: 148MB used, 2.9GB allocated [5%]
DR role: Primary
DR status: Partitioned
DR type: Asynchronous
DR port: 1484
DR local IP address: 10.0.0.1
DR remote IP address: 10.0.0.2
[mqm@10.0.0.2] /var/mqm/ $ sudo rdqmstatus -m TEST
Queue manager status: Ended immediately
DR role: Secondary
DR status: Remote unavailable
DR type: Asynchronous
DR port: 1484
DR local IP address: 10.0.0.2
DR remote IP address: 10.0.0.1
DR out of sync data: 3145596KB
Can you please help me to fix the problem?
exerk
Posted: Fri Feb 28, 2020 12:12 pm Post subject:
Jedi Council
Joined: 02 Nov 2006 Posts: 6339
According to the KC, for there to be a Partitioned status:
1. Either both queue manager instances must have been running concurrently, and you promoted the secondary to be the primary, whilst the 'old' primary instance was still running;
Or:
2. The queue manager instance has been started on the secondary node while the DR replication network was unavailable and the primary instance was still running.
From your above post I assume that you tried to resolve the partitioned state but were unable to do so - is that correct? And as the secondary is reporting Remote unavailable, have you confirmed that the servers can actually communicate?
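For example, something like this from the primary node would confirm it (IPs and port taken from your rdqmstatus output; nc is just one way to probe the port):
Code:
# Basic reachability of the DR partner from 10.0.0.1:
ping -c 3 10.0.0.2

# Is the DR replication port actually reachable?
nc -zv 10.0.0.2 1484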
And lastly, irrespective of any of the above, were the test messages persistent or non-persistent? If the latter, what was the NPMCLASS of the queue(s) set to?
_________________
It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys.
rajesh00001
Posted: Fri Feb 28, 2020 2:16 pm Post subject:
Apprentice
Joined: 08 Sep 2009 Posts: 34
We didn't start the queue manager on both primary and secondary.
Yes, the message is persistent. The queue is persistent.
NPMCLASS is NORMAL.
bruce2359
Posted: Fri Feb 28, 2020 3:04 pm Post subject:
Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
rajesh00001 wrote:
...the message is persistent.
How did you determine that the message was persistent? Did you view the message in the queue, and look at the message persistence field? Or, did you look at the queue attribute DEFPSIST? Or, something else?
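For example (a sketch; Q1 stands in for your actual queue name):
Code:
# Queue default persistence and non-persistent message class:
echo "DISPLAY QLOCAL(Q1) DEFPSIST NPMCLASS" | runmqsc TEST

# Browse messages and inspect the Persistence field in the MQMD
# (1 = persistent, 0 = not persistent):
/opt/mqm/samp/bin/amqsbcg Q1 TEST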
rajesh00001 wrote:
The queue is persistent.
MQ queues are neither persistent nor non-persistent; persistence is an attribute of each message, and the queue's DEFPSIST is only the default applied when the putting application asks for persistence-as-queue-default.
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
rajesh00001
Posted: Fri Feb 28, 2020 3:39 pm Post subject:
Apprentice
Joined: 08 Sep 2009 Posts: 34
I opened the message using MQ Explorer and verified the persistence attribute.
What I am looking for here is how to fix the "DR status: Partitioned" issue.
exerk
Posted: Sat Feb 29, 2020 4:23 am Post subject:
Jedi Council
Joined: 02 Nov 2006 Posts: 6339
rajesh00001 wrote:
We didn't start the queue manager on both primary and secondary...
Is that statement referring to before the partition state became apparent, or after the partition state became apparent?
rajesh00001 wrote:
...What I am looking for here is how to fix the "DR status: Partitioned" issue.
I'll restate my original questions: "...From your above post I assume that you tried to resolve the partitioned state but were unable to do so - is that correct? And as the secondary is reporting Remote unavailable, have you confirmed that the servers can actually communicate?..."
_________________
It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys.
rajesh00001
Posted: Sat Feb 29, 2020 9:13 pm Post subject:
Apprentice
Joined: 08 Sep 2009 Posts: 34
exerk wrote:
I'll restate my original questions: "...From your above post I assume that you tried to resolve the partitioned state but were unable to do so - is that correct? And as the secondary is reporting Remote unavailable, have you confirmed that the servers can actually communicate?..."
Yes, the servers are communicating with each other. I have other queue managers on the same servers and those queue managers are working without issues.
exerk wrote:
Is that statement referring to before the partition state became apparent, or after the partition state became apparent?
After the partition state became apparent. I don't know when the status changed from "normal" to "Partitioned".
exerk
Posted: Sun Mar 01, 2020 12:13 pm Post subject:
Jedi Council
Joined: 02 Nov 2006 Posts: 6339
You are making it difficult to help you because you do not give full information with each post...
rajesh00001 wrote:
Yes, the servers are communicating with each other. I have other queue managers on the same servers and those queue managers are working without issues.
1. Are those queue managers also DR-RDQM queue managers? If so, are their counterpart instances on the same server as the other partitioned instance of the queue manager you are having issues with?
2. If queue managers on that server are DR-RDQM, do they each have their own dedicated network interface for replication? If so, have you tested that you can establish communication between the partitioned instances using that specific interface, or are you assuming that communication is OK because the others are OK?
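To test the replication path specifically, something along these lines would do (a sketch; eth1 is a placeholder for whatever interface your DR link actually uses):
Code:
# Force the probe out of the interface that carries DR replication:
ping -I eth1 -c 3 10.0.0.2

# Or bind to the local DR address from your rdqmstatus output
# (the -s source-address flag depends on your netcat variant):
nc -zv -s 10.0.0.1 10.0.0.2 1484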
rajesh00001 wrote:
After the partition state became apparent. I don't know when the status changed from "normal" to "Partitioned".
This statement is meaningless! Did you, or did you not, at some time, have BOTH instances of the queue manager running on each server WITHOUT having demoted one and promoted the other?
_________________
It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys.
hughson
Posted: Sun Mar 01, 2020 9:56 pm Post subject: Re: DR status is Partitioned on Primary node in RDQM DR
Padawan
Joined: 09 May 2013 Posts: 1959 Location: Bay of Plenty, New Zealand
rajesh00001 wrote:
[mqm@10.0.0.1] /var/mqm/errors $ sudo rdqmstatus -m TEST
Queue manager status: Ended normally
Queue manager file system: 148MB used, 2.9GB allocated [5%]
DR role: Primary
DR status: Partitioned
DR type: Asynchronous
DR port: 1484
DR local IP address: 10.0.0.1
DR remote IP address: 10.0.0.2
I see that you have chosen to use Asynchronous rather than Synchronous replication. Are you aware of this:-
IBM Knowledge Center wrote:
You can choose between synchronous and asynchronous replication of data between primary and secondary queue managers. If you select the asynchronous option, operations such as IBM MQ PUT or GET complete and return to the application before the event is replicated to the secondary queue manager. Asynchronous replication means that, following a recovery situation, some messaging data might be lost. But the secondary queue manager will be in a consistent state, and able to start running immediately, even if it is started at a slightly earlier part of the message stream.
w.r.t. your partitioned state, what do the following logs show?
- Queue Manager error log
- crm status
- systemctl status pacemaker
- systemctl status corosync
- /var/log/messages (will show same as above)
- /var/log/pacemaker.log
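If it is easier, something like this would capture them all in one pass (a sketch; run as root, with TEST as the queue manager name):
Code:
{
  crm status
  systemctl status pacemaker
  systemctl status corosync
  tail -n 200 /var/log/messages
  tail -n 200 /var/log/pacemaker.log
  tail -n 200 /var/mqm/qmgrs/TEST/errors/AMQERR01.LOG
} > rdqm-diag.txt 2>&1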
Cheers,
Morag
_________________
Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software
rajesh00001
Posted: Thu Mar 12, 2020 9:55 am Post subject:
Apprentice
Joined: 08 Sep 2009 Posts: 34
@exerk,
Sorry for the late response.
I have 4 queue managers in the RDQM DR setup.
WPC node 1 has 4 queue managers and NPC node 1 has the corresponding 4 queue managers.
Out of the 4 queue managers, 1 has a DR status of Normal and 3 have a DR status of Partitioned.
@hughson,
I agree with the IBM note.
But I tested the scenario below:
I loaded the queue with 10k messages. After 5 minutes I did a DR fail-over. After the fail-over I checked the queue and didn't see any messages on it; all 10k messages were lost.
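For anyone wanting to reproduce the test, a load along these lines would do (a sketch; Q1 stands in for the real queue name, and note that the amqsput sample leaves message persistence at the queue's DEFPSIST default):
Code:
# Put 10,000 messages; with DEFPSIST(NO) on the queue these would all be non-persistent:
for i in $(seq 1 10000); do echo "test message $i"; done | /opt/mqm/samp/bin/amqsput Q1 TEST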
[wpcnode1] /var/log $ crm status
ERROR: status: crm_mon (rc=107): Connection to cluster failed: Transport endpoint is not connected
[wpcnode1] /var/log $ systemctl status pacemaker
● pacemaker.service - Pacemaker High Availability Cluster Manager
Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Docs: man:pacemakerd
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/index.html
[wpcnode1] /var/log $ systemctl status corosync
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/usr/lib/systemd/system/corosync.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Docs: man:corosync
man:corosync.conf
man:corosync_overview
NPC:
[npcnode1] /var/mqm/ $ crm status
ERROR: status: crm_mon (rc=107): Connection to cluster failed: Transport endpoint is not connected
[npcnode1] /var/mqm/ $ systemctl status pacemaker
● pacemaker.service - Pacemaker High Availability Cluster Manager
Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Docs: man:pacemakerd
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/index.html
[npcnode1] /var/mqm/ $ systemctl status corosync
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/usr/lib/systemd/system/corosync.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Docs: man:corosync
man:corosync.conf
man:corosync_overview
hughson
Posted: Thu Mar 12, 2020 10:28 pm Post subject:
Padawan
Joined: 09 May 2013 Posts: 1959 Location: Bay of Plenty, New Zealand
Can you confirm that your RDQM system is correctly set up before you do the failover? It's rather odd to see that you have the same error
Code:
crm_mon (rc=107): Connection to cluster failed: Transport endpoint is not connected
on both nodes.
What does rdqmstatus show on both nodes after you set them up, and put messages, but before you do the failover?
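For example, on each node (a sketch; drbdadm talks to the DRBD layer that RDQM replication is built on):
Code:
sudo rdqmstatus -m TEST   # expect DR role Primary/Secondary with DR status Normal on both nodes
sudo drbdadm status       # the underlying DRBD resource should show the link connected and in sync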
Cheers,
Morag
_________________
Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software