rajesh00001
Posted: Fri Feb 28, 2020 10:28 am Post subject: DR status is Partitioned on Primary node in RDQM DR
Apprentice
Joined: 08 Sep 2009 Posts: 34
Hi Team,
We have an RDQM DR environment with asynchronous replication.
We did a DR test. During the test we put some messages on a queue, and after failing over to the backup data centre we lost those messages.
We followed the process described on the IBM Knowledge Center page below, but the issue remains:
https://www.ibm.com/support/knowledgecenter/SSFKSJ_9.1.0/com.ibm.mq.con.doc/q133050_.htm?view=embed
The queue manager is shut down on both servers. Below is the status from both data centres.
On the primary node the DR status is Partitioned.
On the secondary node the DR status is Remote unavailable.
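For reference, the process on that page (as I understand it) boils down to choosing which node's data to keep and demoting the other instance; roughly this (a sketch using the documented rdqmdr command, with TEST being the queue manager shown below):
Code:
# Decide which node holds the data to keep; here the primary (10.0.0.1) is kept.
# On the node whose data is to be DISCARDED, demote its instance to secondary:
sudo rdqmdr -m TEST -s

# On the node whose data is to be KEPT, make its instance the primary:
sudo rdqmdr -m TEST -p

# Replication should then resynchronise; verify with:
sudo rdqmstatus -m TEST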
[mqm@10.0.0.1] /var/mqm/errors $ sudo rdqmstatus -m TEST
Queue manager status: Ended normally
Queue manager file system: 148MB used, 2.9GB allocated [5%]
DR role: Primary
DR status: Partitioned
DR type: Asynchronous
DR port: 1484
DR local IP address: 10.0.0.1
DR remote IP address: 10.0.0.2
[mqm@10.0.0.2] /var/mqm/ $ sudo rdqmstatus -m TEST
Queue manager status: Ended immediately
DR role: Secondary
DR status: Remote unavailable
DR type: Asynchronous
DR port: 1484
DR local IP address: 10.0.0.2
DR remote IP address: 10.0.0.1
DR out of sync data: 3145596KB
Can you please help me to fix the problem?
exerk
Posted: Fri Feb 28, 2020 12:12 pm Post subject:
Jedi Council
Joined: 02 Nov 2006 Posts: 6339
According to the KC, for there to be a Partitioned status:
1. Either both queue manager instances must have been running concurrently, and you promoted the secondary to be the primary, whilst the 'old' primary instance was still running;
Or:
2. The queue manager instance has been started on the secondary node while the DR replication network was unavailable and the primary instance was still running.
From your above post I assume that you tried to resolve the partitioned state but were unable to do so - is that correct? And as the secondary is reporting Remote unavailable, have you confirmed that the servers can actually communicate?
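For example, something like this from the primary node would confirm it (IPs and port taken from your rdqmstatus output; nc is just one way to probe the port):
Code:
# Basic reachability of the DR partner from 10.0.0.1:
ping -c 3 10.0.0.2

# Is the DR replication port actually reachable?
nc -zv 10.0.0.2 1484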
And lastly, irrespective of any of the above, were the test messages persistent or non-persistent? If the latter, what was the NPMCLASS of the queue(s) set to?
_________________
It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys.
rajesh00001
Posted: Fri Feb 28, 2020 2:16 pm Post subject:
Apprentice
Joined: 08 Sep 2009 Posts: 34
We didn't start the queue manager on both primary and secondary.
Yes, the message is persistent. The queue is persistent.
NPMCLASS is NORMAL.
bruce2359
Posted: Fri Feb 28, 2020 3:04 pm Post subject:
Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
rajesh00001 wrote:
...the message is persistent.
How did you determine that the message was persistent? Did you view the message in the queue, and look at the message persistence field? Or, did you look at the queue attribute DEFPSIST? Or, something else?
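For example (a sketch; Q1 stands in for your actual queue name):
Code:
# Queue default persistence and non-persistent message class:
echo "DISPLAY QLOCAL(Q1) DEFPSIST NPMCLASS" | runmqsc TEST

# Browse messages and inspect the Persistence field in the MQMD
# (1 = persistent, 0 = not persistent):
/opt/mqm/samp/bin/amqsbcg Q1 TEST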
rajesh00001 wrote:
The queue is persistent.
MQ queues are neither persistent nor non-persistent; persistence is an attribute of each message, and the queue's DEFPSIST is only the default applied when the putting application asks for persistence-as-queue-default.
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
rajesh00001
Posted: Fri Feb 28, 2020 3:39 pm Post subject:
Apprentice
Joined: 08 Sep 2009 Posts: 34
I opened the message using MQ Explorer and verified the persistence attribute.
What I am looking for here is how to fix the "DR status: Partitioned" issue.
exerk
Posted: Sat Feb 29, 2020 4:23 am Post subject:
Jedi Council
Joined: 02 Nov 2006 Posts: 6339
rajesh00001 wrote:
We didn't start the queue manager on both primary and secondary...
Is that statement referring to before the partition state became apparent, or after the partition state became apparent?
rajesh00001 wrote:
...What I am looking for here is how to fix the "DR status: Partitioned" issue.
I'll restate my original questions: "...From your above post I assume that you tried to resolve the partitioned state but were unable to do so - is that correct? And as the secondary is reporting Remote unavailable, have you confirmed that the servers can actually communicate?..."
_________________
It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys.
rajesh00001
Posted: Sat Feb 29, 2020 9:13 pm Post subject:
Apprentice
Joined: 08 Sep 2009 Posts: 34
exerk wrote:
I'll restate my original questions: "...From your above post I assume that you tried to resolve the partitioned state but were unable to do so - is that correct? And as the secondary is reporting Remote unavailable, have you confirmed that the servers can actually communicate?..."
Yes, the servers are communicating with each other. I have other queue managers on the same servers and those queue managers are working without issues.
exerk wrote:
Is that statement referring to before the partition state became apparent, or after the partition state became apparent?
After the partition state became apparent. I don't know when the status changed from "normal" to "Partitioned".
exerk
Posted: Sun Mar 01, 2020 12:13 pm Post subject:
Jedi Council
Joined: 02 Nov 2006 Posts: 6339
You are making it difficult to help you because you do not give full information with each post...
rajesh00001 wrote:
Yes, the servers are communicating with each other. I have other queue managers on the same servers and those queue managers are working without issues.
1. Are those queue managers also DR-RDQM queue managers? If so, are their counterpart instances on the same server as the other partitioned instance of the queue manager you are having issues with?
2. If queue managers on that server are DR-RDQM, do they each have their own dedicated network interface for replication? If so, have you tested that you can establish communication between the partitioned instances using that specific interface, or are you assuming that communication is OK because the others are OK?
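To test the replication path specifically, something along these lines would do (a sketch; eth1 is a placeholder for whatever interface your DR link actually uses):
Code:
# Force the probe out of the interface that carries DR replication:
ping -I eth1 -c 3 10.0.0.2

# Or bind to the local DR address from your rdqmstatus output
# (the -s source-address flag depends on your netcat variant):
nc -zv -s 10.0.0.1 10.0.0.2 1484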
rajesh00001 wrote:
After the partition state became apparent. I don't know when the status changed from "normal" to "Partitioned".
This statement is meaningless! Did you, or did you not, at some time, have BOTH instances of the queue manager running on each server WITHOUT having demoted one and promoted the other?
_________________
It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys.
hughson
Posted: Sun Mar 01, 2020 9:56 pm Post subject: Re: DR status is Partitioned on Primary node in RDQM DR
Padawan
Joined: 09 May 2013 Posts: 1959 Location: Bay of Plenty, New Zealand
rajesh00001 wrote:
[mqm@10.0.0.1] /var/mqm/errors $ sudo rdqmstatus -m TEST
Queue manager status: Ended normally
Queue manager file system: 148MB used, 2.9GB allocated [5%]
DR role: Primary
DR status: Partitioned
DR type: Asynchronous
DR port: 1484
DR local IP address: 10.0.0.1
DR remote IP address: 10.0.0.2
I see that you have chosen to use Asynchronous rather than Synchronous replication. Are you aware of this:-
IBM Knowledge Center wrote:
You can choose between synchronous and asynchronous replication of data between primary and secondary queue managers. If you select the asynchronous option, operations such as IBM MQ PUT or GET complete and return to the application before the event is replicated to the secondary queue manager. Asynchronous replication means that, following a recovery situation, some messaging data might be lost. But the secondary queue manager will be in a consistent state, and able to start running immediately, even if it is started at a slightly earlier part of the message stream.
w.r.t. your partitioned state, what do the following logs show?
- Queue Manager error log
- crm status
- systemctl status pacemaker
- systemctl status corosync
- /var/log/messages (will show same as above)
- /var/log/pacemaker.log
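If it is easier, something like this would capture them all in one pass (a sketch; run as root, with TEST as the queue manager name):
Code:
{
  crm status
  systemctl status pacemaker
  systemctl status corosync
  tail -n 200 /var/log/messages
  tail -n 200 /var/log/pacemaker.log
  tail -n 200 /var/mqm/qmgrs/TEST/errors/AMQERR01.LOG
} > rdqm-diag.txt 2>&1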
Cheers,
Morag
_________________
Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software
rajesh00001
Posted: Thu Mar 12, 2020 9:55 am Post subject:
Apprentice
Joined: 08 Sep 2009 Posts: 34
@exerk,
Sorry for the late response.
I have 4 queue managers in the RDQM DR setup.
WPC node 1 has 4 queue managers and NPC node 1 has the corresponding 4 queue managers.
Out of the 4 queue managers, 1 has a DR status of Normal and 3 have a DR status of Partitioned.
@hughson,
I agree with the IBM note.
But I tested the scenario below:
I loaded the queue with 10k messages. After 5 minutes I did a DR fail-over. After the fail-over I checked the queue and didn't see any messages on it; all 10k messages were lost.
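For anyone wanting to reproduce the test, a load along these lines would do (a sketch; Q1 stands in for the real queue name, and note that the amqsput sample leaves message persistence at the queue's DEFPSIST default):
Code:
# Put 10,000 messages; with DEFPSIST(NO) on the queue these would all be non-persistent:
for i in $(seq 1 10000); do echo "test message $i"; done | /opt/mqm/samp/bin/amqsput Q1 TEST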
[wpcnode1] /var/log $ crm status
ERROR: status: crm_mon (rc=107): Connection to cluster failed: Transport endpoint is not connected
[wpcnode1] /var/log $ systemctl status pacemaker
● pacemaker.service - Pacemaker High Availability Cluster Manager
Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Docs: man:pacemakerd
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/index.html
[wpcnode1] /var/log $ systemctl status corosync
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/usr/lib/systemd/system/corosync.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Docs: man:corosync
man:corosync.conf
man:corosync_overview
NPC:
[npcnode1] /var/mqm/ $ crm status
ERROR: status: crm_mon (rc=107): Connection to cluster failed: Transport endpoint is not connected
[npcnode1] /var/mqm/ $ systemctl status pacemaker
● pacemaker.service - Pacemaker High Availability Cluster Manager
Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Docs: man:pacemakerd
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/index.html
[npcnode1] /var/mqm/ $ systemctl status corosync
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/usr/lib/systemd/system/corosync.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Docs: man:corosync
man:corosync.conf
man:corosync_overview
hughson
Posted: Thu Mar 12, 2020 10:28 pm Post subject:
Padawan
Joined: 09 May 2013 Posts: 1959 Location: Bay of Plenty, New Zealand
Can you confirm that your RDQM system is correctly set up before you do the failover? It's rather odd to see that you have the same error
Code:
crm_mon (rc=107): Connection to cluster failed: Transport endpoint is not connected
on both nodes.
What does rdqmstatus show on both nodes after you set them up, and put messages, but before you do the failover?
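For example, on each node (a sketch; drbdadm talks to the DRBD layer that RDQM replication is built on):
Code:
sudo rdqmstatus -m TEST   # expect DR role Primary/Secondary with DR status Normal on both nodes
sudo drbdadm status       # the underlying DRBD resource should show the link connected and in sync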
Cheers,
Morag
_________________
Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software