MQSeries.net Forum Index » General IBM MQ Support » DR status is Partitioned on Primary node in RDQM DR

rajesh00001
Posted: Fri Feb 28, 2020 10:28 am    Post subject: DR status is Partitioned on Primary node in RDQM DR

Apprentice

Joined: 08 Sep 2009
Posts: 34

Hi Team,

We have an RDQM DR environment with asynchronous replication.

We did a DR test. During the test we put some messages on a queue, and after failing over to the backup data center we lost those messages.

We followed the process described on the IBM page below, but still have the same issue.
https://www.ibm.com/support/knowledgecenter/SSFKSJ_9.1.0/com.ibm.mq.con.doc/q133050_.htm?view=embed


The queue manager is shut down on both servers. Below is the status in both data centers:

On the primary node, the DR status is Partitioned.
On the secondary node, the DR status is Remote unavailable.

[mqm@10.0.0.1] /var/mqm/errors $ sudo rdqmstatus -m TEST
Queue manager status: Ended normally
Queue manager file system: 148MB used, 2.9GB allocated [5%]
DR role: Primary
DR status: Partitioned
DR type: Asynchronous
DR port: 1484
DR local IP address: 10.0.0.1
DR remote IP address: 10.0.0.2

[mqm@10.0.0.2] /var/mqm/ $ sudo rdqmstatus -m TEST
Queue manager status: Ended immediately
DR role: Secondary
DR status: Remote unavailable
DR type: Asynchronous
DR port: 1484
DR local IP address: 10.0.0.2
DR remote IP address: 10.0.0.1
DR out of sync data: 3145596KB


Can you please help me fix the problem?
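When comparing several queue managers or repeated test runs, it can help to read the rdqmstatus output programmatically. Below is a minimal Python sketch, assuming the `Key: value` layout shown in the output above; the function names are hypothetical helpers, not part of IBM MQ, and treating `Normal` as the healthy DR status is an assumption to check against the rdqmstatus documentation for your MQ version.

```python
import re


def parse_rdqmstatus(text):
    """Turn 'Key: value' lines from rdqmstatus output into a dict.

    Lines without a colon (such as the shell prompt) are skipped, and only
    the first colon separates the key from the value.
    """
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            status[key.strip()] = value.strip()
    return status


def out_of_sync_kb(status):
    """Return the 'DR out of sync data' figure in KB, or None if not reported."""
    value = status.get("DR out of sync data")
    if value is None:
        return None
    match = re.fullmatch(r"(\d+)\s*KB", value)
    return int(match.group(1)) if match else None


def dr_warnings(status):
    """List reasons why this node does not look healthy.

    Assumption: a healthy replication pair reports DR status 'Normal'.
    """
    warnings = []
    dr_status = status.get("DR status")
    if dr_status != "Normal":
        warnings.append(f"DR status is {dr_status!r}, not 'Normal'")
    kb = out_of_sync_kb(status)
    if kb:
        warnings.append(f"{kb} KB of DR data reported out of sync")
    return warnings


# Usage with the secondary node's output from this post.
secondary = parse_rdqmstatus("""\
Queue manager status: Ended immediately
DR role: Secondary
DR status: Remote unavailable
DR type: Asynchronous
DR port: 1484
DR local IP address: 10.0.0.2
DR remote IP address: 10.0.0.1
DR out of sync data: 3145596KB
""")
for warning in dr_warnings(secondary):
    print(warning)
```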
exerk
Posted: Fri Feb 28, 2020 12:12 pm

Jedi Council

Joined: 02 Nov 2006
Posts: 6339

According to the KC, for there to be a Partitioned status:

1. Either both queue manager instances must have been running concurrently, and you promoted the secondary to be the primary, whilst the 'old' primary instance was still running;

Or:

2. The queue manager instances have been started on both nodes while the DR replication network is unavailable whilst the primary was running.

From your above post I assume that you tried to resolve the partitioned state but were unable to do so - is that correct? And as the secondary is reporting Remote unavailable, have you confirmed that the servers can actually communicate?

And lastly, irrespective of any of the above, were the test messages persistent or non-persistent? If the latter, what was the NPMCLASS of the queue(s) set to?
_________________
It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys.
rajesh00001
Posted: Fri Feb 28, 2020 2:16 pm

Apprentice

Joined: 08 Sep 2009
Posts: 34

I didn't start the queue manager on both the primary and the secondary.

Yes, the message is persistent. The queue is persistent.

NPMCLASS is NORMAL.
bruce2359
Posted: Fri Feb 28, 2020 3:04 pm

Poobah

Joined: 05 Jan 2008
Posts: 9394
Location: US: west coast, almost. Otherwise, enroute.

rajesh00001 wrote:
...the message is persistent.

How did you determine that the message was persistent? Did you view the message in the queue, and look at the message persistence field? Or, did you look at the queue attribute DEFPSIST? Or, something else?

rajesh00001 wrote:
The queue is persistent.

MQ Queues are neither persistent nor non-persistent.
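To illustrate the point that persistence belongs to each message rather than to the queue: the queue attribute DEFPSIST only matters when the putting application asks for the queue default. A small Python sketch of that rule follows; the numeric values are the MQPER_* constants from the MQ C header cmqc.h, the function name is hypothetical, and it deliberately ignores other Persistence values and alias or remote queue resolution.

```python
# MQMD Persistence values, as defined in the MQ C header cmqc.h.
MQPER_NOT_PERSISTENT = 0
MQPER_PERSISTENT = 1
MQPER_PERSISTENCE_AS_Q_DEF = 2


def is_persistent(md_persistence, queue_defpsist):
    """Return True if a message put with this MQMD Persistence value is persistent.

    queue_defpsist is the DEFPSIST attribute of the target queue ("YES" or "NO").
    It is only consulted when the application asked for the queue default.
    """
    if md_persistence == MQPER_PERSISTENT:
        return True
    if md_persistence == MQPER_NOT_PERSISTENT:
        return False
    if md_persistence == MQPER_PERSISTENCE_AS_Q_DEF:
        return queue_defpsist.upper() == "YES"
    raise ValueError(f"unexpected Persistence value: {md_persistence}")
```

So a queue with DEFPSIST(YES) can still hold non-persistent messages if the application explicitly puts them with MQPER_NOT_PERSISTENT, which is why checking the Persistence field of the message itself is the more reliable test.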
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
rajesh00001
Posted: Fri Feb 28, 2020 3:39 pm

Apprentice

Joined: 08 Sep 2009
Posts: 34

I opened the message using MQ Explorer and verified the attribute.

Here I am looking for how to fix the "DR status: Partitioned" issue.
exerk
Posted: Sat Feb 29, 2020 4:23 am

Jedi Council

Joined: 02 Nov 2006
Posts: 6339

rajesh00001 wrote:
I didn't start the queue manager on both the primary and the secondary...

Is that statement referring to before the partition state became apparent, or after the partition state became apparent?

rajesh00001 wrote:
...Here I am looking for how to fix the "DR status: Partitioned" issue.

I'll restate my original questions: "...From your above post I assume that you tried to resolve the partitioned state but were unable to do so - is that correct? And as the secondary is reporting Remote unavailable, have you confirmed that the servers can actually communicate?..."
rajesh00001
Posted: Sat Feb 29, 2020 9:13 pm

Apprentice

Joined: 08 Sep 2009
Posts: 34

exerk wrote:
I'll restate my original questions: "...From your above post I assume that you tried to resolve the partitioned state but were unable to do so - is that correct? And as the secondary is reporting Remote unavailable, have you confirmed that the servers can actually communicate?..."

Yes, the servers are communicating with each other. I have other queue managers on the same servers, and those queue managers are working without issues.

exerk wrote:
Is that statement referring to before the partition state became apparent, or after the partition state became apparent?

After the partition state became apparent. I don't know when the status changed from "Normal" to "Partitioned".
exerk
Posted: Sun Mar 01, 2020 12:13 pm

Jedi Council

Joined: 02 Nov 2006
Posts: 6339

You are making it difficult to help you because you do not give full information with each post...

rajesh00001 wrote:
Yes, the servers are communicating with each other. I have other queue managers on the same servers, and those queue managers are working without issues.

1. Are those queue managers also DR-RDQM queue managers? If so, are their counterpart instances on the same server as the other partitioned instance of the queue manager you are having issues with?

2. If queue managers on that server are DR-RDQM, do they each have their own dedicated network interface for replication? If so, have you tested that you can establish communication between the partitioned instances using that specific interface, or are you assuming that communication is OK because the others are OK?
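The second question can be checked from each node with a plain TCP connection between the two DR replication addresses rather than the general server addresses. Below is a minimal Python sketch; `can_connect` is a hypothetical helper, and the addresses and port in the usage line come from the rdqmstatus output earlier in the thread. It only shows whether a TCP connection can be made from the given local address; a refusal may simply mean nothing is listening on the DR port at that moment, so treat a failure as a reason to look further rather than as proof of a network fault.

```python
import socket


def can_connect(host, port, timeout=3.0, local_ip=None):
    """Try a TCP connection to host:port, optionally originating from local_ip.

    Passing local_ip (for example the 'DR local IP address') makes the
    connection use that source address instead of whatever the routing
    table would pick by default.
    """
    source = (local_ip, 0) if local_ip else None
    try:
        with socket.create_connection((host, port), timeout=timeout,
                                      source_address=source):
            return True
    except OSError:
        # Covers refused connections, timeouts and unusable source addresses.
        return False
```

For example, on the node at 10.0.0.1: `can_connect("10.0.0.2", 1484, local_ip="10.0.0.1")`, and the reverse direction on the other node.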

rajesh00001 wrote:
After the partition state became apparent. I don't know when the status changed from "Normal" to "Partitioned".

This statement is meaningless! Did you, or did you not, at some time, have BOTH instances of the queue manager running on each server WITHOUT having demoted one and promoted the other?
hughson
Posted: Sun Mar 01, 2020 9:56 pm    Post subject: Re: DR status is Partitioned on Primary node in RDQM DR

Padawan

Joined: 09 May 2013
Posts: 1914
Location: Bay of Plenty, New Zealand

rajesh00001 wrote:
[mqm@10.0.0.1] /var/mqm/errors $ sudo rdqmstatus -m TEST
Queue manager status: Ended normally
Queue manager file system: 148MB used, 2.9GB allocated [5%]
DR role: Primary
DR status: Partitioned
DR type: Asynchronous
DR port: 1484
DR local IP address: 10.0.0.1
DR remote IP address: 10.0.0.2


I see that you have chosen to use Asynchronous rather than Synchronous replication. Are you aware of this:-

IBM Knowledge Center wrote:
You can choose between synchronous and asynchronous replication of data between primary and secondary queue managers. If you select the asynchronous option, operations such as IBM MQ PUT or GET complete and return to the application before the event is replicated to the secondary queue manager. Asynchronous replication means that, following a recovery situation, some messaging data might be lost. But the secondary queue manager will be in a consistent state, and able to start running immediately, even if it is started at a slightly earlier part of the message stream.
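The behaviour described in that quote can be illustrated with a toy model in Python. This is a deliberately simplified sketch and not how RDQM actually replicates data: every put lands on the primary, synchronous mode copies it to the secondary before returning, and asynchronous mode copies pending data later in batches. A failover keeps only what the secondary already has, which is a consistent but possibly earlier prefix of the message stream. All class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyReplicatedLog:
    """Toy model of primary/secondary replication (illustration only)."""
    synchronous: bool
    primary: list = field(default_factory=list)
    secondary: list = field(default_factory=list)

    def put(self, msg):
        self.primary.append(msg)
        if self.synchronous:
            # Synchronous: the put only completes once the secondary has the data.
            self.secondary.append(msg)
        # Asynchronous: the put completes now; replicate() ships data later.

    def replicate(self, count):
        """Ship up to `count` pending entries, oldest first, to the secondary."""
        start = len(self.secondary)
        self.secondary.extend(self.primary[start:start + count])

    def fail_over(self):
        """Return what the secondary would start with if promoted now."""
        return list(self.secondary)


async_log = ToyReplicatedLog(synchronous=False)
for i in range(10):
    async_log.put(i)
async_log.replicate(6)
print(async_log.fail_over())
```

In the asynchronous case the messages that were put but not yet shipped are the ones missing after a failover; in the synchronous case nothing acknowledged is lost, at the cost of each put waiting for the secondary.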


w.r.t. your partitioned state, what do the following logs show?
  • Queue Manager error log
  • crm status
  • systemctl status pacemaker
  • systemctl status corosync
  • /var/log/messages (will show same as above)
  • /var/log/pacemaker.log
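Gathering these outputs on both nodes at the same moment makes them easier to compare. A minimal Python sketch follows; the command list mirrors the items above, while the error log path assumes the default Linux data path and the queue manager name TEST from this thread, so adjust both for your environment. Some commands may need root privileges, and `collect` is a hypothetical helper.

```python
import subprocess

# Mirrors the list above. The AMQERR01.LOG path assumes the default
# /var/mqm data path and the queue manager name TEST used in this thread.
DIAGNOSTIC_COMMANDS = [
    ["crm", "status"],
    ["systemctl", "status", "pacemaker"],
    ["systemctl", "status", "corosync"],
    ["tail", "-n", "200", "/var/mqm/qmgrs/TEST/errors/AMQERR01.LOG"],
    ["tail", "-n", "200", "/var/log/messages"],
    ["tail", "-n", "200", "/var/log/pacemaker.log"],
]


def collect(commands, timeout=30):
    """Run each command and capture (return code, stdout + stderr) by command string."""
    results = {}
    for cmd in commands:
        key = " ".join(cmd)
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True,
                                  timeout=timeout)
            results[key] = (proc.returncode, proc.stdout + proc.stderr)
        except FileNotFoundError:
            results[key] = (None, "command not found")
        except subprocess.TimeoutExpired:
            results[key] = (None, f"timed out after {timeout}s")
    return results
```

Running `collect(DIAGNOSTIC_COMMANDS)` on each node and printing the results side by side gives a single snapshot to post back to the thread.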


Cheers,
Morag
_________________
Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software
rajesh00001
Posted: Thu Mar 12, 2020 9:55 am

Apprentice

Joined: 08 Sep 2009
Posts: 34

@exerk,

Sorry for the late response.
I have 4 queue managers with an RDQM DR setup.
WPC node1 has 4 queue managers and NPC node1 has 4 queue managers.
Of the 4 queue managers, 1 has DR status Normal and 3 have DR status Partitioned.


@Hughson,

I agree with the IBM note.
But I tested the scenario below:
I loaded a queue with 10K messages. After 5 minutes I did a DR failover. After the failover I checked the queue and didn't see any messages on it; all 10K messages were lost.
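One way to make a test like this measurable is to record the queue depth on the primary just before the failover and on the new primary just after it, for example with `echo "DISPLAY QLOCAL(TEST.Q) CURDEPTH" | runmqsc TEST`, where TEST.Q is a hypothetical queue name. A small Python sketch that extracts the CURDEPTH(n) value from that runmqsc output and computes the difference, assuming no application consumes messages in between; both function names are hypothetical.

```python
import re


def curdepth(runmqsc_output):
    """Return the value of CURDEPTH(n) from DISPLAY QLOCAL(...) CURDEPTH output."""
    match = re.search(r"CURDEPTH\((\d+)\)", runmqsc_output)
    if match is None:
        raise ValueError("no CURDEPTH(...) found in the output")
    return int(match.group(1))


def lost_messages(before_output, after_output):
    """Messages missing after failover, assuming nothing consumed them meanwhile."""
    return curdepth(before_output) - curdepth(after_output)
```

Capturing rdqmstatus on both nodes at the same two points in time would also show whether replication had caught up before the failover was started.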

[@wpcnode1] /var/log $ crm status
ERROR: status: crm_mon (rc=107): Connection to cluster failed: Transport endpoint is not connected

[@wpcnode1] /var/log $ systemctl status pacemaker
● pacemaker.service - Pacemaker High Availability Cluster Manager
Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Docs: man:pacemakerd
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/index.html

[@wpcnode1] /var/log $ systemctl status corosync
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/usr/lib/systemd/system/corosync.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Docs: man:corosync
man:corosync.conf
man:corosync_overview

NPC:
[npcnode1] /var/mqm/ $ crm status
ERROR: status: crm_mon (rc=107): Connection to cluster failed: Transport endpoint is not connected
[npcnode1] /var/mqm/ $ systemctl status pacemaker
● pacemaker.service - Pacemaker High Availability Cluster Manager
Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Docs: man:pacemakerd
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/index.html

[npdnode1] /var/mqm/ $ systemctl status corosync
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/usr/lib/systemd/system/corosync.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Docs: man:corosync
man:corosync.conf
man:corosync_overview
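When checking many services or nodes, the interesting part of each `systemctl status` block above is the `Active:` line. A minimal Python sketch that pulls the state and sub-state out of that line; `active_state` is a hypothetical helper. For scripting, `systemctl is-active <unit>` prints only the state word and may be simpler; note also that `disabled` on the `Loaded:` line means the unit is not enabled to start at boot.

```python
import re

# Matches e.g. "Active: inactive (dead)" or "Active: active (running) since ...".
ACTIVE_LINE = re.compile(r"\s*Active:\s*(\S+)(?:\s*\(([^)]*)\))?")


def active_state(systemctl_output):
    """Return (state, sub_state) from the 'Active:' line, or None if absent.

    For 'Active: inactive (dead)' this returns ('inactive', 'dead').
    """
    for line in systemctl_output.splitlines():
        match = ACTIVE_LINE.match(line)
        if match:
            return match.group(1), match.group(2)
    return None
```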
hughson
Posted: Thu Mar 12, 2020 10:28 pm

Padawan

Joined: 09 May 2013
Posts: 1914
Location: Bay of Plenty, New Zealand

Can you confirm that your RDQM system is correctly set up before you do the failover? It's rather odd to see that you have the same error

Code:
crm_mon (rc=107): Connection to cluster failed: Transport endpoint is not connected


on both nodes.

What does rdqmstatus show on both nodes after you set them up, and put messages, but before you do the failover?
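Comparing the two nodes' rdqmstatus output side by side catches the obvious inconsistencies. Below is a minimal Python sketch, assuming the `Key: value` layout shown earlier in the thread and that a healthy pair reports DR status `Normal` (an assumption to check against the rdqmstatus documentation); `check_pair` and `parse_rdqmstatus` are hypothetical helpers. It checks for exactly one Primary and one Secondary, a non-Normal DR status on either node, DR IP addresses that do not point at each other, and mismatched DR ports.

```python
def parse_rdqmstatus(text):
    """Turn 'Key: value' lines from rdqmstatus output into a dict."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            status[key.strip()] = value.strip()
    return status


def check_pair(output_a, output_b):
    """Return a list of problems found when comparing the two nodes' output."""
    a, b = parse_rdqmstatus(output_a), parse_rdqmstatus(output_b)
    problems = []
    if {a.get("DR role"), b.get("DR role")} != {"Primary", "Secondary"}:
        problems.append(
            f"expected one Primary and one Secondary, "
            f"got {a.get('DR role')} and {b.get('DR role')}")
    for name, s in (("node A", a), ("node B", b)):
        if s.get("DR status") != "Normal":
            problems.append(f"{name}: DR status is {s.get('DR status')}")
    if (a.get("DR remote IP address") != b.get("DR local IP address")
            or b.get("DR remote IP address") != a.get("DR local IP address")):
        problems.append("DR IP addresses do not point at each other")
    if a.get("DR port") != b.get("DR port"):
        problems.append("DR ports differ")
    return problems
```

Run against the outputs in the first post, this reports only the two DR status problems, which suggests the addressing is consistent and the issue lies elsewhere.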

Cheers,
Morag
Copyright © MQSeries.net. All rights reserved.