RDQM data replication question
mqdev
Posted: Sun Jan 05, 2020 10:53 am    Post subject: RDQM data replication question

Centurion

Joined: 21 Jan 2003
Posts: 136

Hello all,
I have the following questions regarding "data replication" among the RDQM nodes:

Let's assume A, B, and C are RDQM nodes and A has the active QM (QM status=Running).

1. After MQPUT() succeeds (rc=0), is the data written to the queue on A instantaneously available on the other nodes, B and C? What happens if A crashes right after that MQPUT() returns rc=0? In other words, does MQPUT() return only after the data is written to the "drbd cluster" (i.e. available on all of A, B, and C), or as soon as the data is written to A, leaving a small window in which the data is ** not available ** on B and C?

This question arises because of two factors:
a) The rate at which data is enqueued to a queue (depends on the app writing the data)
b) The rate at which data is replicated in the drbd cluster (depends on the corosync component of the drbd stack)
It is highly likely that these rates differ, so there could be a small window in which the data is out of sync, and hence data could be lost should the node holding the latest data crash before the sync to the other nodes completes.

2. Let's say an app enqueues a large amount of data (say 50 GB) to a node (say A), and A crashes as soon as the last byte is written successfully (MQPUT() returns rc=0) and stays down indefinitely. How soon is this 50 GB available on B or C (post QM failover)? Given the differential rates for data enqueuing and data replication in the drbd cluster, is it possible that we could lose data in RDQM (permanently, should A remain down indefinitely)? Assume the queue has NPMCLASS=High (otherwise non-persistent data is lost during failover; we have already tested and confirmed this. With NPMCLASS=High, however, there is no data loss).

Thanks
-mqdev
hughson
Posted: Sun Jan 05, 2020 8:24 pm

Padawan

Joined: 09 May 2013
Posts: 1914
Location: Bay of Plenty, New Zealand

You don't mention syncpoint in your question, and only at the end do you mention persistence. Can you confirm what you are using for both of these?

Remember that a non-persistent message could bypass the queue entirely and be placed directly in the getter's buffer, in which case I doubt it would be replicated.

The safety and integrity of your transactional messages is assured when using MQ with or without RDQM.

If you are interested in something other than that, could you expand on what your interest is?

Cheers,
Morag
_________________
Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software
mqdev
Posted: Mon Jan 06, 2020 7:50 am

Centurion

Joined: 21 Jan 2003
Posts: 136

Thanks for your response, Morag.

We have apps all over the place as regards syncpoint and Qs all over the place as regards persistence. We have close to 5000 apps, and they are not uniform in their use of syncpoint and persistence.

This question came up while we were doing the due diligence for rolling out RDQM.

The question raised was: with data replication going on under the covers for RDQM, how do we know there will be no data loss under any scenario? My questions above need to be viewed in the context of "data loss" only. I have already reported the finding regarding NPMCLASS. We have already made a policy decision to set NPMCLASS=High across the board for RDQM to prevent data loss.

I would like to be able to tell my 5000 customers: if you are using syncpoint, here is what to expect with RDQM; if you are not persisting the message at MQPUT(), here is the implication (this could very well be the same with or without RDQM).

However, I am not clear how the differing rates of "data replication" and "data enqueuing", the critical factor specific to RDQM, play out in reality. Hence this question.

Thanks
-mqdev
mqdev
Posted: Mon Jan 06, 2020 7:59 am

Centurion

Joined: 21 Jan 2003
Posts: 136

To add a bit more:

Let's say an app is doing MQPUT() under syncpoint to the QM on node A. The commit has been done and rc=0 (the message has been successfully written to A). The way drbd/RDQM works is that corosync, under the covers, writes this data to nodes B & C. However, if A crashes immediately after the MQPUT() call in the above scenario returns rc=0, corosync has not had a chance to sync this message data to B & C. How does RDQM guarantee that this data is available on B & C?

Possibility 1:
MQPUT() on A under syncpoint will not return rc=0 until data is written to B & C as well. If this is the case, then with RDQM there could be a small runtime performance hit, in that MQPUT() (and MQGET() as well) will take a bit longer than on a standalone QM, since time is needed to sync the data to the other nodes in RDQM.

Possibility 2:
On the other hand, if corosync takes responsibility for syncing the data outside the purview of the MQPUT() call, then if A crashes immediately after MQPUT() returns, there is a possibility of data loss with RDQM.

We would like to know which of the above is true with RDQM.
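
To make the two possibilities concrete, here is a toy Python model. It is only a sketch: it does not use MQ, DRBD, or any real API, and every class and function name in it is invented for illustration. It puts one message on node A, "crashes" A before any background sync can run, and then looks at what node B would see after failover.

```python
class Node:
    """One RDQM node in the toy model; `disk` stands in for its replicated storage."""

    def __init__(self, name):
        self.name = name
        self.disk = []


class ToyReplicatedQueue:
    """Hypothetical model of a queue on primary A replicated to B and C."""

    def __init__(self, primary, secondaries, synchronous):
        self.primary = primary
        self.secondaries = secondaries
        self.synchronous = synchronous
        self.unreplicated = []  # acknowledged writes not yet on the secondaries

    def put(self, message):
        self.primary.disk.append(message)
        if self.synchronous:
            # Possibility 1: replicate before acknowledging the put.
            for node in self.secondaries:
                node.disk.append(message)
        else:
            # Possibility 2: acknowledge now, replicate later in the background.
            self.unreplicated.append(message)
        return 0  # stands in for rc=0

    def background_sync(self):
        """Ship acknowledged-but-unreplicated writes to the secondaries."""
        for message in self.unreplicated:
            for node in self.secondaries:
                node.disk.append(message)
        self.unreplicated.clear()


def messages_after_failover(synchronous):
    """Put one message, crash A before any background sync, fail over to B."""
    a, b, c = Node("A"), Node("B"), Node("C")
    queue = ToyReplicatedQueue(a, [b, c], synchronous)
    rc = queue.put("msg-1")
    assert rc == 0
    # A crashes here: background_sync() never runs, and B takes over.
    return list(b.disk)


print(messages_after_failover(synchronous=True))   # ['msg-1']
print(messages_after_failover(synchronous=False))  # []
```

In this model the synchronous variant pays the replication cost inside every put, while the asynchronous variant returns sooner but can lose an acknowledged message.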
rekarm01
Posted: Mon Jan 06, 2020 7:43 pm    Post subject: Re: RDQM data replication question

Grand Master

Joined: 25 Jun 2008
Posts: 1415

mqdev wrote:
We have ... Qs all over the place as regards persistence.

That should be: " ... messages all over the place ... ". Messages can either be persistent, or non-persistent, regardless of whatever queue attributes are set. The application that puts the message decides that.

Messages put/get outside of syncpoint are subject to data loss or duplication, with or without RDQM.

Non-persistent messages are subject to data loss, with or without RDQM, even when NPMCLASS=High.

mqdev wrote:
Possibility 1:
MQPUT() on A under syncpoint will not return rc=0 until data is written to B & C as well ...

The Knowledge Center states that the running instance of the queue manager "synchronously replicates its data to the other two instances". That would seem to require that any replication occurs before each MQPUT/MQGET/MQCMIT/MQBACK call returns.

However, RDQM is still vulnerable to split-brain situations, which can result in data loss or duplication.
Andyh
Posted: Tue Jan 07, 2020 5:33 am

Master

Joined: 29 Jul 2010
Posts: 237

NPMCLASS(HIGH) only has ANY effect if the QMgr is successfully shut down in a controlled manner.
Even when non-persistent messages spill from in-memory buffers to disk, they are not chained into the message chains for the queue.
During a controlled shutdown, MQ will attempt to write any queue buffers relating to NPMCLASS(HIGH) messages to disk and will then attempt to chain those messages together in order that they could be recovered when the QMgr restarts.
In the event of a queue manager crash, all of the NPMCLASS(HIGH) messages will typically be lost.

NPMCLASS(HIGH) would be a very odd thing to be using in combination with RDQM. RDQM suggests that you want extra protection over a more traditional IO subsystem, while NPMCLASS(HIGH) offers very little guarantee in the event of any unexpected outage.
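
The behaviour described above can be summarised in a small Python sketch. This is a simplified toy model, not MQ code: the names `Message` and `survivors` are invented, it assumes persistent messages were committed and are always recoverable, and it treats the best-effort save of NPMCLASS(HIGH) messages at controlled shutdown as if it always succeeded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    body: str
    persistent: bool


def survivors(messages, npmclass, controlled_shutdown):
    """Return the messages expected to be on the queue after a restart.

    npmclass is "NORMAL" or "HIGH"; controlled_shutdown is False for a crash.
    """
    kept = []
    for message in messages:
        if message.persistent:
            # Persistent messages are recovered regardless of NPMCLASS.
            kept.append(message)
        elif npmclass == "HIGH" and controlled_shutdown:
            # Non-persistent messages are only saved on a clean shutdown.
            kept.append(message)
    return kept


queue = [Message("order", persistent=True), Message("stat", persistent=False)]

print([m.body for m in survivors(queue, "HIGH", controlled_shutdown=True)])
# ['order', 'stat']
print([m.body for m in survivors(queue, "HIGH", controlled_shutdown=False)])
# ['order']
print([m.body for m in survivors(queue, "NORMAL", controlled_shutdown=True)])
# ['order']
```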
mqdev
Posted: Wed Jan 08, 2020 7:28 am

Centurion

Joined: 21 Jan 2003
Posts: 136

Andyh wrote:
NPMCLASS(HIGH) only has ANY effect if the QMgr is successfully shut down in a controlled manner.
..
..


RDQM failover is
1. controlled shutdown on current Active Node
2. startup on one of the erstwhile Secondary nodes

In that order. During our testing, if NPMCLASS was not set to HIGH, messages on the Qs were lost after failover. Of course, message persistence is still governed by the Persistence setting, but if you have a set of apps used to a certain way of functioning (not losing messages when the QM fails over) that start losing messages just because the QM was migrated to RDQM, you can imagine the consternation among your app customers. So the takeaway for us has been to set NPMCLASS to HIGH on all Qs hosted by the QM when migrating the QM to RDQM.
mqdev
Posted: Wed Jan 08, 2020 7:32 am    Post subject: Re: RDQM data replication question

Centurion

Joined: 21 Jan 2003
Posts: 136

rekarm01 wrote:
..
..
mqdev wrote:
Possibility 1:
MQPUT() on A under syncpoint will not return rc=0 until data is written to B & C as well ...

The Knowledge Center states that the running instance of the queue manager "synchronously replicates its data to the other two instances". That would seem to require that any replication occurs before each MQPUT/MQGET/MQCMIT/MQBACK call returns.

However, RDQM is still vulnerable to split-brain situations, which can result in data loss or duplication.


Perfect...thanks for that clarification...perfectly answers my question!
PeterPotkay
Posted: Wed Jan 08, 2020 9:51 am

Poobah

Joined: 15 May 2001
Posts: 7717

mqdev wrote:
Andyh wrote:
NPMCLASS(HIGH) only has ANY effect if the QMgr is successfully shut down in a controlled manner.
..
..


RDQM failover is
1. controlled shutdown on current Active Node
2. startup on one of the erstwhile Secondary nodes

In that order. During our testing, if NPMCLASS was not set to HIGH, messages on the Qs were lost after failover. Of course, message persistence is still governed by the Persistence setting, but if you have a set of apps used to a certain way of functioning (not losing messages when the QM fails over) that start losing messages just because the QM was migrated to RDQM, you can imagine the consternation among your app customers. So the takeaway for us has been to set NPMCLASS to HIGH on all Qs hosted by the QM when migrating the QM to RDQM.

Forget about NPMCLASS. It only helps you if the stars align and everything is shut down nicely and cleanly. Its benefit (real or imagined) is exactly the same for your queue manager before RDQM and after RDQM.

If you want messages to survive, the application needs to set the message to be persistent when it's put. Do not look to the queue's DEFPSIST or NPMCLASS to help you; you cannot rely on these if the application insists on putting non-persistent messages outside of syncpoint.
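
As a minimal sketch of what setting persistence in the application could look like, assuming the Python pymqi bindings (a client library not mentioned in this thread), the helper below marks the message persistent in the MQMD and puts it under syncpoint. The helper name `put_persistent` and its `mq` parameter are hypothetical; the parameter exists only so the function can be exercised without a queue manager.

```python
def put_persistent(qmgr, queue, payload, mq=None):
    """Put `payload` as a persistent message under syncpoint, then commit.

    `qmgr` and `queue` are expected to behave like pymqi.QueueManager and
    pymqi.Queue. `mq` defaults to the pymqi module (an assumption of this sketch).
    """
    if mq is None:
        import pymqi as mq  # imported lazily; requires an IBM MQ client install

    md = mq.MD()
    md.Persistence = mq.CMQC.MQPER_PERSISTENT  # explicit, not queue-default

    pmo = mq.PMO()
    pmo.Options = mq.CMQC.MQPMO_SYNCPOINT  # put is part of a unit of work

    try:
        queue.put(payload, md, pmo)
        qmgr.commit()
    except Exception:
        qmgr.backout()  # undo the uncommitted put before re-raising
        raise


# Usage, assuming a reachable queue manager (all names below are placeholders):
#   import pymqi
#   qmgr = pymqi.connect("QM1", "DEV.APP.SVRCONN", "host(1414)")
#   queue = pymqi.Queue(qmgr, "APP.QUEUE")
#   put_persistent(qmgr, queue, b"payload")
#   queue.close()
#   qmgr.disconnect()
```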
_________________
Peter Potkay
Keep Calm and MQ On
bruce2359
Posted: Wed Jan 08, 2020 1:29 pm

Poobah

Joined: 05 Jan 2008
Posts: 9394
Location: US: west coast, almost. Otherwise, enroute.

regarding both NPMCLASS and setting message persistence explicitly in the MQMD.
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
HubertKleinmanns
Posted: Thu Jan 09, 2020 11:28 pm

Shaman

Joined: 24 Feb 2004
Posts: 732
Location: Germany

A possible use case for NPMCLASS is statistics messages. This would have several benefits:

- Messages transfer much faster than persistent messages (assuming lots of statistics traffic).

- You don't lose any data on a clean shutdown, for example during maintenance tasks.

On the other hand, when the QMgr crashes you may lose some messages, which would lead to a gap in the statistics data, but that could be acceptable in this case.
_________________
Regards
Hubert