Author |
Message
|
mqdev |
Posted: Sun Jan 05, 2020 10:53 am Post subject: RDQM data replication question |
|
|
Centurion
Joined: 21 Jan 2003 Posts: 136
|
Hello All
I have the following questions regarding "data replication" amongst the RDQM nodes:
Let's assume A, B, C are RDQM nodes and A has a QM active (QM status=Running).
1. After MQPUT() is successful (rc=0), is the data written to the Q currently active on A instantaneously available on the other nodes B & C? What happens if A crashes right after the above MQPUT() returns rc=0? In other words, does MQPUT() return after the data is written to the "drbd cluster" (i.e. available on all of A, B and C), OR just after the data is written to A, leaving a small window in which the data is ** not available ** on B & C?
This question arises because of 2 factors:
a) The rate at which data is enqueued to a Q (depends on the App writing the data)
b) The rate at which data is replicated in the drbd cluster (depends on the corosync component of the drbd stack)
It is highly likely that the above rates are different, so there could be a small window in which the data is out of sync, and hence data loss should the node holding the latest data crash before the sync to the other nodes completes....
2. Let's say an App enqueues a large amount of data (say 50 GB) to a node (say A) and, as soon as the last byte is written successfully (MQPUT() returns rc=0), A crashes (and stays down indefinitely). How soon is this 50 GB available on B or C (post QM failover)? Due to the above differential rates for data enqueuing and data replication in the drbd cluster, is it possible we could lose data in RDQM (permanently, should A remain down indefinitely)? Assume the Q has NPMCLASS=High (otherwise non-persistent data is lost during failover - we have already tested and confirmed this; however, with NPMCLASS=High there is no data loss).
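The concern in question 2 can be put in numbers. The sketch below is a back-of-the-envelope model of a hypothetical asynchronous replicator; the function name and the rates are illustrative assumptions, not measurements of RDQM or DRBD. It computes how much data would still be unreplicated when the writer finishes, if replication ran in the background at a constant, lower rate.

```python
def unreplicated_backlog_gb(total_gb: float,
                            write_gb_per_s: float,
                            replicate_gb_per_s: float) -> float:
    """Data not yet copied to the other nodes when the writer finishes.

    Assumption: a hypothetical asynchronous replicator that starts at the
    same time as the writer and runs at a constant rate.
    """
    if total_gb < 0 or write_gb_per_s <= 0 or replicate_gb_per_s < 0:
        raise ValueError("sizes and rates must be non-negative, write rate > 0")
    write_seconds = total_gb / write_gb_per_s
    # The replicator can never copy more than has been written.
    replicated_gb = min(total_gb, replicate_gb_per_s * write_seconds)
    return total_gb - replicated_gb


# 50 GB written at 1 GB/s while replication manages only 0.5 GB/s.
print(unreplicated_backlog_gb(50.0, 1.0, 0.5))  # prints 25.0
```

Under these assumed rates, half of the data would exist only on A at the moment MQPUT() returns. Whether RDQM behaves like this at all depends on whether replication is asynchronous, which is exactly what the question asks.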
Thanks
-mqdev |
|
|
 |
hughson |
Posted: Sun Jan 05, 2020 8:24 pm Post subject: |
|
|
 Padawan
Joined: 09 May 2013 Posts: 1959 Location: Bay of Plenty, New Zealand
|
You don't mention syncpoint in your question, and only at the end of the question do you mention persistence. Can you confirm what you are using for both of these?
Remember a non-persistent message could bypass the queue entirely and be placed directly in the getter's buffer, in which case I doubt it would be replicated.
The safety and integrity of your transactional messages is assured when using MQ with or without RDQM.
If you are interested in something other than that, could you expand on what your interest is?
Cheers,
Morag _________________ Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software |
|
|
 |
mqdev |
Posted: Mon Jan 06, 2020 7:50 am Post subject: |
|
|
Centurion
Joined: 21 Jan 2003 Posts: 136
|
Thanks for your response, Morag.
We have Apps all over the place as regarding the syncpoint and Qs all over the place as regarding the persistence. We have close to 5000 Apps - not all of them uniform as regards syncpoint and persistence.
This question came up when we were doing the due diligence for rolling out RDQM.
The question raised was: with data replication going on under the covers for RDQM, how do we know there will be no data loss under any scenario? My above questions need to be viewed in the context of "data loss" only. I have already reported the finding regarding NPMCLASS. We have already made a policy decision to set NPMCLASS=High across the board for RDQM to prevent data loss.
I would like to be able to tell my 5000 customers: if you are doing syncpoint, here is what to expect with RDQM; if you are not persisting the msg at MQPUT(), here is the implication (this could very well be the same with or without RDQM).
However, I am not clear how the differing rates of "data replication" and "data enqueuing", the critical factor specific to RDQM, play out in reality - hence this question.
Thanks
-mqdev |
|
|
 |
mqdev |
Posted: Mon Jan 06, 2020 7:59 am Post subject: |
|
|
Centurion
Joined: 21 Jan 2003 Posts: 136
|
To add a bit more:
Let's say an App is doing MQPUT() under syncpoint to the QM on node A. The commit has been done and rc=0 (the msg has been successfully written to A). The way drbd/RDQM works is that corosync, under the covers, will write this data to nodes B & C. However, if A crashes immediately after the MQPUT() call in the above scenario returns rc=0, corosync has not had a chance to sync this message data to B & C. How does RDQM guarantee this data is available on B & C?
Possibility 1:
MQPUT() on A under syncpoint will not return rc=0 until data is written to B & C as well.....if this is the case, then with RDQM there could be a small runtime performance hit, in that MQPUT() (and MQGET() as well) will take a bit longer than on a standalone QM, since time is needed to sync data to the other nodes in RDQM...
Possibility 2:
On the other hand, if corosync takes the responsibility to sync data outside the purview of the MQPUT() call, then if A crashes immediately after MQPUT() returns, there is a possibility of data loss with RDQM....
We would like to know which of the above is true with RDQM... |
|
|
 |
rekarm01 |
Posted: Mon Jan 06, 2020 7:43 pm Post subject: Re: RDQM data replication question |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
mqdev wrote: |
We have ... Qs all over the place as regarding the persistence. |
That should be: " ... messages all over the place ... ". Messages can either be persistent, or non-persistent, regardless of whatever queue attributes are set. The application that puts the message decides that.
Messages put/get outside of syncpoint are subject to data loss or duplication, with or without RDQM.
Non-persistent messages are subject to data loss, with or without RDQM, even when NPMCLASS=High.
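The point that the putting application decides persistence can be sketched as follows. The three constants mirror the MQI values of the MQMD Persistence field; `effective_persistence` is a hypothetical helper written for illustration, not an MQ API, and it ignores name resolution through alias or remote queues. The queue's DEFPSIST attribute is consulted only when the application asks for the queue default.

```python
# Values of the MQMD Persistence field as defined by the MQI.
MQPER_NOT_PERSISTENT = 0
MQPER_PERSISTENT = 1
MQPER_PERSISTENCE_AS_Q_DEF = 2


def effective_persistence(mqmd_persistence: int, queue_defpsist: bool) -> bool:
    """Return True if the message ends up persistent (illustrative helper).

    queue_defpsist models the queue's DEFPSIST attribute (YES -> True).
    """
    if mqmd_persistence == MQPER_PERSISTENCE_AS_Q_DEF:
        # Only in this case does the queue attribute matter.
        return queue_defpsist
    if mqmd_persistence == MQPER_PERSISTENT:
        return True
    if mqmd_persistence == MQPER_NOT_PERSISTENT:
        return False
    raise ValueError(f"unknown persistence value: {mqmd_persistence}")


# An explicit application setting wins over the queue default.
print(effective_persistence(MQPER_NOT_PERSISTENT, queue_defpsist=True))  # prints False
```

This is why a queue-level setting cannot protect a message that the application explicitly puts as non-persistent.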
mqdev wrote: |
Possibility 1:
MQPUT() on A under syncpoint will not return rc=0 until data is written to B & C as well ... |
The Knowledge Center states that the running instance of the queue manager "synchronously replicates its data to the other two instances". That would seem to require that any replication occurs before each MQPUT/MQGET/MQCMIT/MQBACK call returns.
However, RDQM is still vulnerable to split-brain situations, which can result in data loss or duplication. |
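The difference between the two possibilities raised earlier in the thread can be illustrated with a toy model. Everything in it (`ToyReplicatedStore`, `Node`, the `put` and `fail_over` methods) is a hypothetical sketch; it does not model DRBD's protocols, quorum, or split-brain, and it is not how RDQM is implemented. It only shows why a synchronous design leaves no window after rc=0, while an asynchronous one does.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    log: list = field(default_factory=list)


class ToyReplicatedStore:
    """Toy three-node store; synchronous=True models 'Possibility 1'."""

    def __init__(self, synchronous: bool):
        self.primary = Node("A")
        self.secondaries = [Node("B"), Node("C")]
        self.synchronous = synchronous
        self.pending = []  # writes not yet copied (asynchronous mode only)

    def put(self, msg: str) -> int:
        self.primary.log.append(msg)
        if self.synchronous:
            # Copy to every secondary before reporting success.
            for node in self.secondaries:
                node.log.append(msg)
        else:
            # Report success now; a background task would copy later.
            self.pending.append(msg)
        return 0  # rc=0

    def fail_over(self) -> list:
        """Primary A is lost for good; return what secondary B holds."""
        return list(self.secondaries[0].log)


sync_store = ToyReplicatedStore(synchronous=True)
sync_store.put("m1")
print(sync_store.fail_over())   # prints ['m1']

async_store = ToyReplicatedStore(synchronous=False)
async_store.put("m1")
print(async_store.fail_over())  # prints []
```

The cost of the synchronous variant is that every `put` waits for the secondaries, which matches the small performance hit anticipated under Possibility 1.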
|
|
 |
Andyh |
Posted: Tue Jan 07, 2020 5:33 am Post subject: |
|
|
Master
Joined: 29 Jul 2010 Posts: 239
|
NPMCLASS(HIGH) only has ANY effect if the QMgr is successfully shut down in a controlled manner.
Even when non-persistent messages spill from in-memory buffers to disk, they are not chained into the message chains for the queue.
During a controlled shutdown, MQ will attempt to write any queue buffers relating to NPMCLASS(HIGH) messages to disk and will then attempt to chain those messages together in order that they could be recovered when the QMgr restarts.
In the event of a queue manager crash, all of the NPMCLASS(HIGH) messages will typically be lost.
NPMCLASS(HIGH) would be a very odd thing to be using in combination with RDQM. RDQM suggests that you want extra protection over a more traditional IO subsystem, while NPMCLASS(HIGH) offers very little guarantee in the event of any unexpected outage. |
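The behaviour described in this post can be summarised as a small decision function. This is a simplified model of the description above, not MQ code: `Message` and `messages_after_restart` are hypothetical names, persistent messages are assumed to have been committed, and the controlled-shutdown case is the best case, since the queue manager only attempts to save NPMCLASS(HIGH) messages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    body: str
    persistent: bool


def messages_after_restart(messages, npmclass_high: bool,
                           controlled_shutdown: bool):
    """Return the messages expected back on the queue after a restart."""
    survivors = []
    for msg in messages:
        if msg.persistent:
            # Committed persistent messages are recovered either way.
            survivors.append(msg)
        elif npmclass_high and controlled_shutdown:
            # Non-persistent messages come back only in this one case.
            survivors.append(msg)
    return survivors


queue = [Message("order", persistent=True), Message("stat", persistent=False)]

clean = messages_after_restart(queue, npmclass_high=True, controlled_shutdown=True)
crash = messages_after_restart(queue, npmclass_high=True, controlled_shutdown=False)
print([m.body for m in clean])  # prints ['order', 'stat']
print([m.body for m in crash])  # prints ['order']
```

In this model a crash gives the same result whether NPMCLASS is HIGH or not, which is the reason the setting adds little on top of RDQM.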
|
|
 |
mqdev |
Posted: Wed Jan 08, 2020 7:28 am Post subject: |
|
|
Centurion
Joined: 21 Jan 2003 Posts: 136
|
Andyh wrote: |
NPMCLASS(HIGH) only has ANY effect if the QMgr is successfully shut down in a controlled manner.
..
..
|
RDQM failover is
1. controlled shutdown on current Active Node
2. startup on one of the erstwhile Secondary nodes
In that order...during our testing, if NPMCLASS is not set to HIGH, messages on the Qs were lost post failover. Of course the message persistence is still governed by the Persistence setting - but if you have a set of Apps used to a certain way of functioning (not losing messages when the QM fails over) that start losing messages just because the QM was migrated to RDQM, you can imagine the consternation among your App customers. So the takeaway for us has been to set NPMCLASS to HIGH on all Qs hosted by the QM when migrating the QM to RDQM. |
|
|
 |
mqdev |
Posted: Wed Jan 08, 2020 7:32 am Post subject: Re: RDQM data replication question |
|
|
Centurion
Joined: 21 Jan 2003 Posts: 136
|
rekarm01 wrote: |
..
..
mqdev wrote: |
Possibility 1:
MQPUT() on A under syncpoint will not return rc=0 until data is written to B & C as well ... |
The Knowledge Center states that the running instance of the queue manager "synchronously replicates its data to the other two instances". That would seem to require that any replication occurs before each MQPUT/MQGET/MQCMIT/MQBACK call returns.
However, RDQM is still vulnerable to split-brain situations, which can result in data loss or duplication. |
Perfect...thanks for that clarification...perfectly answers my question! |
|
|
 |
PeterPotkay |
Posted: Wed Jan 08, 2020 9:51 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
mqdev wrote: |
Andyh wrote: |
NPMCLASS(HIGH) only has ANY effect if the QMgr is successfully shut down in a controlled manner.
..
..
|
RDQM failover is
1. controlled shutdown on current Active Node
2. startup on one of the erstwhile Secondary nodes
In that order...during our testing, if NPMCLASS is not set to HIGH, messages on the Qs were lost post failover. Of course the message persistence is still governed by the Persistence setting - but if you have a set of Apps used to a certain way of functioning (not losing messages when the QM fails over) that start losing messages just because the QM was migrated to RDQM, you can imagine the consternation among your App customers. So the takeaway for us has been to set NPMCLASS to HIGH on all Qs hosted by the QM when migrating the QM to RDQM. |
Forget about NPMCLASS. It only helps you if the stars align and everything is shut down nicely and cleanly. Its benefit (real or imagined) is exactly the same for your queue manager before RDQM and after RDQM.
If you want messages to survive, the application needs to set the message to be persistent when it's put. Do not look to the queue's DEFPSIST or NPMCLASS to help you - you cannot rely on these if the application insists on putting non-persistent messages outside of syncpoint. _________________ Peter Potkay
Keep Calm and MQ On |
|
|
 |
bruce2359 |
Posted: Wed Jan 08, 2020 1:29 pm Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
regarding both NPMCLASS and setting message persistence explicitly in the MQMD. _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
|
 |
HubertKleinmanns |
Posted: Thu Jan 09, 2020 11:28 pm Post subject: |
|
|
 Shaman
Joined: 24 Feb 2004 Posts: 732 Location: Germany
|
A possible use case for NPMCLASS is statistics messages. This would have several benefits:
- Transfer messages much faster than persistent messages (assuming lots of statistics traffic).
- Don't lose any data on a clean shutdown, for example during maintenance tasks.
On the other hand, when the QMgr crashes you may lose some messages, which would lead to a gap in the statistics data, but that could be acceptable in this case. _________________ Regards
Hubert |
|
|
 |
|