PeterPotkay
Posted: Mon Feb 18, 2019 5:28 pm Post subject: MQ Clusters and internal sequence numbers
Poobah
Joined: 15 May 2001 Posts: 7722

tczielke
Posted: Tue Feb 19, 2019 9:10 am Post subject:
Guardian
Joined: 08 Jul 2010 Posts: 941 Location: Illinois, USA
Thank you for pointing out that article. It is very helpful. A few thoughts came to mind when reading it.
1) When the FR (full repository) decides to discard updates from the outdated PR (partial repository) because its sequence numbers are too low, why doesn't the FR also report this back to the PR, so that the PR knows and can report in its error logs that it is no longer functioning properly in the cluster?
2) The real-life situations documented here show the need for a REFRESH CLUSTER command that is not disruptive (or at least tries not to be). For example, a REFRESH CLUSTER that enqueues for exclusive access to refresh the cluster and blocks any subsequent MQOPEN calls until it completes. I know this is an oversimplification, but the main gist is that I would rather have my MQOPEN calls wait for a REFRESH CLUSTER to complete than return immediately with an error to the application.
3) If we had #2, could #1 then go a step further: the FR tells the PR to do a REFRESH CLUSTER because it is sending out-of-date sequence numbers, and the PR does so if it determines that the cause is out-of-date cluster information it holds?
_________________
Working with MQ since 2010.

mvic
Posted: Tue Feb 19, 2019 9:34 am Post subject:
Jedi
Joined: 09 Mar 2004 Posts: 2080
tczielke wrote:
1) When the FR decides to discard updates from the outdated PR because the sequence numbers are too low, why doesn't the FR also report this back to the PR so that it can know and report in the error logs that it is no longer functioning properly in the cluster?

I think there is an analogy with TCP packet sequence numbers: https://stackoverflow.com/questions/141128/does-tcp-ip-prevent-packet-replays
However, TCP/IP stacks do not persist their sequence numbers (I would assume!), and even if they did, they are not restored from backup, so they would not be susceptible to using stale information from a backup.

PeterPotkay
Posted: Tue Feb 19, 2019 1:51 pm Post subject:
Poobah
Joined: 15 May 2001 Posts: 7722
tczielke wrote:
1) When the FR decides to discard updates from the outdated PR because the sequence numbers are too low, why doesn't the FR also report this back to the PR so that it can know and report in the error logs that it is no longer functioning properly in the cluster?

Hmm, this makes me think that either this is an oversight (heck, record this situation in both the FR's and the PR's error logs with a unique AMQ**** code), or out-of-order internal updates happen often enough, and are usually harmless enough, that it's not worth recording them as an error.
_________________
Peter Potkay
Keep Calm and MQ On

PeterPotkay
Posted: Tue Feb 19, 2019 1:57 pm Post subject:
Poobah
Joined: 15 May 2001 Posts: 7722
I go back to this section of the article, which explains how REFRESH CLUSTER can help:
Quote:
Why does REFRESH CLUSTER help resolve this problem?
There's quite a lot of processing within REFRESH CLUSTER, but the most important part for our consideration is the update to the sequence numbers.
REFRESH CLUSTER updates all the sequence numbers on the clustered queues, topics and CLUSRCVR channels owned by the queue manager where you run the command.
The sequence numbers will be set to the current Unix Epoch time, as given by the local system clock. The Unix Epoch time is the number of seconds since 00:00:00 on 1 January 1970 UTC. So, this number increases once per second, forever.
When the queues and CLUSRCVR channels have their new sequence number, they are re-published immediately to the Full Repositories.
The new sequence number (now set to the current Unix Epoch time) will be higher than the sequence number the other queue managers have stored, so they will not discard these updates.
So, by running REFRESH CLUSTER on the restored queue manager, you will have allowed its updates to be accepted again.

So why doesn't MQ simply use the "current Unix Epoch time" method for generating sequence numbers 100% of the time, and avoid the whole issue? If the PR always publishes with the largest possible sequence number, wouldn't that ensure its latest update is always treated as such by everyone else in the cluster? Even more
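The mechanism described in the quoted article can be sketched as a toy Python model. This is a simplification under stated assumptions, not IBM MQ's implementation: the `FullRepository` class, its `receive` method and the small sequence numbers are hypothetical. The FR keeps the highest sequence number it has accepted per (queue manager, object) pair and silently discards anything not higher; `refresh_cluster_seq` mimics the article's statement that REFRESH CLUSTER resets sequence numbers to the current Unix Epoch time.

```python
from __future__ import annotations

import time


class FullRepository:
    """Toy model of an FR (hypothetical; not IBM MQ's real data structures)."""

    def __init__(self) -> None:
        # (queue manager, object name) -> highest sequence number accepted so far
        self.highest: dict[tuple[str, str], int] = {}

    def receive(self, qmgr: str, obj: str, seq: int) -> bool:
        """Accept an update only if its sequence number is higher than the stored one."""
        key = (qmgr, obj)
        if seq <= self.highest.get(key, -1):
            return False  # stale update: discarded, and the sender is not told
        self.highest[key] = seq
        return True


def refresh_cluster_seq(now: float | None = None) -> int:
    """Per the article, REFRESH CLUSTER sets sequence numbers to the epoch time in seconds."""
    return int(time.time() if now is None else now)


fr = FullRepository()
print(fr.receive("PR1", "APP.Q", 1000))  # True: first update
print(fr.receive("PR1", "APP.Q", 1005))  # True: updates made after the backup was taken
# PR1 is restored from a backup taken when its counter was at 1000 ...
print(fr.receive("PR1", "APP.Q", 1001))  # False: 1001 <= 1005, so it is discarded
# ... until REFRESH CLUSTER jumps its sequence number to the current epoch time.
print(fr.receive("PR1", "APP.Q", refresh_cluster_seq()))  # True
```

The key design point the model shows is that the comparison is silent: the FR simply drops the stale update, which is why the restored PR has no idea anything is wrong.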

tczielke
Posted: Tue Feb 19, 2019 2:03 pm Post subject:
Guardian
Joined: 08 Jul 2010 Posts: 941 Location: Illinois, USA
I don't understand all of the clustering internals, but for the clustering issue where a PR gets out of date (i.e. recovered from a backup), it seems there is an opportunity for the MQ cluster code to self-correct without the administrator having to get involved. That would at least be the ideal way to do it.

fjb_saper
Posted: Tue Feb 19, 2019 8:54 pm Post subject:
Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
PeterPotkay wrote:
I go back to this section of the article, that explains how REFRESH CLUSTER can help
...
So why doesn't MQ simply use the "current Unix Epoch time" method for generating sequence number 100% of the time, thus avoiding the whole issue. If the PR is always publishing with the largest possible sequence number, wouldn't it always ensure its latest update is treated as such by everyone else in the cluster? Even more

I speculate that it could be to avoid accidentally polluting the cluster with transient information (the short time DR will operate before Prod is used again).
_________________
MQ & Broker admin

mvic
Posted: Wed Feb 20, 2019 1:57 pm Post subject:
Jedi
Joined: 09 Mar 2004 Posts: 2080
PeterPotkay wrote:
So why doesn't MQ simply use the "current Unix Epoch time" method for generating sequence number 100% of the time

Probably this would work out fine, but you'd be restricted to one ALTER per object per second, because the epoch time only increases by one each second.
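mvic's one-change-per-second point can be shown with a small sketch. It assumes, hypothetically, a sequence number that is simply whole seconds since the Unix Epoch, and a receiver that, like the FR in the article, only accepts strictly higher numbers. `epoch_seq`, `Receiver` and `alter_results` are illustrative names, not MQ internals.

```python
from __future__ import annotations


def epoch_seq(now: float) -> int:
    """Hypothetical scheme from the thread: sequence number = whole seconds since the epoch."""
    return int(now)


class Receiver:
    """Accepts an update only if its sequence number is strictly higher than the last one."""

    def __init__(self) -> None:
        self.last = -1

    def accept(self, seq: int) -> bool:
        if seq <= self.last:
            return False
        self.last = seq
        return True


def alter_results(times: list[float]) -> list[bool]:
    """Report which ALTERs, made at the given wall-clock times, would be accepted."""
    r = Receiver()
    return [r.accept(epoch_seq(t)) for t in times]


t = 1550000000.2
print(alter_results([t, t + 0.5, t + 1.0]))  # [True, False, True]
```

The second ALTER lands in the same second, gets the same sequence number, and is discarded by the strictly-greater rule.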

tczielke
Posted: Wed Feb 20, 2019 2:20 pm Post subject:
Guardian
Joined: 08 Jul 2010 Posts: 941 Location: Illinois, USA
mvic wrote:
PeterPotkay wrote:
So why doesn't MQ simply use the "current Unix Epoch time" method for generating sequence number 100% of the time
Probably this would work out fine, but you'd be restricted to making one ALTER on any one object a maximum of once per second because the epoch time only increases by one per second.

On Unix you do have data structures that track time at a finer granularity than seconds, but it does look like IBM MQ is doing it at the second level.
Here is a sequence number for one of my 9.1 cluster queues:
QLOCAL(TCZ.CLUSQ.103018) Seq(1543865176)
If you do the math, that comes out to just under 49 years, which would be tracking it in seconds since Unix thinks time began on January 1, 1970 UTC. A little bit of revisionist history if you ask me.
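The arithmetic above can be checked with Python's standard `datetime` module, assuming (as the post does) that the value is seconds since 1970-01-01 00:00:00 UTC and using an average year of 365.25 days for the conversion:

```python
from datetime import datetime, timezone

SEQ = 1543865176  # sequence number shown for TCZ.CLUSQ.103018 in the post

# Interpret the value as seconds since 1970-01-01 00:00:00 UTC.
as_utc = datetime.fromtimestamp(SEQ, tz=timezone.utc)
# Convert to years using an average (Julian) year of 365.25 days.
years = SEQ / (365.25 * 24 * 3600)

print(as_utc.isoformat())  # 2018-12-03T19:26:16+00:00
print(round(years, 2))     # 48.92
```

So the value decodes to a plausible recent wall-clock time, which is consistent with second-level epoch timestamps.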

bruce2359
Posted: Thu Feb 21, 2019 8:19 am Post subject:
Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
tczielke wrote:
I don't understand all the clustering internals, but it seems for the clustering issue where a PR gets out of date (i.e. recovered from a backup), there is an opportunity here for the MQ cluster code to self-correct the issue without the administrator having to get involved. That would at least be the ideal way to do it.

When is 'out of date'?
Read "How long do the queue manager repositories retain information?": https://www.ibm.com/support/knowledgecenter/en/SSFKSJ_7.5.0/com.ibm.mq.con.doc/q017340_.htm
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.

tczielke
Posted: Thu Feb 21, 2019 9:01 am Post subject:
Guardian
Joined: 08 Jul 2010 Posts: 941 Location: Illinois, USA
bruce2359 wrote:
tczielke wrote:
I don't understand all the clustering internals, but it seems for the clustering issue where a PR gets out of date (i.e. recovered from a backup), there is an opportunity here for the MQ cluster code to self-correct the issue without the administrator having to get involved. That would at least be the ideal way to do it.
When is 'out of date'?
Read "How long do the queue manager repositories retain information?": https://www.ibm.com/support/knowledgecenter/en/SSFKSJ_7.5.0/com.ibm.mq.con.doc/q017340_.htm

"Out of date" is the scenario the original link was talking about, where a PR has been restored to some point in the past and is no longer functioning properly because its sequence numbers are too "out of date."

hughson
Posted: Thu Feb 21, 2019 10:31 am Post subject:
Padawan
Joined: 09 May 2013 Posts: 1959 Location: Bay of Plenty, New Zealand
PeterPotkay wrote:
tczielke wrote:
1) When the FR decides to discard updates from the outdated PR because the sequence numbers are too low, why doesn't the FR also report this back to the PR so that it can know and report in the error logs that it is no longer functioning properly in the cluster?

Hmm, makes me think that this is either an oversight (heck, record this situation in both the FR's and PR's log with a unique AMQ**** code) or out of order internal updates happen often enough and are usually harmless enough that it's not worth recording as an error.

The way a PR is kicked out of a cluster (the RESET CLUSTER command with ACTION(FORCEREMOVE)) is by bumping its sequence number on the FR so that all its updates are ignored. If the FR told the PR, this wouldn't work.
Cheers,
Morag
_________________
Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software
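Morag's explanation can be illustrated with a toy model. The class, the `force_remove` method and the size of the bump are hypothetical stand-ins, not IBM MQ internals; the point is only that if the FR reported its stored number back, the removed queue manager could publish something higher and be accepted again.

```python
from __future__ import annotations


class FullRepository:
    """Toy FR (hypothetical): keeps the highest sequence number accepted per queue manager."""

    def __init__(self) -> None:
        self.highest: dict[str, int] = {}

    def receive(self, qmgr: str, seq: int) -> bool:
        if seq <= self.highest.get(qmgr, -1):
            return False  # ignored, without telling the sender
        self.highest[qmgr] = seq
        return True

    def force_remove(self, qmgr: str, bump: int = 10**9) -> None:
        """Toy stand-in for ACTION(FORCEREMOVE): push the stored number far ahead."""
        self.highest[qmgr] = self.highest.get(qmgr, 0) + bump


fr = FullRepository()
print(fr.receive("PR1", 100))  # True
fr.force_remove("PR1")
print(fr.receive("PR1", 101))  # False: PR1's updates are now ignored
# Hypothetical feedback channel: if the FR told PR1 its stored number,
# PR1 could simply jump past it, which would undo the removal.
print(fr.receive("PR1", fr.highest["PR1"] + 1))  # True
```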

bruce2359
Posted: Thu Feb 21, 2019 11:48 am Post subject:
Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
Tim: How do 'out of date' and 'too out of date' differ?
How would an FR determine out-of-date-ness? Would one stale PR definition force all subscribed PR definitions for this qmgr to be updated?

tczielke
Posted: Thu Feb 21, 2019 12:08 pm Post subject:
Guardian
Joined: 08 Jul 2010 Posts: 941 Location: Illinois, USA
bruce2359 wrote:
Tim: How do 'out of date' and 'too out of date' differ?
How would an FR determine out of date-ness? Would one stale PR def force all subscribed PR defs for this qmgr to be updated?

That is for me to know and IBM to find out.
Seriously, I don't understand the cluster internals well enough to answer a question like that. However, at a high level, I don't see why the PRs and FRs can't have some way of keeping themselves in sync, so that the "out of date" PR scenario is detected and then corrected automatically by the internal cluster code. Kind of like how cluster channels can automatically resolve channel sequence number issues without the administrator needing to get involved.

tczielke
Posted: Thu Feb 21, 2019 1:21 pm Post subject:
Guardian
Joined: 08 Jul 2010 Posts: 941 Location: Illinois, USA
One other reminder for MQ administrators when working with cluster commands:
https://www.ibm.com/support/knowledgecenter/SSFKSJ_9.1.0/com.ibm.mq.con.doc/q021225_.htm
Quote:
To have confidence that these commands have finished, check that the expected objects exist on the remote queue managers.

For the backup/restore scenario in the original link, if the MQ administrator had been validating that his/her cluster changes on the restored PR were reflected in the cluster (as they should be, per this guidance), they would have known right away that the changes were not taking effect and that something was wrong with the restored PR from a clustering standpoint.
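One way to script that check is sketched below. It assumes IBM MQ's `runmqsc` is on the PATH and reads MQSC commands from standard input, and it runs `DISPLAY QCLUSTER` against a queue manager whose view you trust (for example a full repository, which holds information about all objects in the cluster). The helper names `display_qcluster` and `queue_visible` are hypothetical, and the parser only looks for `QUEUE(<name>)` in the output; the exact output layout is an assumption, so adjust it to what your version prints.

```python
import re
import subprocess


def display_qcluster(qmgr: str, queue: str) -> str:
    """Run DISPLAY QCLUSTER for one queue via runmqsc (needs IBM MQ installed locally)."""
    result = subprocess.run(
        ["runmqsc", qmgr],
        input=f"DISPLAY QCLUSTER({queue}) CLUSQMGR\n",
        capture_output=True,
        text=True,
        check=False,  # don't raise on a non-zero exit code; the output is inspected instead
    )
    return result.stdout


def queue_visible(runmqsc_output: str, queue: str) -> bool:
    """True if the output mentions QUEUE(<queue>) (simplified parsing, an assumption)."""
    return re.search(r"QUEUE\(" + re.escape(queue) + r"\)", runmqsc_output) is not None


# Example with hypothetical names (FR1 = a full repository, APP.Q = a queue just altered on the PR):
# out = display_qcluster("FR1", "APP.Q")
# print("visible" if queue_visible(out, "APP.Q") else "NOT visible: check the PR")
```

The echoed command line contains `QCLUSTER(APP.Q)` rather than `QUEUE(APP.Q)`, so a "not found" response is not mistaken for a match.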