
MQ Clusters and internal sequence numbers



PeterPotkay

Posted: Mon Feb 18, 2019 5:28 pm Post subject: MQ Clusters and internal sequence numbers

Poobah

Joined: 15 May 2001
Posts: 7723

This is a really good article if you are interested in how clusters work under the covers. It explains the internal sequence numbers the cluster uses.

https://developer.ibm.com/messaging/2019/02/15/avoiding-mq-cluster-problems-after-disaster-recovery-testing/#comment-43209
_________________
Peter Potkay
Keep Calm and MQ On

tczielke

Posted: Tue Feb 19, 2019 9:10 am Post subject:

Guardian

Joined: 08 Jul 2010
Posts: 941
Location: Illinois, USA

Thank you for pointing out that article. That is very helpful. A few thoughts come to my mind when reading that article.

1) When the FR decides to discard updates from the outdated PR because the sequence numbers are too low, why doesn't the FR also report this back to the PR, so that the PR knows about it and can report in its error logs that it is no longer functioning properly in the cluster?

2) The real life situations documented here show the need for a REFRESH CLUSTER command that is not disruptive (or at least tries to not be disruptive). For example, a REFRESH CLUSTER that will enqueue for an exclusive opening to refresh the cluster and block any subsequent MQOPEN calls until the REFRESH CLUSTER completes. I know this is an oversimplification, but the main gist is I would rather have my MQOPEN calls delay during a REFRESH CLUSTER and wait for the command to complete, than return immediately but with an error to the application.

3) If we had #2, would there then be an option, building on #1, for the FR to simply tell the PR that it is sending out-of-date sequence numbers, so that the PR can run a REFRESH CLUSTER itself once it determines that the cause is out-of-date cluster information?
_________________
Working with MQ since 2010.

mvic

Posted: Tue Feb 19, 2019 9:34 am Post subject:

Jedi

Joined: 09 Mar 2004
Posts: 2080

tczielke wrote:

1) When the FR decides to discard updates from the outdated PR because the sequence numbers are too low, why doesn't the FR also report this back to the PR, so that the PR knows about it and can report in its error logs that it is no longer functioning properly in the cluster?

I think there is an analogy with TCP packet sequence numbers: https://stackoverflow.com/questions/141128/does-tcp-ip-prevent-packet-replays
However TCP/IP stacks do not persist their sequence numbers (I would assume!) and even if they do, are not restored from backup, so would not be susceptible to using stale info from the backup.

PeterPotkay

Posted: Tue Feb 19, 2019 1:51 pm Post subject:

Poobah

Joined: 15 May 2001
Posts: 7723

tczielke wrote:

Hmm, makes me think that this is either an oversight (heck, record this situation in both the FR's and PR's log with a unique AMQ**** code), or out-of-order internal updates happen often enough and are usually harmless enough that it's not worth recording as an error.

_________________
Peter Potkay
Keep Calm and MQ On

PeterPotkay

Posted: Tue Feb 19, 2019 1:57 pm Post subject:

Poobah

Joined: 15 May 2001
Posts: 7723

I go back to this section of the article, which explains how REFRESH CLUSTER can help:

Quote:

Why does REFRESH CLUSTER help resolve this problem?
There's quite a lot of processing within REFRESH CLUSTER, but the most important part for our consideration is the update to the sequence numbers.

REFRESH CLUSTER updates all the sequence numbers on the clustered queues, topics and CLUSRCVR channels owned by the queue manager where you run the command.

The sequence numbers will be set to the current Unix Epoch time, as given by the local system clock. The Unix Epoch time is the number of seconds since 00:00:00 on 1 January 1970 UTC. So, this number increases once per second, forever.

When the queues and CLUSRCVR channels have their new sequence number, they are re-published immediately to the Full Repositories.

The new sequence number (now set to the current Unix Epoch time) will be higher than the sequence number the other queue managers have stored, so they will not discard these updates.

So, by running REFRESH CLUSTER on the restored queue manager, you will have allowed its updates to be accepted again.
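The mechanism described in the quote can be illustrated with a small model. This is only a sketch under stated assumptions: the `Repository` class, its `publish` method, the object name `APP.QUEUE`, and the strictly-greater acceptance rule are hypothetical stand-ins for behavior the article describes only at a high level. They are not MQ APIs.

```python
import time


class Repository:
    """Toy model (hypothetical) of a repository's view of cluster objects."""

    def __init__(self):
        self.stored = {}  # object name -> highest sequence number accepted

    def publish(self, name, seq):
        """Accept an update only if its sequence number is newer than the stored one."""
        if seq <= self.stored.get(name, -1):
            return False  # stale update: discarded
        self.stored[name] = seq
        return True


fr = Repository()

# Normal operation: the PR publishes a queue definition.
assert fr.publish("APP.QUEUE", 1_543_865_176)

# The PR is restored from an older backup, so its sequence number has gone backwards.
restored_seq = 1_543_000_000
assert not fr.publish("APP.QUEUE", restored_seq + 1)  # lower than stored, discarded

# REFRESH CLUSTER sets the sequence number to the current Unix epoch time,
# which is higher than the stored value, so the update is accepted again.
refreshed_seq = int(time.time())
assert fr.publish("APP.QUEUE", refreshed_seq)
```

In this model the restored queue manager's updates keep being dropped until its sequence number jumps past the stored value, which is the effect the article attributes to REFRESH CLUSTER.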

So why doesn't MQ simply use the "current Unix Epoch time" method for generating sequence numbers 100% of the time, thus avoiding the whole issue? If the PR is always publishing with the largest possible sequence number, wouldn't it always ensure its latest update is treated as such by everyone else in the cluster? Even more

_________________
Peter Potkay
Keep Calm and MQ On

tczielke

Posted: Tue Feb 19, 2019 2:03 pm Post subject:

Guardian

Joined: 08 Jul 2010
Posts: 941
Location: Illinois, USA

I don't understand all the clustering internals, but it seems that for the clustering issue where a PR gets out of date (i.e. recovered from a backup), there is an opportunity here for the MQ cluster code to self-correct the issue without the administrator having to get involved. That would at least be the ideal way to do it.
_________________
Working with MQ since 2010.

fjb_saper

Posted: Tue Feb 19, 2019 8:54 pm Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20763
Location: LI,NY

PeterPotkay wrote:

I go back to this section of the article, which explains how REFRESH CLUSTER can help:

...............

So why doesn't MQ simply use the "current Unix Epoch time" method for generating sequence numbers 100% of the time, thus avoiding the whole issue? If the PR is always publishing with the largest possible sequence number, wouldn't it always ensure its latest update is treated as such by everyone else in the cluster? Even more

And I speculate here that it could be to avoid accidentally polluting the cluster with information that will be transient (the DR site will operate for only a short time before Prod is used again).

_________________
MQ & Broker admin

mvic

Posted: Wed Feb 20, 2019 1:57 pm Post subject:

Jedi

Joined: 09 Mar 2004
Posts: 2080

PeterPotkay wrote:

So why doesn't MQ simply use the "current Unix Epoch time" method for generating sequence numbers 100% of the time

Probably this would work out fine, but you'd be restricted to at most one ALTER per object per second, because the epoch time only increases by one per second.
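The once-per-second limitation can be shown with a few lines of arithmetic. This is a minimal sketch, not a description of what MQ does internally: both function names are hypothetical, and the second one is just one conceivable workaround that combines the clock with a "previous plus one" rule.

```python
def next_seq_epoch_only(now):
    """Sequence number taken directly from the clock, in whole seconds."""
    return int(now)


def next_seq_monotonic(previous, now):
    """Use the clock, but never return a value that is not above the previous one."""
    return max(int(now), previous + 1)


# Two ALTERs on the same object, 0.4 seconds apart.
t1, t2 = 1543865176.2, 1543865176.6

a = next_seq_epoch_only(t1)
b = next_seq_epoch_only(t2)
assert a == b  # both changes get the same number, so the second cannot be told apart

c = next_seq_monotonic(0, t1)
d = next_seq_monotonic(c, t2)
assert d == c + 1  # still strictly increasing within the same second
```

The trade-off of the second variant is that a burst of changes pushes the sequence number ahead of the wall clock, so it would no longer be a pure timestamp.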

tczielke

Posted: Wed Feb 20, 2019 2:20 pm Post subject:

Guardian

Joined: 08 Jul 2010
Posts: 941
Location: Illinois, USA

mvic wrote:

PeterPotkay wrote:

So why doesn't MQ simply use the "current Unix Epoch time" method for generating sequence numbers 100% of the time

Probably this would work out fine, but you'd be restricted to at most one ALTER per object per second, because the epoch time only increases by one per second.

On Unix you do have data structures that track time at a finer granularity than seconds, but it does look like IBM MQ is tracking it at the second level.

Here is a sequence number for one of my 9.1 cluster queues:

QLOCAL(TCZ.CLUSQ.103018) Seq(1543865176)

If you do the math, that comes out to around 48 years, which would be tracking it at seconds since Unix thinks time began on January 1, 1970 UTC. A little bit of revisionist history if you ask me.
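For anyone who wants to check the math, here is a short sketch that interprets the sequence number shown above as Unix epoch seconds. The only assumption is that the value really is an epoch timestamp, which is what the article and the number's magnitude suggest.

```python
from datetime import datetime, timezone

seq = 1543865176  # sequence number from the queue shown above

# Interpret the value as seconds since 1970-01-01 00:00:00 UTC.
as_time = datetime.fromtimestamp(seq, tz=timezone.utc)
print(as_time.isoformat())  # 2018-12-03T19:26:16+00:00

# Rough age in years, using an average year of 365.25 days.
years = seq / (365.25 * 24 * 3600)
print(round(years, 1))  # 48.9
```

So the number corresponds to a moment in early December 2018, a little under 49 years after the epoch.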

_________________
Working with MQ since 2010.

bruce2359

Posted: Thu Feb 21, 2019 8:19 am Post subject:

Poobah

Joined: 05 Jan 2008
Posts: 9475
Location: US: west coast, almost. Otherwise, enroute.

tczielke wrote:

When is 'out of date'?

Read How long do the queue manager repositories retain information?: https://www.ibm.com/support/knowledgecenter/en/SSFKSJ_7.5.0/com.ibm.mq.con.doc/q017340_.htm
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.

tczielke

Posted: Thu Feb 21, 2019 9:01 am Post subject:

Guardian

Joined: 08 Jul 2010
Posts: 941
Location: Illinois, USA

bruce2359 wrote:

tczielke wrote:

When is 'out of date'?

Read How long do the queue manager repositories retain information?: https://www.ibm.com/support/knowledgecenter/en/SSFKSJ_7.5.0/com.ibm.mq.con.doc/q017340_.htm

"Out of date" would be the scenario that the original link was talking about where a PR has been recovered from some point in the past, and it is no longer functioning properly because its sequence numbers are too "out of date."
_________________
Working with MQ since 2010.

hughson

Posted: Thu Feb 21, 2019 10:31 am Post subject:

Padawan

Joined: 09 May 2013
Posts: 1964
Location: Bay of Plenty, New Zealand

PeterPotkay wrote:

tczielke wrote:

The way a PR is kicked out of a cluster (ACTION(FORCEREMOVE)) is by bumping its sequence number on the FR so that all its updates are ignored. If the FR told it, then this wouldn't work.
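A rough way to picture this, as a sketch only: the `FullRepository` class, its method names, and the size of the bump are all invented for illustration and say nothing about how MQ actually stores or chooses these values.

```python
class FullRepository:
    """Toy model (hypothetical) of how an FR tracks one queue manager's updates."""

    def __init__(self):
        self.seq = {}  # queue manager name -> highest sequence number accepted

    def accept(self, qmgr, seq):
        """Accept an update only if it is newer than what is stored."""
        if seq <= self.seq.get(qmgr, -1):
            return False
        self.seq[qmgr] = seq
        return True

    def force_remove(self, qmgr, bump=10**6):
        """Raise the stored number well above what the PR will send next."""
        self.seq[qmgr] = self.seq.get(qmgr, 0) + bump


fr = FullRepository()
assert fr.accept("PR1", 1000)

fr.force_remove("PR1")
assert not fr.accept("PR1", 1001)  # ordinary updates from PR1 are now ignored
```

In this model, if the FR reported the rejection and the PR reacted by raising its own number past the bumped value, the removal would be undone, which is the point being made above.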

Cheers,
Morag
_________________
Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software

bruce2359

Posted: Thu Feb 21, 2019 11:48 am Post subject:

Poobah

Joined: 05 Jan 2008
Posts: 9475
Location: US: west coast, almost. Otherwise, enroute.

Tim: How do 'out of date' and 'too out of date' differ?

How would an FR determine out of date-ness? Would one stale PR def force all subscribed PR defs for this qmgr to be updated?
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.

tczielke

Posted: Thu Feb 21, 2019 12:08 pm Post subject:

Guardian

Joined: 08 Jul 2010
Posts: 941
Location: Illinois, USA

bruce2359 wrote:

That is for me to know and IBM to find out.

Seriously, I don't understand the cluster internals well enough to answer a question like that. However, at a high level, I don't see why the PRs and FRs can't have some way of keeping themselves in sync, so that the "out of date" PR scenario can be detected and then corrected automatically by the internal cluster code. Kind of like how cluster channels can automatically clear channel sequence issues without the administrator needing to get involved.
_________________
Working with MQ since 2010.

tczielke

Posted: Thu Feb 21, 2019 1:21 pm Post subject:

Guardian

Joined: 08 Jul 2010
Posts: 941
Location: Illinois, USA

One other reminder for MQ administrators when working with cluster commands.

https://www.ibm.com/support/knowledgecenter/SSFKSJ_9.1.0/com.ibm.mq.con.doc/q021225_.htm

Quote:

To have confidence that these commands have finished, check that the expected objects exist on the remote queue managers.

For the backup/restore scenario in that original link, if the MQ administrator were validating that his/her cluster changes on the restored PR were being reflected in the cluster (as they should be per the guidance), they would know right away that their changes are not taking and that something is wrong with the restored PR from a clustering standpoint.
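That kind of validation is easy to script. Below is a minimal sketch of a polling helper, assuming a hypothetical `is_visible` callable that you would supply yourself (for example, something that runs a DISPLAY QCLUSTER lookup against a remote queue manager and returns True when the object shows up). The helper and its parameter names are illustrative and not part of any MQ API.

```python
import time


def wait_until_visible(is_visible, timeout=30.0, interval=1.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll is_visible() until it returns True or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if is_visible():
            return True
        if clock() >= deadline:
            return False  # the change never showed up: investigate the PR
        sleep(interval)


# Usage with a stand-in check that succeeds on the third poll.
results = iter([False, False, True])
found = wait_until_visible(lambda: next(results), timeout=5.0, interval=0.0)
assert found
```

A False result after a reasonable timeout is the early warning described above: the change was made locally but is not being reflected in the cluster.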
_________________
Working with MQ since 2010.
