MQ Clusters and internal sequence numbers
PeterPotkay
Posted: Mon Feb 18, 2019 5:28 pm    Post subject: MQ Clusters and internal sequence numbers

Poobah

Joined: 15 May 2001
Posts: 7717

This is a really good article if you are interested in how clusters work under the covers. It explains the internal sequence numbers the cluster uses.

https://developer.ibm.com/messaging/2019/02/15/avoiding-mq-cluster-problems-after-disaster-recovery-testing/#comment-43209
_________________
Peter Potkay
Keep Calm and MQ On
tczielke
Posted: Tue Feb 19, 2019 9:10 am

Guardian

Joined: 08 Jul 2010
Posts: 939
Location: Illinois, USA

Thank you for pointing out that article. That is very helpful. A few thoughts come to my mind when reading that article.

1) When the FR decides to discard updates from the outdated PR because the sequence numbers are too low, why doesn't the FR also report this back to the PR so that it can know and report in the error logs that it is no longer functioning properly in the cluster?

2) The real life situations documented here show the need for a REFRESH CLUSTER command that is not disruptive (or at least tries to not be disruptive). For example, a REFRESH CLUSTER that will enqueue for an exclusive opening to refresh the cluster and block any subsequent MQOPEN calls until the REFRESH CLUSTER completes. I know this is an oversimplification, but the main gist is I would rather have my MQOPEN calls delay during a REFRESH CLUSTER and wait for the command to complete, than return immediately but with an error to the application.

3) If we had #2, could the FR then handle #1 by simply telling the PR to do a REFRESH CLUSTER because it is sending out-of-date sequence numbers, with the PR acting on that once it determines the cause is out-of-date information it holds for the cluster?
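
To illustrate the behaviour asked for in #2, here is a minimal Python sketch of an open call that waits while a refresh is in progress instead of failing. It is only a toy model of the idea, not how IBM MQ works: `RefreshGate`, `begin_refresh`, `end_refresh` and `open_queue` are hypothetical names, and the gate only holds back new opens (it does not wait for opens already in flight).

```python
import threading


class RefreshGate:
    """Hypothetical gate: opens wait while a refresh runs instead of returning an error."""

    def __init__(self):
        self._open_allowed = threading.Event()
        self._open_allowed.set()  # opens are allowed until a refresh starts

    def begin_refresh(self):
        self._open_allowed.clear()  # new opens will now block

    def end_refresh(self):
        self._open_allowed.set()  # release every waiting open

    def open_queue(self, name, timeout=None):
        # Block until the refresh finishes, or give up after the optional timeout.
        if not self._open_allowed.wait(timeout):
            raise TimeoutError(f"refresh still running, could not open {name}")
        return f"handle:{name}"


gate = RefreshGate()
results = []

gate.begin_refresh()
worker = threading.Thread(target=lambda: results.append(gate.open_queue("APP.QUEUE")))
worker.start()

worker.join(timeout=0.2)  # the open cannot finish while the refresh is running
blocked_during_refresh = worker.is_alive()

gate.end_refresh()
worker.join()  # the delayed open now completes normally

print(blocked_during_refresh, results)
```

The application sees a delay rather than an error, which is the trade-off described in #2; the optional `timeout` is there so a caller is not held indefinitely.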
_________________
Working with MQ since 2010.
mvic
Posted: Tue Feb 19, 2019 9:34 am

Jedi

Joined: 09 Mar 2004
Posts: 2080

tczielke wrote:
1) When the FR decides to discard updates from the outdated PR because the sequence numbers are too low, why doesn't the FR also report this back to the PR so that it can know and report in the error logs that it is no longer functioning properly in the cluster?

I think there is an analogy with TCP packet sequence numbers: https://stackoverflow.com/questions/141128/does-tcp-ip-prevent-packet-replays
However TCP/IP stacks do not persist their sequence numbers (I would assume!) and even if they do, are not restored from backup, so would not be susceptible to using stale info from the backup.
PeterPotkay
Posted: Tue Feb 19, 2019 1:51 pm

Poobah

Joined: 15 May 2001
Posts: 7717

tczielke wrote:

1) When the FR decides to discard updates from the outdated PR because the sequence numbers are too low, why doesn't the FR also report this back to the PR so that it can know and report in the error logs that it is no longer functioning properly in the cluster?


Hmm, makes me think that this is either an oversight (heck, record this situation in both the FR's and PR's log with a unique AMQ**** code) or out-of-order internal updates happen often enough and are usually harmless enough that it's not worth recording as an error.
_________________
Peter Potkay
Keep Calm and MQ On
PeterPotkay
Posted: Tue Feb 19, 2019 1:57 pm

Poobah

Joined: 15 May 2001
Posts: 7717

I go back to this section of the article, which explains how REFRESH CLUSTER can help:

Quote:

Why does REFRESH CLUSTER help resolve this problem?
There’s quite a lot of processing within REFRESH CLUSTER, but the most important part for our consideration is the update to the sequence numbers.

REFRESH CLUSTER updates all the sequence numbers on the clustered queues, topics and CLUSRCVR channels owned by the queue manager where you run the command.

The sequence numbers will be set to the current Unix Epoch time, as given by the local system clock. The Unix Epoch time is the number of seconds since 00:00:00 on 1 January 1970 UTC. So, this number increases once per second, forever.

When the queues and CLUSRCVR channels have their new sequence number, they are re-published immediately to the Full Repositories.

The new sequence number (now set to the current Unix Epoch time) will be higher than the sequence number the other queue managers have stored, so they will not discard these updates.

So, by running REFRESH CLUSTER on the restored queue manager, you will have allowed its updates to be accepted again.


So why doesn't MQ simply use the "current Unix Epoch time" method for generating sequence numbers 100% of the time, thus avoiding the whole issue? If the PR is always publishing with the largest possible sequence number, wouldn't it always ensure its latest update is treated as such by everyone else in the cluster? Even more
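
The quoted mechanism can be made concrete with a small Python sketch. It is a toy model under stated assumptions, not MQ's implementation: the `FullRepository` class, the "accept only if strictly higher" rule, and the "+1 per change" numbering before the refresh are all assumptions made for illustration.

```python
import time


class FullRepository:
    """Toy model of a full repository cache: object name -> (sequence number, attributes)."""

    def __init__(self):
        self.cache = {}

    def publish(self, name, seq, attrs):
        """Store the update only if its sequence number is higher than the cached one."""
        cached = self.cache.get(name)
        if cached is not None and seq <= cached[0]:
            return False  # discarded; the sender is not told
        self.cache[name] = (seq, attrs)
        return True


def refreshed_sequence(now=None):
    """Model the REFRESH CLUSTER effect quoted above: sequence number = epoch seconds."""
    return int(time.time() if now is None else now)


fr = FullRepository()

# Production publishes the queue, a backup is taken, then production alters it twice.
fr.publish("APP.QUEUE", 1543865176, {"DEFPSIST": "NO"})
backup_seq = 1543865176
fr.publish("APP.QUEUE", 1543865177, {"DEFPSIST": "YES"})
fr.publish("APP.QUEUE", 1543865178, {"DEFPSIST": "NO"})

# The restored queue manager resumes from the backup's number (assumed +1 per change).
stale_accepted = fr.publish("APP.QUEUE", backup_seq + 1, {"DEFPSIST": "YES"})

# After REFRESH CLUSTER the sequence number jumps to the current epoch time.
new_seq = refreshed_sequence(now=1550600000)  # fixed clock value for a repeatable demo
refreshed_accepted = fr.publish("APP.QUEUE", new_seq, {"DEFPSIST": "YES"})

print(stale_accepted, refreshed_accepted)  # False True
```

The restored queue manager's update is dropped because its number is behind what the repository already holds, and the refreshed number is accepted because the epoch time is ahead of it.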

_________________
Peter Potkay
Keep Calm and MQ On
tczielke
Posted: Tue Feb 19, 2019 2:03 pm

Guardian

Joined: 08 Jul 2010
Posts: 939
Location: Illinois, USA

I don't understand all the clustering internals, but it seems for the clustering issue where a PR gets out of date (i.e. recovered from a backup), there is an opportunity here for the MQ cluster code to self-correct the issue without the administrator having to get involved. That would at least be the ideal way to do it.
_________________
Working with MQ since 2010.
fjb_saper
Posted: Tue Feb 19, 2019 8:54 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20696
Location: LI,NY

PeterPotkay wrote:
I go back to this section of the article, which explains how REFRESH CLUSTER can help:

...............

So why doesn't MQ simply use the "current Unix Epoch time" method for generating sequence numbers 100% of the time, thus avoiding the whole issue? If the PR is always publishing with the largest possible sequence number, wouldn't it always ensure its latest update is treated as such by everyone else in the cluster? Even more


And I speculate here that it could be to avoid accidentally polluting the cluster with information that will be transient (the DR site will operate only for a short time before Prod is used again).
_________________
MQ & Broker admin
mvic
Posted: Wed Feb 20, 2019 1:57 pm

Jedi

Joined: 09 Mar 2004
Posts: 2080

PeterPotkay wrote:
So why doesn't MQ simply use the "current Unix Epoch time" method for generating sequence numbers 100% of the time

Probably this would work out fine, but you'd be restricted to at most one ALTER per object per second, because the epoch time only increases by one per second.
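
A short Python sketch of that limitation, assuming (as an illustration only) that every change is numbered with the whole-second epoch time and that a receiver keeps an update only when its number is strictly higher than the last one it stored. The function names are hypothetical.

```python
def epoch_seq(timestamp):
    """Sequence number taken straight from the clock, truncated to whole seconds."""
    return int(timestamp)


def accepted_alters(alter_times):
    """Return the ALTER times whose updates a receiver would keep under a strict '>' rule."""
    stored = None
    accepted = []
    for t in alter_times:
        seq = epoch_seq(t)
        if stored is None or seq > stored:
            stored = seq
            accepted.append(t)
        # otherwise the update carries the same number as the previous one and is dropped
    return accepted


# Two ALTERs inside the same second, then one in the following second.
alters = [1550685420.10, 1550685420.60, 1550685421.20]
kept = accepted_alters(alters)
print(kept)  # the ALTER at ...420.60 is missing from the result
```

The second ALTER gets the same number as the first, so it cannot be told apart from a replay of the earlier update.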
tczielke
Posted: Wed Feb 20, 2019 2:20 pm

Guardian

Joined: 08 Jul 2010
Posts: 939
Location: Illinois, USA

mvic wrote:
PeterPotkay wrote:
So why doesn't MQ simply use the "current Unix Epoch time" method for generating sequence numbers 100% of the time

Probably this would work out fine, but you'd be restricted to at most one ALTER per object per second, because the epoch time only increases by one per second.


On Unix you do have data structures that track time at a finer granularity than seconds, but it does look like IBM MQ is doing it at the second level.

Here is a sequence number for one of my 9.1 cluster queues:

QLOCAL(TCZ.CLUSQ.103018) Seq(1543865176)

If you do the math, that comes out to around 48 years, which would be tracking it at seconds since Unix thinks time began on January 1, 1970 UTC. A little bit of revisionist history if you ask me.
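
The arithmetic can be checked with a few lines of Python, treating the Seq value above as seconds since the Unix epoch (which is the assumption being tested here). It works out to 48 full years and most of a 49th.

```python
from datetime import datetime, timezone

SEQ = 1543865176  # sequence number from the queue display above

# Interpret the number as seconds since 1970-01-01 00:00:00 UTC.
as_utc = datetime.fromtimestamp(SEQ, tz=timezone.utc)

# Rough age in years, using an average year of 365.25 days.
years_since_epoch = SEQ / (365.25 * 24 * 3600)

print(as_utc.isoformat())            # 2018-12-03T19:26:16+00:00
print(round(years_since_epoch, 1))   # 48.9
```

A date in early December 2018 is plausible for an object on a 9.1 queue manager, which fits the second-level epoch interpretation.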
_________________
Working with MQ since 2010.
bruce2359
Posted: Thu Feb 21, 2019 8:19 am

Poobah

Joined: 05 Jan 2008
Posts: 9394
Location: US: west coast, almost. Otherwise, enroute.

tczielke wrote:
I don't understand all the clustering internals, but it seems for the clustering issue where a PR gets out of date (i.e. recovered from a backup), there is an opportunity here for the MQ cluster code to self-correct the issue without the administrator having to get involved. That would at least be the ideal way to do it.

When is 'out of date'?

Read How long do the queue manager repositories retain information?: https://www.ibm.com/support/knowledgecenter/en/SSFKSJ_7.5.0/com.ibm.mq.con.doc/q017340_.htm
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
tczielke
Posted: Thu Feb 21, 2019 9:01 am

Guardian

Joined: 08 Jul 2010
Posts: 939
Location: Illinois, USA

bruce2359 wrote:
tczielke wrote:
I don't understand all the clustering internals, but it seems for the clustering issue where a PR gets out of date (i.e. recovered from a backup), there is an opportunity here for the MQ cluster code to self-correct the issue without the administrator having to get involved. That would at least be the ideal way to do it.

When is 'out of date'?

Read How long do the queue manager repositories retain information?: https://www.ibm.com/support/knowledgecenter/en/SSFKSJ_7.5.0/com.ibm.mq.con.doc/q017340_.htm


"Out of date" would be the scenario that the original link was talking about where a PR has been recovered from some point in the past, and it is no longer functioning properly because its sequence numbers are too "out of date."
_________________
Working with MQ since 2010.
hughson
Posted: Thu Feb 21, 2019 10:31 am

Padawan

Joined: 09 May 2013
Posts: 1914
Location: Bay of Plenty, New Zealand

PeterPotkay wrote:
tczielke wrote:

1) When the FR decides to discard updates from the outdated PR because the sequence numbers are too low, why doesn't the FR also report this back to the PR so that it can know and report in the error logs that it is no longer functioning properly in the cluster?


Hmm, makes me think that this is either an oversight (heck, record this situation in both the FR's and PR's log with a unique AMQ**** code) or out-of-order internal updates happen often enough and are usually harmless enough that it's not worth recording as an error.

The way a PR is kicked out of a cluster (RESET CLUSTER with ACTION(FORCEREMOVE)) is by bumping its sequence number on the FR so that all its updates are ignored. If the FR told the PR about the discarded updates, then this wouldn't work.

Cheers,
Morag
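
As a rough Python illustration of the forced-removal point above: if the full repository raises the number it holds for a queue manager, everything that queue manager sends afterwards looks stale and is ignored. This is a toy model only; the class, the method names and the size of the bump are hypothetical, and the thread does not say how large a bump MQ actually applies.

```python
class FullRepository:
    """Toy model: the FR remembers the highest sequence number seen per queue manager."""

    # Hypothetical size of the bump; the real value MQ uses is not stated in the thread.
    FORCE_REMOVE_BUMP = 10**9

    def __init__(self):
        self.highest_seq = {}

    def publish(self, qmgr, seq):
        """Accept an update only if it carries a higher number than the stored one."""
        if seq <= self.highest_seq.get(qmgr, -1):
            return False  # ignored without telling the sender
        self.highest_seq[qmgr] = seq
        return True

    def force_remove(self, qmgr):
        """Model the forced removal: push the stored number out of the sender's reach."""
        self.highest_seq[qmgr] = self.highest_seq.get(qmgr, 0) + self.FORCE_REMOVE_BUMP


fr = FullRepository()
before = fr.publish("QM_PR1", 1543865176)   # normal update, accepted
fr.force_remove("QM_PR1")
after = fr.publish("QM_PR1", 1543865177)    # same sender, now ignored
print(before, after)  # True False
```

In this model a "your update was discarded" notice would let the removed queue manager raise its own numbers and rejoin, which is the conflict described above.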
_________________
Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software
bruce2359
Posted: Thu Feb 21, 2019 11:48 am

Poobah

Joined: 05 Jan 2008
Posts: 9394
Location: US: west coast, almost. Otherwise, enroute.

Tim: How do ‘out of date’ and ‘too out of date’ differ?

How would an FR determine out of date-ness? Would one stale PR def force all subscribed PR defs for this qmgr to be updated?
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
tczielke
Posted: Thu Feb 21, 2019 12:08 pm

Guardian

Joined: 08 Jul 2010
Posts: 939
Location: Illinois, USA

bruce2359 wrote:
Tim: How do ‘out of date’ and ‘too out of date’ differ?

How would an FR determine out of date-ness? Would one stale PR def force all subscribed PR defs for this qmgr to be updated?


That is for me to know and IBM to find out.

Seriously, I don't understand the cluster internals well enough to answer a question like that. However, at a high level, I don't see why the PRs and FRs can't have some way of keeping themselves in sync so that the "out of date" PR scenario can be detected and then corrected automatically by the internal cluster code. Kind of like how cluster channels can automatically clear channel sequence issues without the administrator needing to get involved.
_________________
Working with MQ since 2010.
tczielke
Posted: Thu Feb 21, 2019 1:21 pm

Guardian

Joined: 08 Jul 2010
Posts: 939
Location: Illinois, USA

One other reminder for MQ administrators when working with cluster commands.

https://www.ibm.com/support/knowledgecenter/SSFKSJ_9.1.0/com.ibm.mq.con.doc/q021225_.htm

Quote:
To have confidence that these commands have finished, check that the expected objects exist on the remote queue managers.


For the backup/restore scenario in that original link, if the MQ administrator was validating that his/her cluster changes on the restored PR were being reflected in the cluster (as they should be per the guidance), they would know right away that their changes are not taking effect and something is wrong with the restored PR from a clustering standpoint.
_________________
Working with MQ since 2010.