  MQSeries.net

DR MQ Clustering question
PeterPotkay
Posted: Sat Oct 25, 2008 2:38 pm  Post subject: DR MQ Clustering question

Poobah

Joined: 15 May 2001
Posts: 7577

DR MQ clustering question: the cluster consists of 9 QMs, all UNIX except for one z/OS. The 2 FRs are on UNIX (let's call them FR1 & FR2); the z/OS QM (PR1) & all others are PRs. The z/OS DR test involves bringing up the mainframe in another data center using data restores. The DNS name & port #s are the same for the mainframe when it comes up in the DR site, & the intention is for the DR mainframe to talk with all the distributed QMs that were not DR'ed. If it goes well, the PR1 QM comes up at the DR site & continues to work just fine in the cluster. At the end of the test the DR site mainframe is shut down & the real mainframe is brought up. The original PR1 is back in the cluster & all is well again. This has worked numerous times in the past.

Let's say that the data restore does not work at the DR site, so we manually rebuild PR1 on the DR mainframe. It now has a new QMID in the cluster. (The real mainframe is powered down during the DR test)

Q#1 Since the new DR PR1 is still using the same channel names, port # & DNS name, message routing should continue just fine, despite the duplicate QMIDs, correct? The only symptoms at this point would be messages in the error logs complaining about duplicate QMIDs?

Q#2 Knowing that the original PR1 will come back up once the DR mainframe is powered down & the real mainframe is powered back up, what will be the impact on message routing at that point, when both the new QMID & the original QMID are in the cluster? Now that the original QM & QMID are active again, will there be a message routing problem? Will the cluster not route messages to the original PR1 because the FRs see a "newer" QMID (the DR one we rebuilt that is now powered down)? It is my opinion that since the channel names, port numbers & DNS names are the same whether we have the DR mainframe powered up or the real mainframe powered up, the distributed QMs will not care. Message routing is not going to care if there is an older or newer QMID present, correct?

I understand that we can issue the RESET CLUSTER command to eject a specific QMID. Once we come back to our original configuration we will issue RESET CLUSTER to eject the newer QMID for that DR PR1. This post pertains to what is going to happen before we do that. Will message routing work during the DR test & after the DR test if there are 2 QMIDs for the same z/OS PR1 queue manager? And MQ does not care about which QMID is older or newer, correct?
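
For reference, a minimal sketch of how the RESET CLUSTER command mentioned above could be assembled. It only builds the MQSC text; it does not talk to a queue manager. The cluster name DEMOCLUS and the QMID value are made up for illustration, and the command would be issued on a full repository.

```python
def reset_cluster_command(cluster: str, qmid: str, remove_queues: bool = False) -> str:
    """Build the MQSC text that force-removes one QMID from a cluster."""
    if not qmid or len(qmid) > 48:
        raise ValueError("QMID must be 1 to 48 characters")
    queues = "YES" if remove_queues else "NO"
    return (f"RESET CLUSTER({cluster}) QMID('{qmid}') "
            f"ACTION(FORCEREMOVE) QUEUES({queues})")


# Hypothetical cluster name and QMID, for illustration only.
print(reset_cluster_command("DEMOCLUS", "PR1_2008-10-25_14.38.00", remove_queues=True))
```

Removing by QMID rather than by QMNAME matters here, because both PR1 records share the same queue manager name.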
_________________
Peter Potkay
Keep Calm and MQ On
fjb_saper
Posted: Sat Oct 25, 2008 8:05 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20130
Location: LI,NY

Peter, I'd be inclined to say correct, but nothing beats the experience of testing it out. So my advice would be to test it out first in a non-production environment.
_________________
MQ & Broker admin
mqjeff
Posted: Sat Oct 25, 2008 10:07 pm

Grand Master

Joined: 25 Jun 2008
Posts: 17447

fjb_saper wrote:
Peter, I'd be inclined to say correct, but nothing beats the experience of testing it out. So my advice would be to test it out first in a non-production environment.


I agree with everything F.J. said.

Except for the part about it being correct.

I'm inclined to say it's NOT correct.

But nothing beats the experience of testing it out...
zhanghz
Posted: Sat Oct 25, 2008 10:23 pm

Disciple

Joined: 17 Jun 2008
Posts: 186

I have never done such a thing in a production or test environment (I hope I will never have to experience that), but I did run some tests of my own on a cluster on my laptop. The test consisted of creating 2 FRs and 1 PR on localhost first, then deleting 1 FR and re-creating it. The purpose of the test was to try to understand more about the "refresh cluster" command.

My test was a little bit messy and didn't go as planned (unexpected results obtained, wrong commands issued, wrong QMID removed, etc.), so I was not able to document the results. But I did observe the following points:

Assume the QMGRs are FR1, FR2 (FR2 is the one that is created, deleted and then re-created) and PR1.
1) 2 QMIDs for FR2 appeared in the repositories of FR1 and PR1.
2) 'dis clusqmgr' on FR1 and PR1 showed that FR1 and PR1 were connected to different QMIDs of FR2: FR1 was connected to the old FR2, PR1 was connected to the new FR2.
3) 'dis qc' on FR1 didn't show the clusq defined on the new FR2. Putting messages from FR1 to the clusq defined on the new FR2 gave 2085 (unknown object name). If I noted it down correctly, 'refresh cluster' on FR2 resolved the problem.
4) After I wrongly removed the new QMID for FR2 (I should have removed the old QMID for FR2), putting messages from PR1 to the clusq defined on FR2 gave 2189 (cluster resolution error).
5) After I ran 'refresh cluster' on FR2 to re-introduce FR2 into the cluster, I found that PR1 was connected to the OLD FR2 instead of the new FR2.
6) After some other refresh cluster commands, I noticed that putting messages from PR1 to the clusq defined on FR2 gave 2085 on the first try, then 2189 on subsequent tries.

(PS: Some actions were taken before/after/in-between the points above. Do not take it as a test report.)

This is messy. But the point is that I got errors (2085 and 2189) while trying to put messages to some clusq, and I don't quite understand why. I also don't understand how a QMGR chooses which QMID to connect to when there is more than 1 QMID for a QMGR in the cluster.
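
For anyone reading the numbers above, here is a small lookup sketch for the two reason codes in this post. The symbolic names are the standard MQRC constants for those values; the helper function itself is just an illustration and not part of any MQ library.

```python
# Reason codes seen in the test above, with their standard symbolic names.
REASON_CODES = {
    2085: "MQRC_UNKNOWN_OBJECT_NAME",
    2189: "MQRC_CLUSTER_RESOLUTION_ERROR",
}


def describe_reason(code: int) -> str:
    """Return the symbolic name for a reason code, if this table knows it."""
    return REASON_CODES.get(code, f"reason code {code} (not in this table)")


for code in (2085, 2189):
    print(code, describe_reason(code))
```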
PeterPotkay
Posted: Sun Oct 26, 2008 3:33 am

Poobah

Joined: 15 May 2001
Posts: 7577

Well, two poobahs, two opinions. Hmm.

I started to test this on my laptop, and after deleting PR1 and recreating it, the FRs did show both the old and the new QMID, and clustering appeared to function normally. But I have no way of being able to bring back the original PR1 using just one machine. Time to test this in the lab where PR1 original is on machine A and PR1 new is on machine B, and I'll keep the FRs on my laptop.

I did open a PMR and ping Ian on this topic.

I'm leaning towards agreeing with FJ. I am not aware of any message routing based on QMID, or of messages always wanting to go to the newer QMID. And even if that were the case, if both QMIDs are on the same DNS name, port # and channel name, why would it not work?

Then again, I do not know the internals of the MQ cluster code.

As SA_Fraser likes to say, Stay tuned.......more info coming.
_________________
Peter Potkay
Keep Calm and MQ On
bruce2359
Posted: Sun Oct 26, 2008 8:58 am

Poobah

Joined: 05 Jan 2008
Posts: 8525
Location: US: west coast, almost. Otherwise, enroute.

Quote:
...if there are 2 QMIDs...

Did you really mean the 48 character QMID? Or QMNAME?

QMID is composed of the qmgr name and date/time stuff to make the QMID unique.

While it is possible (and messy) to create two qmgrs called BOB, the odds of creating two identical QMIDs for BOB are very, very small.

If memory serves, routing is based on qmgr name, not QMID.
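
To illustrate the point about QMID uniqueness, a toy sketch that builds an identifier from the qmgr name plus a creation timestamp. The exact layout used here (name, underscore, date, underscore, time) is an assumption for illustration and may not match what a given platform generates; only the 48-character limit comes from the post above.

```python
from datetime import datetime

QMID_LENGTH = 48  # QMID is a 48-character field


def make_qmid(qmgr_name: str, created: datetime) -> str:
    """Toy QMID: qmgr name plus creation timestamp (layout is an assumption)."""
    stamp = created.strftime("%Y-%m-%d_%H.%M.%S")
    return f"{qmgr_name}_{stamp}"[:QMID_LENGTH]


# Two queue managers both called BOB, created at different times.
original = make_qmid("BOB", datetime(2008, 10, 1, 9, 0, 0))
rebuilt = make_qmid("BOB", datetime(2008, 10, 25, 14, 38, 0))
print(original)
print(rebuilt)
print(original == rebuilt)  # same QMNAME, different QMID
```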
_________________
There are two types of people in this world:
1) Those that can extrapolate from incomplete data
fjb_saper
Posted: Sun Oct 26, 2008 1:06 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20130
Location: LI,NY

Peter,

The one thing you might have to do is reset the seqnum on the autogen cluster channels (via mqsc command) as there is no guarantee it doesn't try to hit the wrong channel...(QMID).

Hope this helps
F.J.
_________________
MQ & Broker admin
PeterPotkay
Posted: Sun Oct 26, 2008 1:53 pm

Poobah

Joined: 15 May 2001
Posts: 7577

bruce2359 wrote:
Quote:
...if there are 2 QMIDs...

Did you really mean the 48 character QMID? Or QMNAME?

I meant QMID. Yes the 2 QMIDs for the same QM are in fact unique. The original QMID for PR1, plus the QMID for the PR1 that was just recreated on the DR mainframe.

So the FRs see PR1 twice, each with unique QMIDs, but both PR1s are on the same DNS name, port # and channel names.
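
A rough sketch of how that situation could be spotted from a list of cluster queue manager records. The record layout (dicts with "qmname" and "qmid" keys) and the sample values are hypothetical; in practice the data would come from something like 'dis clusqmgr(*) qmid' output that you have parsed yourself.

```python
from collections import defaultdict


def names_with_multiple_qmids(records):
    """Return {qmname: [qmids]} for every name that appears under more than one QMID."""
    by_name = defaultdict(set)
    for rec in records:
        by_name[rec["qmname"]].add(rec["qmid"])
    return {name: sorted(ids) for name, ids in by_name.items() if len(ids) > 1}


# Hypothetical records as a full repository might hold them during the DR test.
records = [
    {"qmname": "FR1", "qmid": "FR1_ID_A"},
    {"qmname": "PR1", "qmid": "PR1_ID_ORIGINAL"},
    {"qmname": "PR1", "qmid": "PR1_ID_REBUILT"},
]
print(names_with_multiple_qmids(records))
```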
_________________
Peter Potkay
Keep Calm and MQ On
bruce2359
Posted: Sun Oct 26, 2008 3:36 pm

Poobah

Joined: 05 Jan 2008
Posts: 8525
Location: US: west coast, almost. Otherwise, enroute.

Quote:
Yes the 2 QMIDs for the same QM are in fact unique.


I suppose we're using the same language here. But, when you say that the QMIDs for the two QMs are unique, you mean that the QMIDs are different from each other, even though the QMNAMEs are identical?
_________________
There are two types of people in this world:
1) Those that can extrapolate from incomplete data
PeterPotkay
Posted: Mon Oct 27, 2008 8:08 am

Poobah

Joined: 15 May 2001
Posts: 7577

Correct.
_________________
Peter Potkay
Keep Calm and MQ On
zhanghz
Posted: Mon Oct 27, 2008 10:03 pm

Disciple

Joined: 17 Jun 2008
Posts: 186

fjb_saper wrote:
Peter,

The one thing you might have to do is reset the seqnum on the autogen cluster channels (via mqsc command) as there is no guarantee it doesn't try to hit the wrong channel...(QMID).

Hope this helps
F.J.

Hi FJ, can you elaborate on how the qmgr selects which QMID to connect to? If the channel is inactive and I do a start chl, there is no guarantee that the channel will try to connect to the correct QMID. I am really curious how MQ does that...

If only there were a START CHL() QMID()...
bruce2359
Posted: Tue Oct 28, 2008 6:10 am

Poobah

Joined: 05 Jan 2008
Posts: 8525
Location: US: west coast, almost. Otherwise, enroute.

QMID allows system admins to identify a specific qmgr if there are multiple qmgrs with the same qmgr name - to perform a RESET mqsc command, for example.
_________________
There are two types of people in this world:
1) Those that can extrapolate from incomplete data
PeterPotkay
Posted: Tue Oct 28, 2008 6:20 am

Poobah

Joined: 15 May 2001
Posts: 7577

Ian replied to this same question, which I also posted on the listserv. Since that is a public forum, I don't think he would have a problem with me quoting him here:

Quote:

Hi Peter,

A#1) Yes, but note...
- The duplicate QMIDs will probably own a queue each (i.e. duplicate queues). This may cause a 'double pull' effect, by which I mean that PR1 may pull in more messages as it will receive messages when the cluster workload balancing algorithm chooses either the old or the new QMID.

A#2) MQ does not care about which is older or newer and in theory this should work fine, but note...
- While the original PR1 is inactive, the cluster queue manager objects for the old QMID PR1 cannot be updated so they may have odd values for STATUS (i.e. a STATUS that does not match the status of the actual channel). This is not going to cause any problem that I can see, but could cause confusion to someone carrying out routing debugging. In fact, if the old QMID STATUS is not good (e.g. RETRYING), the "double pull" effect will probably be avoided.
- Presumably the DR test is short, but after 60ish days the old QMID cluster queue manager will be automatically deleted from the cluster cache.
- I think I may have seen a similar approach work in the past, but remember it's not exactly the top use-case in thought when designing a system that allows duplicate QMIDs, so I'd be wary relying on it for long periods without plenty of testing.


And I'm glad to hear that your primary DR test has worked numerous times in the past.

Cheers,
Ian

Ian Vanstone
IBM UK
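
The 'double pull' effect Ian describes can be pictured with a toy round-robin model. This is only a sketch under stated assumptions: it assumes the cluster queue is also hosted on a second queue manager (called PR2 here, a made-up name), that both PR1 records look usable to the sender, and that choices are made in a plain rotation. The real workload balancing algorithm takes more into account, including channel status.

```python
from collections import Counter
from itertools import cycle, islice


def simulate_round_robin(destinations, messages):
    """Rotate over (qmid, qmname) records and count messages per queue manager name."""
    delivered = Counter()
    for _qmid, qmname in islice(cycle(destinations), messages):
        delivered[qmname] += 1
    return delivered


# Normal case: one record per queue manager hosting the queue.
normal = [("PR1_ID_ORIGINAL", "PR1"), ("PR2_ID", "PR2")]
# DR case: PR1 is known under two QMIDs that resolve to the same host and port.
during_dr = [("PR1_ID_ORIGINAL", "PR1"), ("PR1_ID_REBUILT", "PR1"), ("PR2_ID", "PR2")]

print(simulate_round_robin(normal, 300))
print(simulate_round_robin(during_dr, 300))
```

In this model PR1's share goes from half of the messages to two thirds, which is the extra pull Ian is pointing at. If the old record's channel status is bad, as he notes, the old record would likely not be chosen and the effect would go away.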

_________________
Peter Potkay
Keep Calm and MQ On
Anant.v
Posted: Tue Mar 06, 2018 3:56 am  Post subject: Similar issue.

Apprentice

Joined: 26 Nov 2014
Posts: 40
Location: Malaysia

Sorry to open up such an old topic.

We have a similar setup here, in which the DR versions of the PROD QMGRs have the same QMIDs as PROD. The DR we invoke is somewhat similar: out of three qmgrs, two are stopped in PROD and their DR copies are started. Both of them are FRs for the cluster. The remaining QMGR, which is a PR and a PROD QMGR, connects to the two DR FRs and works fine. Then we fail the 2 FR QMGRs back to PROD. All works fine, but after a few days, let's say 10-15 days, some of the queues in that cluster go missing. This always happens after such DR scenarios.

Can someone please suggest whether we need to do a refresh cluster each time to bring them back in sync once moved back from DR to PROD?


Thanks
Ananth
mvic
Posted: Fri Mar 09, 2018 4:13 pm  Post subject: Re: Similar issue.

Jedi

Joined: 09 Mar 2004
Posts: 2078

Anant.v wrote:
Can someone please suggest if we need to do a refresh cluster each time to kick them back to sync once moved back from DR to PROD ?

Having 2 qmgrs claiming to be the same QMID at the same time leads to problems.

You should NEVER have two qmgrs running at the same time with the same QMID. Hopefully that is clear.

When you've restored from a backup, your restored qmgr has the same QMID, of course. This is not too bad, as long as the older version never runs at the same time.

However, there are sequence numbers in MQ's internal clustering flows, and in your freshly-restored qmgr these will be "old", and will need to be refreshed. If you do not, then the other members of the cluster will ignore your qmgr's new updates, because they will carry old sequence numbers.

REFRESH CLUSTER is the way (yes, the only way) to update a restored qmgr. It will make your qmgr renew its sequence numbers on all its owned objects (CLUSRCVR channels, queues, etc.) and re-issue these to the Full Repositories, which will then send the updates around all interested qmgrs.

If at the end of your DR exercise you want to switch on your production same-QMID qmgr again, then it will now have "old" sequence numbers, compared to the nice fresh ones you just generated. So you will have to say REFRESH CLUSTER again, on the one you want to remain up and running in production.
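
The sequence number behaviour described above can be sketched as a toy model. Everything here is an assumption made for illustration: the class, the integer sequence numbers and the names QM1 and APP.QUEUE are invented, and the real internal flows are not documented in this thread. The only thing taken from the explanation is the rule that an update carrying an older sequence number is ignored.

```python
class RepositoryCache:
    """Toy stand-in for what other cluster members remember about an object."""

    def __init__(self):
        self._records = {}  # (owner, object name) -> (sequence number, definition)

    def publish(self, owner, obj, seq, definition):
        """Accept the update only if its sequence number is newer than the stored one."""
        key = (owner, obj)
        current = self._records.get(key)
        if current is not None and seq <= current[0]:
            return False  # stale update, silently ignored
        self._records[key] = (seq, definition)
        return True

    def definition(self, owner, obj):
        record = self._records.get((owner, obj))
        return None if record is None else record[1]


cache = RepositoryCache()
# Production publishes its queue; the cache now holds sequence number 10.
print(cache.publish("QM1", "APP.QUEUE", 10, "prod definition"))
# A qmgr restored from an older backup republishes with an older number.
print(cache.publish("QM1", "APP.QUEUE", 7, "restored definition"))
# Modelled effect of REFRESH CLUSTER: republish with a fresh, higher number.
print(cache.publish("QM1", "APP.QUEUE", 11, "restored definition"))
print(cache.definition("QM1", "APP.QUEUE"))
```

The same model applies in reverse at the end of the exercise: the production copy is then the one holding the lower numbers, so it is the one that has to republish.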

(By the way, you don't entirely escape from this issue if you use crtmqm to get a new qmgr with a new QMID. Assuming your new qmgr uses the same CLUSRCVR channel name to identify itself in your cluster network, then similar considerations apply - your new qmgr will override the older one. Only REFRESH CLUSTER on the qmgr you want to be "correct" will ever change that situation).

In general, refer to the Knowledge Center, where you will find various warnings against using REFRESH CLUSTER. There are downsides to using the command, it does take time and processing to fully enact it. But in the DR scenario you've described, you have no choice - you have to use it.
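
As a small illustration of the command itself, here is a sketch that builds the MQSC text for REFRESH CLUSTER. It only formats a string. The cluster name DEMOCLUS is made up, and the guard reflects the documented restriction that REPOS(YES) is not valid on a queue manager that is itself a full repository, which matters in the scenario above because both DR qmgrs are FRs.

```python
def refresh_cluster_command(cluster: str, is_full_repository: bool, repos: bool = False) -> str:
    """Build the MQSC text for REFRESH CLUSTER on the qmgr that should stay in production."""
    if repos and is_full_repository:
        raise ValueError("REPOS(YES) is not valid on a full repository")
    return f"REFRESH CLUSTER({cluster}) REPOS({'YES' if repos else 'NO'})"


# Hypothetical cluster name, for illustration only.
print(refresh_cluster_command("DEMOCLUS", is_full_repository=True))
```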