Author |
Message
|
thindk00 |
Posted: Tue Dec 04, 2007 8:03 am Post subject: Joining queue managers to a cluster |
|
|
Voyager
Joined: 16 May 2001 Posts: 75 Location: UK
|
Hi,
In a traditional server/requester configuration it is not possible to configure an application to send messages to the target application without taking some form of action on the target queue manager, whether that is resetting the channel sequence numbers or defining the channel in the first place.
In a clustered environment this is not the case as a queue manager can join the cluster quite easily if it knows the details of the cluster sender channel, the cluster name and then details of the queue itself. If this is known an application can join the cluster and start sending messages into the network.
In a clustering situation, apart from using SSL on the cluster sender channels, is there any other simple way of preventing anyone else from joining the cluster? The reason I ask is that we recently had an incident in which a duplicate queue manager was joined to our cluster and confused our WMQ network, causing all sorts of issues with message delivery. Our Middleware network was unable to determine which queue manager was the real one and therefore couldn't resolve where to deliver messages, as there were now two valid destinations in the clustering configuration (the original target queue manager and the duplicate queue manager resided on different machines but shared the same queue manager name).
I'd like to prevent new queue managers from being joined to our machine using a simple mechanism, other than SSL, if possible. Look forward to any suggestions/comments.
TIA. |
|
jefflowrey |
Posted: Tue Dec 04, 2007 8:18 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
There's a topic in the InfoCenter/Queue Manager Clusters manual about how to do this using a cluster exit.
Of course, nobody should be able to create a rogue qmgr in a production environment, and nobody should be able to establish a random network connection in a production environment, because of course the production network is isolated from user desktops and production servers are tightly controlled in terms of access and privileges.
So if someone did this, then you have larger organizational problems rather than a technical issue. _________________ I am *not* the model of the modern major general. |
|
Vitor |
Posted: Tue Dec 04, 2007 8:23 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
jefflowrey wrote: |
Of course, nobody should be able to create a rogue qmgr in a production environment, and nobody should be able to establish a random network connection in a production environment, because of course the production network is isolated from user desktops and production servers are tightly controlled in terms of access and privileges. |
This is of course true, but the poster didn't mention production and there are some, shall I say careless, people out there in charge of quite large test environments? Used by large numbers of users who get annoyed when it all goes a bit funny?
Now I accept unreservedly that change control should apply to all areas that are user facing and/or not developer sandboxes and such control should prevent exactly the kind of howler described in the original post.
But we live in an imperfect world.
My 2 cents - proper controls are easier, cheaper and far less scary than exits or SSL (which can cause their own problems).
Other opinions may be equally valid, etc, etc _________________ Honesty is the best policy.
Insanity is the best defence. |
|
PeterPotkay |
Posted: Tue Dec 04, 2007 8:27 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
And you have the same problem outside of clustering, contrary to your opening statement. If I know your QM's details (hostname, port #, RCVR channel, q name) I can create a SNDR channel on my QM with the same name as your RCVR channel, start it up and start sending messages. If I can get that far I can probably send a message to your command queue (unless you specifically prevented this on your RCVR channel) to define some new queues and channels. At that point it's a simple matter of connecting to your QM directly with any MQ Admin tool and having all sorts of fun with your messages and queues.  _________________ Peter Potkay
Keep Calm and MQ On |
|
thindk00 |
Posted: Tue Dec 04, 2007 8:28 am Post subject: Exact scenario |
|
|
Voyager
Joined: 16 May 2001 Posts: 75 Location: UK
|
OK, the exact scenario is that we had a production application team believe that they could have a DR model whereby they could bring up a replacement queue manager if their original one had gone down and begin to use the central Middleware without the need for intervention in the central Middleware (e.g. they didn't know we had to issue reset cluster commands etc to remove the old queue manager to allow the new one to be connected).
So, we want to avoid this happening and make sure other users in our production team do not attempt this as well as this introduces the issues explained earlier. Apart from communicating the problem with doing this, is there any sure fire way of doing this? The application team will have access to SSL certificates, chad exits, etc. |
|
jefflowrey |
Posted: Tue Dec 04, 2007 8:29 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
I agree, but.
Also, it's entirely possible to create a sender channel called "SYSTEM.DEF.RECEIVER" on a qmgr, and connect to a non-clustered qmgr. _________________ I am *not* the model of the modern major general. |
|
PeterPotkay |
Posted: Tue Dec 04, 2007 8:31 am Post subject: Re: Exact scenario |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
thindk00 wrote: |
So, we want to avoid this happening and make sure other users in our production team do not attempt this as well as this introduces the issues explained earlier. Apart from communicating the problem with doing this, is there any sure fire way of doing this? The application team will have access to SSL certificates, chad exits, etc. |
Very very restrictive network firewall rules? Your live MQ servers will only accept connections from a list of servers you provide, i.e. your other live MQ servers. Of course someone can still create a new rogue QM on the existing MQ server to get around this, but I defer to Vitor's comments at that point. _________________ Peter Potkay
Keep Calm and MQ On |
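To make the allowlist idea above concrete, here is a minimal Python sketch of the check a firewall rule would perform: accept a connection only if its source address is on a list of known live MQ servers. The addresses and the names `ALLOWED_SOURCES` and `is_allowed` are hypothetical (the addresses come from documentation ranges), and a real deployment would do this in the firewall itself rather than in application code. As noted in the post, it does nothing about a rogue QM created on a host that is already on the list.

```python
import ipaddress

# Hypothetical allowlist: the other live MQ servers (documentation-range addresses).
ALLOWED_SOURCES = [
    ipaddress.ip_network("192.0.2.10/32"),
    ipaddress.ip_network("192.0.2.11/32"),
    ipaddress.ip_network("198.51.100.0/28"),
]


def is_allowed(source_ip: str) -> bool:
    """Return True only if source_ip falls inside one of the allowlisted networks."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in ALLOWED_SOURCES)


for ip in ("192.0.2.10", "198.51.100.5", "203.0.113.7"):
    print(ip, "accept" if is_allowed(ip) else "reject")
```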
|
thindk00 |
Posted: Tue Dec 04, 2007 8:33 am Post subject: |
|
|
Voyager
Joined: 16 May 2001 Posts: 75 Location: UK
|
That doesn't confuse clustering configuration in the middleware though. The problem we had was that the Middleware couldn't deliver messages out to the real queue manager as it could find two cluster sender channels that matched the correlation id. |
|
Vitor |
Posted: Tue Dec 04, 2007 8:34 am Post subject: Re: Exact scenario |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
thindk00 wrote: |
OK, the exact scenario is that we had a production application team believe that they could have a DR model whereby they could bring up a replacement queue manager if their original one had gone down and begin to use the central Middleware without the need for intervention in the central Middleware (e.g. they didn't know we had to issue reset cluster commands etc to remove the old queue manager to allow the new one to be connected) |
Ok, well there's 2 points here. Firstly, Jeff's point now comes clanging down like an anvil - no change should be possible in a production environment without proper controls, impact analysis, risk evaluation and all the other good things. Secondly, how come an application team believes this and, worse, is then allowed to make changes to part of the software stack without the input or (it seems) knowledge of the administrators? Do they make changes to the database tables without telling the DBAs?
You do have larger organizational issues. Especially if, as you say, they'd have access to any SSL certs or other items they need. Which should also be secured away. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
jefflowrey |
Posted: Tue Dec 04, 2007 8:37 am Post subject: Re: Exact scenario |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
PeterPotkay wrote: |
thindk00 wrote: |
So, we want to avoid this happening and make sure other users in our production team do not attempt this as well as this introduces the issues explained earlier. Apart from communicating the problem with doing this, is there any sure fire way of doing this? The application team will have access to SSL certificates, chad exits, etc. |
Very very restrictive network firewall rules? Your live MQ servers will only accept connections from a list of servers you provide, i.e. your other live MQ servers. Of course someone can still create a new rogue QM on the existing MQ server to get around this, but I defer to Vitor's comments at that point. |
A DR machine might very well have the same source IP address, and presumably the same QM name, but a different QMID. It would require a cluster exit to validate QMIDs. I don't know how easy or hard that is.
The DR setup is wrong - it should either be done using real HA across geo site boundaries, or using the backup qmgr option in v6. (or both) _________________ I am *not* the model of the modern major general. |
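The point about the same QM name with a different QMID can be illustrated with a small Python sketch. This is not the cluster exit mentioned above; it is only an after-the-fact check over a list of (queue manager name, QMID) pairs, however you collect them (for example from `DISPLAY CLUSQMGR(*) QMID` output). The function name `find_duplicate_qmgr_names` and all the sample names and QMID strings are made up for illustration.

```python
from collections import defaultdict


def find_duplicate_qmgr_names(records):
    """Map each queue manager name seen with more than one QMID to its sorted QMIDs."""
    qmids_by_name = defaultdict(set)
    for name, qmid in records:
        qmids_by_name[name].add(qmid)
    return {
        name: sorted(qmids)
        for name, qmids in qmids_by_name.items()
        if len(qmids) > 1
    }


# Hypothetical inventory: QM_APP1 appears twice with two different QMIDs.
records = [
    ("QM_APP1", "QM_APP1_2007-01-15_09.30.00"),
    ("QM_APP1", "QM_APP1_2007-12-03_22.10.45"),
    ("QM_HUB", "QM_HUB_2006-06-01_08.00.00"),
]

duplicates = find_duplicate_qmgr_names(records)
for name, qmids in duplicates.items():
    print(f"{name} is known under {len(qmids)} QMIDs: {', '.join(qmids)}")
```

A name that shows up with two QMIDs is exactly the situation described in the original post: two queue managers on different machines sharing one name.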
|
Vitor |
Posted: Tue Dec 04, 2007 8:39 am Post subject: Re: Exact scenario |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
jefflowrey wrote: |
The DR setup is wrong - it should either be done using real HA across geo site boundaries, or using the backup qmgr option in v6. (or both) |
And certainly not by just banging a spare queue manager into the prod environment!
Without the knowledge of the MQ admins!
What's needed here is not channel exits or SSL, but a good old fashioned witch hunt where the guilty are found, punished and made an example of. At least beaten over the head with a copy of the change control regulations, or named and shamed with the user community.
That could be just me.  _________________ Honesty is the best policy.
Insanity is the best defence. |
|
thindk00 |
Posted: Tue Dec 04, 2007 8:51 am Post subject: |
|
|
Voyager
Joined: 16 May 2001 Posts: 75 Location: UK
|
Seems like there is no easy way of preventing this from happening if the application owners attempt to do this (they have SSL and chad exits available to them as they are already in possession of these for their production machine).
Some good stuff here and I agree with it. We do have some issues with some of our application owners requiring education on how to recover and design DR solutions. This is WIP. I will look into the V6 backup queue manager option, where can I find more details on this? |
|
Vitor |
Posted: Tue Dec 04, 2007 8:55 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
thindk00 wrote: |
Seems like there is no easy way of preventing this from happening if the application owners attempt to do this (they have SSL and chad exits available to them as they are already in possession of these for their production machine). |
But my comments apply - they "own" the box but how can an application team own system software?
thindk00 wrote: |
I will look into the V6 backup queue manager option, where can I find more details on this? |
It's in the System Admin guide, "Backing up and restoring queue managers" _________________ Honesty is the best policy.
Insanity is the best defence. |
|
PeterPotkay |
Posted: Tue Dec 04, 2007 9:14 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
I suppose channel exits on all channels is another way to prevent this.
thindk00, DR is my primary focus right now at my current shop. How far apart are your 2 data centers? Can you use a hardware clustering solution that spans the 2 datacenters that relies on synchronous updates of both data stores at both of the data centers? a.k.a stretch clusters
If not, consider this. If you can't synchronously update the data at both sites, you introduce the possibility of some message loss due to the lag time inherent to async data replication, be it at the SAN level or using the log shipping technology of backup QMs. And you also introduce the possibility of duplicate messages (an MQGET call executed at your primary site that hasn't been replicated to the DR site yet).
So if your apps can handle the possibility of one or more message being lost in a true disaster, why bother with async data replication or backup QMs? Especially when that adds the possibility of duplicate messages as well?
Create a 2nd QM (with a unique but similar name) on a 2nd DR server (with a unique but similar name) and just add it to the cluster on Day 1, but leave all the queues unclustered. When DR strikes, cluster the queues and you're back in business. For apps that need to client connect to the QM, they use a VIP called ServerAServerB. Your network team sets it up so requests to ServerAServerB get forwarded to ServerA, and when disaster strikes they send requests to ServerB. Apps don't need to change connection info. (Your MQ intra-QM channels use the actual server hostnames in their defs, they don't use the VIP)
Yes, you will lose messages on your queues on the QMs in the data center that blows up. But remember, unless you are using stretch clustering with synchronous data replication, you are going to lose messages no matter what else you do. Add in the possibility of duplicate messages with async replication and you are better off accepting that fact and coming up with a clean QM (no messages) in DR.
MQ is not a DB, it's a message transport. If an app has messages in a queue that it cannot lose no matter what, even in a data center level disaster, and your alternate datacenter is too far away for stretch clusters, the data in those MQ messages needs to be in a DB as well, and the app needs to be coded to handle MQ message loss. _________________ Peter Potkay
Keep Calm and MQ On |
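The loss and duplicate argument above can be worked through with a toy model. The Python sketch below is only an illustration under simplifying assumptions: a single queue, an ordered log of put/get operations on the primary, and a DR copy that has applied only the first few of them when the primary is lost. It is not how MQ log shipping or SAN replication is implemented, and the names `queue_after` and `failover_outcome` are invented for this example.

```python
def queue_after(ops):
    """Replay put/get operations and return the messages left on the queue, in order."""
    queue = []
    for action, msg in ops:
        if action == "put":
            queue.append(msg)
        elif action == "get":
            queue.remove(msg)
        else:
            raise ValueError(f"unknown action: {action}")
    return queue


def failover_outcome(ops, replicated_count):
    """Compare the primary with a DR copy that applied only the first replicated_count ops."""
    primary = queue_after(ops)
    replica = queue_after(ops[:replicated_count])
    consumed = {msg for action, msg in ops if action == "get"}
    # On the primary but never replicated: gone when the primary is lost.
    lost = [m for m in primary if m not in replica]
    # Still on the DR copy although an app already consumed it at the primary.
    duplicates = [m for m in replica if m in consumed]
    return lost, duplicates


# Primary history: m1 and m2 arrive, m1 is consumed, then m3 arrives.
ops = [("put", "m1"), ("put", "m2"), ("get", "m1"), ("put", "m3")]

# The DR copy lags by two operations when disaster strikes.
lost, duplicates = failover_outcome(ops, replicated_count=2)
print("lost:", lost)
print("duplicates:", duplicates)
```

With a lag of two operations the DR copy never sees m3 (lost) and still holds m1, which was already processed at the primary (duplicate). With `replicated_count=4`, i.e. no lag, both lists are empty, which corresponds to the synchronous case.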
|
fjb_saper |
Posted: Tue Dec 04, 2007 3:59 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
I'm a little bit confused...
How did you get to have 2 qmgrs with the same name up and running at the same time in the cluster???
If your DR was active/active you should have been notified in advance and foreseen the problem...
If the DR exercise was to simulate DR then it was done poorly as both sites were up at the same time...
 _________________ MQ & Broker admin |
|