Author |
Message
|
PeterPotkay |
Posted: Mon Feb 23, 2004 1:19 pm Post subject: OAM data not maintained across a MS Cluster |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
MQSeries 5.3 CSD04
Windows 2000
Server1 and Server2 are in a Hardware Cluster maintained by Microsoft Cluster Service. QMA lives primarily on Server1. In the event of a failure of Server1, MSCS moves the Queue Manager resource over to Server2.
When QMA is running on its primary server, Server1 (99.9% of the time), we use the setmqaut command to set authorities on the QM and queues, and it works as expected.
However, when QMA fails over to Server2, none of the OAM settings are recognized. I thought all OAM data is kept in the SYSTEM.AUTH.DATA.QUEUE as persistent messages with unlimited Expiry. So why does QMA on Server2 not recognize the settings that were made to it on Server1? The reverse is also true.
The queue depth of SYSTEM.AUTH.DATA.QUEUE is the same on both nodes as I move the QM back and forth. If I create a new object, I see the depth go up by one in SYSTEM.AUTH.DATA.QUEUE, and that depth is correct when I move the QM to the other node. So at the very least, the new entry in the queue is being maintained.
But if I do a dspmqaut or dmpmqaut, it does not show the settings that were set when the QM was on the other node. It's almost as if the OAM is getting its info from some place other than the queue, and that "place" is not something under the control of Microsoft Cluster Services.
This is a big problem for us, since the instant we fail over the QM, we are knocked out of the water with 2035 errors all over the place. _________________ Peter Potkay
Keep Calm and MQ On |
vennela |
Posted: Mon Feb 23, 2004 1:30 pm Post subject: |
|
|
 Jedi Knight
Joined: 11 Aug 2002 Posts: 4055 Location: Hyderabad, India
|
Have you tried issuing refresh security under runmqsc after the failover, when the QMGR starts on the other box? If not, try that and see if it helps. |
PeterPotkay |
Posted: Mon Feb 23, 2004 1:49 pm Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
I tried that, but it didn't do anything.
Not that that would be a practical solution. The design for HA was to have an environment that would fail over by itself, and within several minutes at the worst. Having to rely on someone to get paged, log on, and issue a command wouldn't fly with management or the customers.
This is a weird one. Where else could that info be held? _________________ Peter Potkay
Keep Calm and MQ On |
vennela |
Posted: Mon Feb 23, 2004 2:12 pm Post subject: |
|
|
 Jedi Knight
Joined: 11 Aug 2002 Posts: 4055 Location: Hyderabad, India
|
Have you seen this in the CSD05 readme?
Quote: |
IC35976 - MSCS on Windows 2000 or Windows XP: In a cluster
environment one of many queue managers went down and others
receive multiple FDCs including MC053000, MC011056,
MC008047, MC008050 and MC064000. After this failure, there
were FDCs with probe MC039003 reporting error 1340
(ERROR_BAD_INHERITENCE_ACL) from the
CMQMRegSecurity::ChangeKeySecurity method. |
May or may not be related to your problem though. |
mqonnet |
Posted: Mon Feb 23, 2004 2:13 pm Post subject: |
|
|
 Grand Master
Joined: 18 Feb 2002 Posts: 1114 Location: Boston, Ma, Usa.
|
Peter, just a thought.
I am not sure exactly how MSCS works, but from your description it looks to me like you have a continuously running queue manager, and the failure of Server1 (the primary node) is more or less transparent to the end user. The failover occurs and Server2 (the backup node) takes over the resources of Server1. I would think that all the MQ executables keep running but are allocated memory on Server2. If that is how it works, then the only reason I can think of why your OAM changes are not visible on Server2 is that changes to the OAM become visible only upon a QM restart. Just because you moved SYSTEM.AUTH.DATA.QUEUE and its messages doesn't mean MQ realizes that there are new authorizations.
Hope this helps.
Cheers
Kumar |
PeterPotkay |
Posted: Mon Feb 23, 2004 2:22 pm Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
You may be correct, vennela. I found this in my System error log:
Quote: |
AMQ6119: An internal WebSphere MQ error has occurred (Error 87 merging ACLs on
'SOFTWARE\IBM\MQSeries\CurrentVersion\Configuration\Services\HIGMQILA')
|
I want to see what IBM says, but I think we are going to have to apply CSD05 or CSD06. _________________ Peter Potkay
Keep Calm and MQ On |
PeterPotkay |
Posted: Mon Feb 23, 2004 2:23 pm Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
Kumar, on Server2, QMA actually gets started up from a stopped state, so I would assume it would read in all the OAM data each time. _________________ Peter Potkay
Keep Calm and MQ On |
JasonE |
Posted: Tue Feb 24, 2004 9:12 am Post subject: |
|
|
Grand Master
Joined: 03 Nov 2003 Posts: 1220 Location: Hursley
|
How is your OAM security done? Do you authorize domain userids or local groups? If local groups, are you a domainlet or not (i.e., do both machines share the same security database)? If you setmqaut against local groups, you need to reissue the setmqaut on the other node as well, because the local group will have a different SID there and hence will not work. There is another workaround as well, but is this your problem? |
PeterPotkay |
Posted: Tue Feb 24, 2004 10:11 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
I am running the commands against UserIds, not Groups.
Running the commands on Server2 is not an option. QMA is not active on Server2 unless we are in failover mode, and we cannot fail over the environment every time we want to run a setmqaut command on Server1 just so we can do the same on Server2.
Quote: |
If local groups, are you a domainlet or not (ie do both machines share the same security database).
|
I don't think so. I am in class until Thursday, but when I get back, I will ask the Sys Admin this question if you think the answer makes a difference based on my reply to your suggestion. _________________ Peter Potkay
Keep Calm and MQ On |
JasonE |
Posted: Tue Feb 24, 2004 3:13 pm Post subject: |
|
|
Grand Master
Joined: 03 Nov 2003 Posts: 1220 Location: Hursley
|
Assuming the userids are domain and not local, this should be fine: it's stored in the auth queue. Do an amqoamd before and after swapping nodes. Are the same number of records listed? |
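One rough way to do the before-and-after comparison suggested above is to save the dump output on each node to text and diff the two. The thread does not show what the amqoamd output looks like, so this sketch simply assumes each non-blank line is one record; the function name and the sample lines are hypothetical.

```python
def compare_dumps(before: str, after: str) -> dict:
    """Compare two saved text dumps, ignoring blank lines and line order.

    Assumption: each non-blank line is one authority record. The real
    output format is not shown in the thread, so adjust the parsing to it.
    """
    before_lines = {line.strip() for line in before.splitlines() if line.strip()}
    after_lines = {line.strip() for line in after.splitlines() if line.strip()}
    return {
        "before_count": len(before_lines),
        "after_count": len(after_lines),
        "only_before": sorted(before_lines - after_lines),
        "only_after": sorted(after_lines - before_lines),
    }


# Hypothetical sample text standing in for the dumps taken on each node.
dump_on_server1 = "record A\nrecord B\nrecord C\n"
dump_on_server2 = "record C\nrecord A\n"

result = compare_dumps(dump_on_server1, dump_on_server2)
print(result["before_count"], result["after_count"], result["only_before"])
```

If the counts match and both difference lists are empty, the records themselves survived the failover and the problem is more likely in how the IDs are resolved on the other node.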
PeterPotkay |
Posted: Tue Mar 02, 2004 5:07 pm Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
The problem was that the Sys Admin created the IDs as Local. Not that I specifically told him to make them domain.
I had him create some Domain IDs in my LAB environment, and the problem was solved. The Domain ID is the same whether Server1 or Server2 looks at it, and thus the OAM data works on either node. (Although I have noticed that the first time I fail over the QM, the initial lookup takes a little longer than usual.)
Thanks Jason.
None of the chapters on Security in the manuals mention this, not even the Security manual. It IS mentioned in Chapter 13 of the System Admin Guide, the chapter on MSCS. _________________ Peter Potkay
Keep Calm and MQ On |
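For anyone who finds this thread later, here is a toy Python model of why local IDs break after failover while domain IDs keep working. It is only a sketch under stated assumptions: the account names, the SID strings, the queue name and the set-based "authority store" are all made up for illustration and are not WebSphere MQ internals. The one idea taken from the thread is that an authority is tied to a SID, and a local account has a different SID on each node.

```python
# Toy model only: every name and SID string below is hypothetical.

# A domain account resolves to the same SID on every node.
DOMAIN_ACCOUNTS = {"mqdomainuser": "S-1-5-21-DOMAIN-1105"}

# A local account with the same name has a different SID on each node.
LOCAL_ACCOUNTS = {
    "Server1": {"mqlocaluser": "S-1-5-21-SERVER1-1001"},
    "Server2": {"mqlocaluser": "S-1-5-21-SERVER2-1001"},
}


def resolve_sid(node, user):
    """Return the SID a node sees for a user: local account first, then domain."""
    local = LOCAL_ACCOUNTS[node].get(user)
    return local if local is not None else DOMAIN_ACCOUNTS.get(user)


def grant(store, node, user, queue):
    """Record an authority keyed by the SID resolved on the granting node."""
    store.add((resolve_sid(node, user), queue))


def is_authorized(store, node, user, queue):
    """Check an authority using the SID resolved on the checking node."""
    return (resolve_sid(node, user), queue) in store


# The store stands in for the authority data that moves with the QM.
store = set()
grant(store, "Server1", "mqlocaluser", "APP.QUEUE")
grant(store, "Server1", "mqdomainuser", "APP.QUEUE")

# After failover the same store is consulted from Server2.
local_ok_after_failover = is_authorized(store, "Server2", "mqlocaluser", "APP.QUEUE")
domain_ok_after_failover = is_authorized(store, "Server2", "mqdomainuser", "APP.QUEUE")
print(local_ok_after_failover, domain_ok_after_failover)  # False True
```

The stored data is identical on both nodes, which matches the unchanged queue depth seen earlier; only the lookup of the local ID gives a different answer on Server2.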