OAM data not maintained across a MS Cluster
PeterPotkay
Posted: Mon Feb 23, 2004 1:19 pm    Post subject: OAM data not maintained across a MS Cluster

Poobah

Joined: 15 May 2001
Posts: 7722

MQSeries 5.3 CSD04
Windows 2000

Server1 and Server2 are in a Hardware Cluster maintained by Microsoft Cluster Service. QMA lives primarily on Server1. In the event of a failure of Server1, MSCS moves the Queue Manager resource over to Server2.

When QMA is running on its primary server, Server1 (99.9% of the time), we use the setmqaut command to set authorities on the QM and queues, and it works as expected.

However, when QMA fails over to Server2, none of the OAM settings are recognized. I thought all OAM data is kept in the SYSTEM.AUTH.DATA.QUEUE as persistent messages with unlimited Expiry. So why does QMA on Server2 not recognize the settings that were made to it on Server1? The reverse is also true.

The queue depth of SYSTEM.AUTH.DATA.QUEUE is the same on both nodes as I move the QM back and forth. If I create a new object, I see the depth go up by one in SYSTEM.AUTH.DATA.QUEUE, and that depth is correct when I move the QM to the other node. So at the very least, the new entry in the queue is being maintained.

But if I do a dspmqaut or dmpmqaut, it does not show the settings that were set when the QM was on the other node. It's almost as if the OAM is getting its info from some place other than the queue, and that "place" is not something under the control of Microsoft Cluster Services.

This is a big problem for us, since the instant we fail over the QM, we are knocked out of the water with 2035 errors all over the place.
_________________
Peter Potkay
Keep Calm and MQ On
vennela
Posted: Mon Feb 23, 2004 1:30 pm

Jedi Knight

Joined: 11 Aug 2002
Posts: 4055
Location: Hyderabad, India

Have you tried issuing REFRESH SECURITY under runmqsc after the failover, when the QMGR starts on the other box? If not, try that and see if it helps.
PeterPotkay
Posted: Mon Feb 23, 2004 1:49 pm

Poobah

Joined: 15 May 2001
Posts: 7722

I tried that, but it didn't do anything.



Not that that would be a practical solution. The design for HA was to have an environment that would fail over by itself, and within several minutes at the worst. Having to rely on someone to get paged, log on, and issue a command wouldn't fly with management or the customers.

This is a weird one. Where else could that info be held?
vennela
Posted: Mon Feb 23, 2004 2:12 pm

Jedi Knight

Joined: 11 Aug 2002
Posts: 4055
Location: Hyderabad, India

Have you seen this in the CSD05 readme?

Quote:

IC35976 - MSCS on Windows 2000 or Windows XP: In a cluster
environment one of many queue managers went down and others
receive multiple FDCs including MC053000, MC011056,
MC008047, MC008050 and MC064000. After this failure, there
were FDCs with probe MC039003 reporting error 1340
(ERROR_BAD_INHERITENCE_ACL) from the
CMQMRegSecurity::ChangeKeySecurity method.


May or may not be related to your problem though.
mqonnet
Posted: Mon Feb 23, 2004 2:13 pm

Grand Master

Joined: 18 Feb 2002
Posts: 1114
Location: Boston, Ma, Usa.

Peter, just a thought.

I am not sure exactly how MSCS works, but from your description it looks to me like you have a continuously running queue manager, so that the failure of Server1 (primary node) is more or less transparent to the end user. The failover occurs and Server2 (backup node) takes over the resources of Server1. I would think that all the MQ exes keep running, but they are allocated memory on Server2. If this is how it all works, then the only reason I can think of why your OAM changes are not visible on Server2 is that any changes to the OAM are visible only upon QM restart. Just because you moved the SYSTEM.AUTH.DATA.QUEUE queue and its messages wouldn't make MQ realize that there are new authorizations.

Hope this helps.

Cheers
Kumar
PeterPotkay
Posted: Mon Feb 23, 2004 2:22 pm

Poobah

Joined: 15 May 2001
Posts: 7722

You may be correct, vennela. I found this in my System error log:

Quote:

AMQ6119: An internal WebSphere MQ error has occurred (Error 87 merging ACLs on
'SOFTWARE\IBM\MQSeries\CurrentVersion\Configuration\Services\HIGMQILA')


I want to see what IBM says, but I think we are going to have to apply CSD05 or CSD06.
PeterPotkay
Posted: Mon Feb 23, 2004 2:23 pm

Poobah

Joined: 15 May 2001
Posts: 7722

Kumar, on Server2, QMA actually gets started up from a stopped state, so I would assume it would read in all the OAM data each time.
JasonE
Posted: Tue Feb 24, 2004 9:12 am

Grand Master

Joined: 03 Nov 2003
Posts: 1220
Location: Hursley

How is your OAM security done: do you authorize domain userids or local groups? If local groups, are you a domainlet or not (i.e. do both machines share the same security database)? If you setmqaut against local groups, you need to reissue the setmqaut on the other node as well, because the local group will have a different SID there and hence will not work. There is another workaround as well, but is this your problem?
PeterPotkay
Posted: Tue Feb 24, 2004 10:11 am

Poobah

Joined: 15 May 2001
Posts: 7722

I am running the commands against UserIds, not Groups.

Running the commands on Server2 is not an option. QMA is not active on Server2 unless we are in failover mode, and we cannot fail over the environment every time we want to run a setmqaut command on Server1 just so we can do the same on Server2.


Quote:

If local groups, are you a domainlet or not (ie do both machines share the same security database).

I don't think so. I am in class until Thursday, but when I get back, I will ask the Sys Admin this question if you think the answer makes a difference based on my reply to your suggestion.
JasonE
Posted: Tue Feb 24, 2004 3:13 pm

Grand Master

Joined: 03 Nov 2003
Posts: 1220
Location: Hursley

Assuming the userids are domain and not local, then this should be fine; it's stored in the auth queue. Do an amqoamd before and after swapping nodes: are the same number of records listed?
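
A minimal sketch of that before/after comparison, assuming the amqoamd output from each node was saved as text with one authority record per line. The sample records, the variable names, and the function name `diff_auth_dumps` are made up for illustration, and the real amqoamd output format may differ.

```python
def diff_auth_dumps(before_text, after_text):
    """Compare two authority dumps line by line.

    Returns (missing, added): records present only in the first dump,
    and records present only in the second dump.
    """
    before = {line.strip() for line in before_text.splitlines() if line.strip()}
    after = {line.strip() for line in after_text.splitlines() if line.strip()}
    return sorted(before - after), sorted(after - before)


# Hypothetical dump taken while QMA was running on Server1.
on_server1 = """
setmqaut -m QMA -n APP.QUEUE -t queue -p appuser +put +get
setmqaut -m QMA -n APP.REPLY -t queue -p appuser +get
"""

# Hypothetical dump taken after QMA was moved to Server2.
on_server2 = """
setmqaut -m QMA -n APP.QUEUE -t queue -p appuser +put +get
"""

missing, added = diff_auth_dumps(on_server1, on_server2)
print("records missing after failover:", missing)
print("records added after failover:", added)
```

If both lists come back empty, the stored records are identical on both nodes, and the difference is more likely in how each node resolves the IDs than in the auth queue itself.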
PeterPotkay
Posted: Tue Mar 02, 2004 5:07 pm

Poobah

Joined: 15 May 2001
Posts: 7722

The problem was that the Sys Admin created the IDs as Local. Not that I specifically told him to make them domain.

I had him create some Domain IDs in my LAB environment, and, problem solved. The Domain ID is the same whether Server1 or Server2 looks at it, and thus the OAM data works on either node. (Although I have noticed that the first time I fail over the QM, the initial lookup takes a little longer than usual.)


Thanks Jason.


None of the chapters on Security in the manuals mention this, not even the Security manual. It IS mentioned in Chapter 13 of the System Admin Guide, the chapter on MSCS.
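
A toy model of why local IDs break after failover while domain IDs do not, following Jason's point that a local account has a different SID on each node. Everything here is an assumption for illustration: the SID values, the `mquser` and `mquser@DOMAIN` names, and the dictionary standing in for the authority data are invented, and this is not how the OAM is actually implemented.

```python
# Each node has its own local account database; the domain database is shared.
LOCAL_SIDS = {
    "Server1": {"mquser": "S-1-5-21-1111-1001"},
    "Server2": {"mquser": "S-1-5-21-2222-1001"},
}
DOMAIN_SIDS = {"mquser@DOMAIN": "S-1-5-21-9999-5001"}  # same from every node


def resolve_sid(node, principal):
    """Return the SID that `node` would resolve for `principal`."""
    if principal in DOMAIN_SIDS:
        return DOMAIN_SIDS[principal]
    return LOCAL_SIDS[node][principal]


def grant(auth_store, node, principal, authority):
    """Record an authority keyed by the SID resolved on the granting node."""
    auth_store.setdefault(resolve_sid(node, principal), set()).add(authority)


def is_authorized(auth_store, node, principal, authority):
    """Check an authority using the SID resolved on the checking node."""
    return authority in auth_store.get(resolve_sid(node, principal), set())


# The store stands in for the authority data that moves with the queue manager.
store = {}
grant(store, "Server1", "mquser", "put")         # local ID, set while on Server1
grant(store, "Server1", "mquser@DOMAIN", "put")  # domain ID, set while on Server1

local_ok_after_failover = is_authorized(store, "Server2", "mquser", "put")
domain_ok_after_failover = is_authorized(store, "Server2", "mquser@DOMAIN", "put")
print("local ID authorized on Server2:", local_ok_after_failover)
print("domain ID authorized on Server2:", domain_ok_after_failover)
```

In this model the record for the local ID is still in the store after the move, which matches the unchanged queue depth seen earlier in the thread; it simply no longer matches the SID that Server2 resolves for the same user name.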