MQSeries.net :: View topic - 2035s for 150 seconds???

2035s for 150 seconds???

PeterPotkay

Posted: Mon Aug 29, 2005 6:56 am Post subject: 2035s for 150 seconds???

Poobah

Joined: 15 May 2001
Posts: 7723

MQ 5.3.0.8
Windows 2000 SP4
Microsoft Stretch Cluster

QMA runs on Node1 primarily. No problems. We swing it over to Node2, and it runs there no problem. We reboot Node1, and swing QMA back to its primary server, Node1. For 150 seconds, applications get 2035s! Both my QM error logs and their application logs agree on this point. And then the errors go away on their own! What the?!?!?!?

Any ideas? No other errors in the QM logs besides the 2035s on the connect call. No FDCs. Nothing in the system level MQ error log. Nothing odd in the Windows Event Viewer. This is production.

Grasping at straws, but I wonder if we tried to move QMA back to Node1 too fast after the reboot, and maybe the server hadn't had its morning coffee yet and was not fully functional? It was about 5 minutes after the reboot that the 2035 errors just stopped on their own.

QMB runs primarily on Node2 in this cluster pair. As we moved it back and forth during these reboots, it never had a problem with 2035s on either node. Only QMA for the first 150 seconds after it returned to Node1. Weird.
_________________
Peter Potkay
Keep Calm and MQ On

bower5932

Posted: Mon Aug 29, 2005 7:39 am Post subject:

Jedi Knight

Joined: 27 Aug 2001
Posts: 3023
Location: Dallas, TX, USA

I'd be suspicious of something in the network not being quite up and having an indirect effect on WMQ. You might be able to take a trace and see something in it that sheds some light.

wschutz

Posted: Mon Aug 29, 2005 7:42 am Post subject:

Jedi Knight

Joined: 02 Jun 2005
Posts: 3316
Location: IBM (retired)

Peter... are you able to do a quick dspmqaut during this 150-second interval? And is amqzfuma running?
_________________
-wayne

kingsley

Posted: Mon Aug 29, 2005 8:31 am Post subject:

Disciple

Joined: 30 Sep 2001
Posts: 175
Location: Hursley

It takes a while before the last process comes up. I'd say your app didn't wait until the status was online in Cluster Administrator.

PeterPotkay

Posted: Mon Aug 29, 2005 11:45 am Post subject:

Poobah

Joined: 15 May 2001
Posts: 7723

Yes, the app tries to reconnect 5 times 1 second apart, 5 times 10 seconds apart, then once a minute after that. It does not wait for us to say MQ is up.
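For readers who want to picture that reconnect schedule, here is a minimal sketch of it as a Python generator. It is only an illustration, not the application's actual code: the name `reconnect_delays` is hypothetical, and it assumes each delay is the wait before the next attempt.

```python
import itertools

def reconnect_delays():
    """Yield the wait in seconds before each reconnect attempt.

    Schedule as described in the post: 5 tries 1 second apart,
    5 tries 10 seconds apart, then one try per minute indefinitely.
    """
    yield from itertools.repeat(1, 5)
    yield from itertools.repeat(10, 5)
    yield from itertools.repeat(60)

# Count how many attempts fit into the first 150 seconds of the schedule.
elapsed = 0
attempts = 0
for delay in reconnect_delays():
    elapsed += delay
    if elapsed > 150:
        break
    attempts += 1
print(attempts)  # 11
```

Under those assumptions the fast retries are used up in under a minute, so most of a 150 second outage is spent in the once-a-minute phase.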

But note that the app connected successfully within seconds of the QM saying it was started (message in AMQERR01.LOG) when that QM was moved to Node2. And the other QM had no connection issues on either node. I have to say that if the QM wrote the "I am Started" message to the log, and it is online, apps should be able to connect. Immediately, without waiting another 2.5 minutes (that is an eternity to keep getting 2035s). I looked at the app's logs, and see the 2059s while the QM is coming up, and then see the 2035s for 2.5 minutes.
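As a rough illustration of how the length of such a window could be measured from an application log, the sketch below maps the two reason codes seen here to their standard names (2059 is MQRC_Q_MGR_NOT_AVAILABLE, 2035 is MQRC_NOT_AUTHORIZED) and computes the span between the first and last occurrence of a code. The `failure_window` helper and the sample events are invented for the example; they are not taken from the real logs.

```python
from datetime import datetime

# Standard MQ reason codes mentioned in this thread.
REASONS = {
    2035: "MQRC_NOT_AUTHORIZED",
    2059: "MQRC_Q_MGR_NOT_AVAILABLE",
}

def failure_window(events, code):
    """Return the seconds between the first and last event with this
    reason code, or None if the code never occurs."""
    times = [ts for ts, rc in events if rc == code]
    if not times:
        return None
    return (max(times) - min(times)).total_seconds()

# Invented sample (timestamp, reason code) pairs, for illustration only.
sample = [
    (datetime(2000, 1, 1, 6, 0, 0), 2059),
    (datetime(2000, 1, 1, 6, 0, 10), 2059),
    (datetime(2000, 1, 1, 6, 0, 20), 2035),
    (datetime(2000, 1, 1, 6, 1, 20), 2035),
    (datetime(2000, 1, 1, 6, 2, 20), 2035),
    (datetime(2000, 1, 1, 6, 2, 50), 2035),
]

for code, name in REASONS.items():
    print(code, name, failure_window(sample, code))
```

Comparing the two windows side by side makes it easy to see where "QM not up yet" ends and "QM up but rejecting the connect" begins.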

We did not do a dspmqaut while this was happening, as we only learned about it after the fact. I wonder what that would throw.
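For the next occurrence, one way to capture that information quickly would be a small wrapper around `dspmqaut` that can be run while the errors are happening. This is a sketch under assumptions: the helper names are hypothetical, `appuser` is a placeholder principal, and it uses only the documented `-m` (queue manager), `-t` (object type) and `-p` (principal) options.

```python
import subprocess

def dspmqaut_qmgr_cmd(qmgr, principal):
    """Build the dspmqaut command that displays a principal's
    authorizations on the queue manager object itself."""
    return ["dspmqaut", "-m", qmgr, "-t", "qmgr", "-p", principal]

def check_connect_authority(qmgr, principal):
    """Run dspmqaut and return (exit code, combined output).

    Must be run on the node that currently hosts the queue manager,
    by a user allowed to issue MQ control commands.
    """
    result = subprocess.run(
        dspmqaut_qmgr_cmd(qmgr, principal),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout + result.stderr

# Hypothetical usage during the failure window:
# rc, output = check_connect_authority("QMA", "appuser")
# print(rc, output)
```

Saving the exit code and output with a timestamp would show whether the authorization lookup itself fails during those 150 seconds or returns a normal answer.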

Our Windows SysAdmin said it is entirely possible that we moved the QM over to the server so close to the reboot that some other O/S services had not yet started, and were thus unable to serve MQ's request to validate the domain ID.

The next time we reboot that server, we are going to wait 5 minutes before swinging the QM back, to see if that makes a diff.

Sure would like to know what is really going on.
_________________
Peter Potkay
Keep Calm and MQ On

hopsala

Posted: Mon Aug 29, 2005 12:21 pm Post subject:

Guardian

Joined: 24 Sep 2004
Posts: 960

I'd go with wschutz's suggestion: possibly the authorization service process (amqzfuma) hasn't come up yet (though it is silly that the QM allows MQCONNs before it is up). Or with your admin's suggestion... I think the fact that it happened on Node1 and not Node2 is strictly coincidental.

Another option worth checking: in the past, though possibly still nowadays, clustering products inhibited access to "swung" (to use Peter's term) resources by using security checks. That is, if you switched a cluster resource to comp2, then on comp1 the cluster would restrict access to it using normal security authorizations.
I don't think this happens in today's products, since it's not a very good way of handling cluster resources, but I'm not familiar with exactly how Microsoft clusters work... It is possible that the Microsoft cluster simply didn't have time to renew authorizations to the cluster resource (MQ, or possibly some MQ file).
