Author |
Message
|
gpklos |
Posted: Tue Aug 19, 2014 5:46 am Post subject: Windows Client Trigger Monitor (MA7K) not Reconnecting. |
|
|
Centurion
Joined: 24 May 2002 Posts: 108
|
I am having an issue that I hope someone can help me with. We are running Client Trigger Monitors (SupportPac MA7K) on several servers, which connect to one of our queue managers. Every month Windows patches are applied to our MQ queue manager servers. This queue manager is set up in an MSCS cluster. So we apply patches to one machine in the cluster and failover MQ to the other node in the cluster. The failover causes an outage of only about 15-20 seconds.
Naturally, the Client Trigger Monitors all get an initial 2009 error. They then go into a retry state (the service goes into a PAUSE state, as designed) for the designated amount of time. Here is where things go wrong. The queue manager comes up as it should in 15-20 seconds, while the client trigger monitor is still in its wait state before retrying. After a minute the client trigger monitor tries to connect (I think) and gets a 2538 error. A 2538 error is “MQRC_HOST_NOT_AVAILABLE”.
Now I know the host is available and so is the address (DNS entry) for that host, so I don’t know why it is getting the 2538. After it gets the 2538 error, the client trigger monitor ends and then requires a manual restart, which is a real pain.
Now I’ve tried increasing the ShortTmr from 60 seconds to 3 minutes just in case that wasn’t long enough, but that made no difference. I even set the client trigger monitor up on my own machine to connect to a queue manager on my own machine to simulate it. Same results. It just seems to never be able to reconnect by itself.
Now for production it is a 7.0.1.* client connecting to a 7.0.1.8 queue manager. For my own machine it is a 7.5 client connecting to a 7.5 queue manager.
Has anyone experienced this or have any idea how to resolve it?
Thank you |
 |
mqjeff |
Posted: Tue Aug 19, 2014 6:04 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
It sounds like the failover isn't being noticed, or picked up.
Out of curiosity, does MA7K reconnect if you manually stop and restart the Windows service?
Also, please contact the maintainer of MA7K using the email address in the docs... There's some tracing that should be done and reviewed. |
 |
PeterPotkay |
Posted: Tue Aug 19, 2014 6:17 am Post subject: Re: Windows Client Trigger Monitor (MA7K) not Reconnecting. |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
gpklos wrote: |
This queue manager is set up in an MSCS cluster. So we apply patches to one machine in the cluster and failover MQ to the other node in the cluster. |
gpklos wrote: |
Now I know the host is available and so is the address (DNS entry) for that host, so I don’t know why it is getting the 2538. |
You should not be using the DNS name of any single host in the MSCS cluster. You should be using the virtual DNS name that floats between the nodes of the MSCS cluster, following the active node.
It sounds like the QM comes up on Node2, while MA7K is stuck looking at Node1, when it should be looking at the DNS name that covers Node1And2.
_________________
Peter Potkay
Keep Calm and MQ On |
 |
gpklos |
Posted: Tue Aug 19, 2014 6:52 am Post subject: |
|
|
Centurion
Joined: 24 May 2002 Posts: 108
|
Just some clarification. Sorry if I wasn't clear.
<<You should not be using the DNS name of any single host in the MSCS cluster. You should be using the virtual DNS name that floats between the nodes of the MSCS cluster, following the active node. >>
We are specifying a DNS entry which points to the cluster IP address, so no matter which node the queue manager is running on, the cluster IP address always points to it. I don't believe this is the issue, because all of our applications use MQ clients and specify that same DNS entry, and they have no trouble reconnecting. They connect immediately.
Let me clarify one other thing. I've also tried this on a standalone queue manager, where I just shut down the queue manager while the client trigger monitor is connected, which gives the 2009 error. Then I restart the queue manager and the client trigger monitor never reconnects. It gets the 2538 error. However, the queue manager is available and other clients have already connected.
<<Out of curiosity, does MA7K reconnect if you manually stop and restart the windows service? >>
Yes it does, with no hesitation or problems.
I hesitated contacting the author(s) since it is technically not supported according to the README. I just figured someone else would have seen this, or I hoped.
Thanks,
Gary |
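One way to back up the statement that the host really is reachable from the trigger monitor machine is a plain TCP connect to the listener port, run on that machine while the monitor is reporting 2538. This is a minimal sketch and not part of MA7K; the function name, the host name in the usage comment, and port 1414 (the conventional default MQ listener port, not stated in this thread) are all assumptions for illustration.

```python
import socket


def listener_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only proves that name resolution and the TCP handshake work.
    It says nothing about whether the MQ channel itself would start.
    """
    try:
        # create_connection resolves the name and connects in one step
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures
        return False


# Hypothetical usage (host name and port are placeholders):
#   listener_reachable("mqcluster.example.com", 1414)
```

If this returns True from the trigger monitor host while the monitor still reports 2538, the network path and DNS entry are unlikely to be the cause, which matches what the other client applications show.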
 |
mqjeff |
Posted: Tue Aug 19, 2014 8:19 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
gpklos wrote: |
I hesitated contacting the author(s) since it is technically not supported according to the README. |
I'm sure he won't mind.
And the fact that you've recreated the issue with a standalone queue manager that's not under any kind of HA says that there's an issue under a supported configuration anyway. For whatever value of 'supported' one holds to Category 2 SupportPacs. |
 |
mqjeff |
Posted: Tue Aug 19, 2014 9:09 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
Also, what is the sequence of events logged to the Windows Event Viewer?
Do you see a message about "too many retry conditions"?
Or do you just see a message about the 2538? |
 |
gpklos |
Posted: Tue Aug 19, 2014 9:17 am Post subject: |
|
|
Centurion
Joined: 24 May 2002 Posts: 108
|
The SupportPac page says to use the comment to ask for help. I see the author is Jeff Lowrey, so maybe I will send him an email, since Wayne is retired? |
 |
bruce2359 |
Posted: Tue Aug 19, 2014 9:35 am Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
And for ease of reading, please do NOT write one long paragraph. Rather, split it into several shorter ones.
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
 |
gpklos |
Posted: Tue Aug 19, 2014 9:55 am Post subject: |
|
|
Centurion
Joined: 24 May 2002 Posts: 108
|
Jeff Lowrey replied to me already. I think the version I'm using is one version back, so I am going to try the new one and let him know. I will post my findings.
Thanks,
Gary |
 |
gpklos |
Posted: Wed Aug 20, 2014 9:32 am Post subject: |
|
|
Centurion
Joined: 24 May 2002 Posts: 108
|
It turned out to be the version. The newer version is set up to handle the 2538 error, but the version I was using was not. Once I upgraded, it worked as I needed it to.
Thanks,
Gary |
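For anyone writing their own reconnect logic, the difference between the two versions can be pictured as whether 2538 (MQRC_HOST_NOT_AVAILABLE) is in the set of reason codes that trigger another retry, alongside 2009 (MQRC_CONNECTION_BROKEN). The sketch below is only an illustration of that idea and is not MA7K's actual code, which is not shown in this thread. `MQError`, `connect_with_retry`, and the `connect` callable are hypothetical names, and the 60-second default simply mirrors the ShortTmr value mentioned earlier.

```python
import time

MQRC_CONNECTION_BROKEN = 2009
MQRC_HOST_NOT_AVAILABLE = 2538

# Reason codes treated as temporary; leaving 2538 out of this set
# reproduces the "monitor ends after 2538" behaviour described above.
RETRYABLE = {MQRC_CONNECTION_BROKEN, MQRC_HOST_NOT_AVAILABLE}


class MQError(Exception):
    """Hypothetical stand-in for an MQ client error carrying a reason code."""

    def __init__(self, reason):
        super().__init__(f"MQ reason code {reason}")
        self.reason = reason


def connect_with_retry(connect, interval=60.0, max_attempts=10, sleep=time.sleep):
    """Call connect() until it succeeds, pausing between retryable failures.

    connect is any callable that returns a connection or raises MQError.
    Non-retryable reason codes, or running out of attempts, re-raise.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except MQError as err:
            if err.reason not in RETRYABLE or attempt == max_attempts:
                raise
            sleep(interval)  # wait state before the next attempt
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real waiting. A real trigger monitor would also need to log each failure and honour service stop requests during the wait.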