SAFraser |
Posted: Sun Sep 19, 2010 7:40 pm Post subject: MQ7- AMQ9271- Channel timeout |
|
|
 Shaman
Joined: 22 Oct 2003 Posts: 742 Location: Austin, Texas, USA
|
(Won't you all be glad when we finally get migrated to our new data center? Aren't you tired of living the drama with me?)
Old data center: MQ 6, Solaris 9. Really old application that does not properly disconnect from MQ.
Although I've tried several times to get the development team to fix their app, I've failed. For years, our qmgr logs have been filled with AMQ9208 and AMQ9209 errors from the application machines.
New data center: MQ7, Solaris 10. Same really old application that does not properly disconnect from MQ.
We still see the AMQ9208 and AMQ9209 errors. But now they are sometimes followed by AMQ9999 errors. This is not a new error in v7, but it is new to us.
THEN the AMQ9999 error is sometimes followed by an AMQ9271 error (channel timeout: 65-second timeout exceeded). This is a new error code in v7.
These AMQ9271 errors are much less frequent than the other disconnect errors. Actually, we thought they had stopped after we applied 7.0.1.3 (which fixes some channel problems for which we had seen some FDCs).
But we've just gotten a few more of the AMQ9271. I can't find anything anywhere that talks about these. Anyone have any experiences to share?
(Yes, I suppose we will open a PMR, too. I'm just so weary from the last one. They are going to say, "send us a trace". We've tried to reproduce the error and we can't.) |
|
fjb_saper |
Posted: Sun Sep 19, 2010 8:22 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Isn't this an informational message?
I would expect any channel timeout to be just an informational message.
Like: "your channel instance has timed out, and by the way I'm closing it. So if you get some funky error code trying to use it, it's going to be expected...."
 _________________ MQ & Broker admin |
|
mqjeff |
Posted: Mon Sep 20, 2010 1:16 am Post subject: Re: MQ7- AMQ9271- Channel timeout |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
SAFraser wrote: |
(Yes, I suppose we will open a PMR, too. I'm just so weary from the last one. They are going to say, "send us a trace". We've tried to reproduce the error and we can't.) |
Leave trace enabled until you see the error. Collect the trace at that time. THEN open the PMR. |
|
Vitor |
Posted: Mon Sep 20, 2010 4:20 am Post subject: Re: MQ7- AMQ9271- Channel timeout |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
SAFraser wrote: |
(Won't you all be glad when we finally get migrated to our new data center? Aren't you tired of living the drama with me?) |
Of course not. Speaking personally, I enjoy hearing about the suffering of others. It gives me a framework to gauge my own suffering.
SAFraser wrote: |
Although I've tried several times to get the development team to fix their app, I've failed. For years, our qmgr logs have been filled with AMQ9208 and AMQ9209 errors from the application machines. |
Is there an opportunity here for some creative error reporting? Like telling the application team these errors are fatal in WMQv7 & if they don't fix it the entire estate will fail / explode / rain trout?
Just thinking outside the box again.
_________________ Honesty is the best policy.
Insanity is the best defence. |
|
SAFraser |
Posted: Mon Sep 20, 2010 4:46 am Post subject: |
|
|
 Shaman
Joined: 22 Oct 2003 Posts: 742 Location: Austin, Texas, USA
|
fjb_saper wrote: |
Isn't this an informational message?
I would expect any channel timeout to be just an informational message.
Like: "your channel instance has timed out, and by the way I'm closing it. So if you get some funky error code trying to use it, it's going to be expected...."
 |
Oh yes, I believe it is informational. But I am distressed about the information! Why are these channels timing out? Are these just more indicators of the lousy disconnect code? Or is the shiny brand new network misbehaving? Or is the code not entirely happy with the v7 Client? And (I failed to mention this earlier) we have seen the AMQ9271 from "good" clients, too: client connections from MDBs running in WAS 6.1.
mqjeff wrote: |
Leave trace enabled until you see the error. Collect the trace at that time. THEN open the PMR. |
I just reread the manual about trace. I don't understand the distinction between enabling trace and collecting the trace. I did like the part about limiting the trace to certain amq processes. But I don't understand what you mean by "enable" vs. "collect". Our team will read about this again this morning.
Vitor wrote: |
Is there an opportunity here for some creative error reporting? Like telling the application team these errors are fatal in WMQv7 & if they don't fix it the entire estate will fail / explode / rain trout?
|
You'd think so, wouldn't you? But I believe I've exhausted "the sky is falling". That's how we got them to upgrade from MA88 to v7 this year. (Yes, you read that correctly. MA88.) The last time I opened a service request to get the ugly disconnects fixed, I argued with the developer for a month because he said he couldn't see that it was broken. I finally gave up.
Thanks! Keep those ideas coming! |
|
Vitor |
Posted: Mon Sep 20, 2010 4:52 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
SAFraser wrote: |
(Yes, you read that correctly. MA88.) |
Yikes. Of Biblical Proportions.
|
mqjeff |
Posted: Mon Sep 20, 2010 5:05 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
SAFraser wrote: |
I just reread the manual about trace. I don't understand the distinction between enabling trace and collecting the trace. I did like the part about limiting the trace to certain amq processes. But I don't understand what you mean by "enable" vs. "collect". Our team will read about this again this morning. |
I mean the differences between strmqtrc, dspmqtrc and endmqtrc.
Particularly on Unix, where the .TRC files are not in human readable formats, you run strmqtrc and recreate the issue. Then you run dspmqtrc to collect the trace in a human readable output. Then you endmqtrc to disable further tracing.
On Windows you just strmqtrc, recreate the issue, collect/copy the .TRC files of interest, and then endmqtrc. |
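On Unix, the start / stop / format cycle could be wrapped in a few shell functions, using the order gbaddeley recommends later in this thread (end the trace, then format). This is only a sketch under assumptions: QM1 is a hypothetical queue manager name, trace files are assumed to land in the default /var/mqm/trace, and limiting trace to the amqrmppa channel process with strmqtrc -p is assumed to be what you want. Check the flags against the strmqtrc documentation for your release before relying on it.

```shell
#!/bin/sh
# Sketch: start, stop and format an MQ trace on Unix.
# Assumptions: QM1 is a hypothetical queue manager name, trace goes to the
# default /var/mqm/trace, and restricting trace to amqrmppa is wanted.
# Needs a POSIX shell (on Solaris 10, ksh or /usr/xpg4/bin/sh).
QMGR=${QMGR:-QM1}
TRACE_DIR=${TRACE_DIR:-/var/mqm/trace}

start_trace() {
    # Trace one queue manager, limited to the amqrmppa process.
    strmqtrc -m "$QMGR" -p amqrmppa
}

stop_trace() {
    # End the trace first so that the .TRC files are complete.
    endmqtrc -m "$QMGR"
}

format_trace() {
    # Write a readable copy of each binary .TRC file next to it as .txt.
    for f in "$TRACE_DIR"/*.TRC; do
        [ -f "$f" ] || continue     # no .TRC files: the pattern stays unexpanded
        dspmqtrc "$f" > "${f%.TRC}.txt"
    done
}

# Usage:
#   start_trace
#   ...recreate the problem (or wait for it)...
#   stop_trace
#   format_trace
```

Writing one output file per .TRC file keeps each process's trace separate, which can be easier to search than one combined file.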
|
SAFraser |
Posted: Mon Sep 20, 2010 7:07 am Post subject: |
|
|
 Shaman
Joined: 22 Oct 2003 Posts: 742 Location: Austin, Texas, USA
|
Ahh, mqjeff, that is how I understand it, too.
We have tried to recreate the issue. We haven't been successful so far. The error occurs (apparently randomly) across any one of our 25 development queue managers.
So I can't figure out a way to trace.... |
|
mqjeff |
Posted: Mon Sep 20, 2010 7:42 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
SAFraser wrote: |
Ahh, mqjeff, that is how I understand it, too.
We have tried to recreate the issue. We haven't been successful so far. The error occurs (apparently randomly) across any one of our 25 development queue managers.
So I can't figure out a way to trace.... |
Run strmqtrc.
Wait.
When the error occurs, run dspmqtrc. |
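For an error that turns up at random, the "wait, then act when it happens" step could be automated by watching the queue manager error log and ending the trace as soon as a new AMQ9271 appears. A minimal sketch, assuming a hypothetical queue manager QM1, the default error log location, and a 60-second polling interval. It only watches AMQERR01.LOG; when that file fills and rolls over, the count drops and the sketch simply starts counting again, so an AMQ9271 written during a rollover could be missed.

```shell
#!/bin/sh
# Sketch: after strmqtrc has been started, poll the error log and end the
# trace as soon as a new AMQ9271 appears.
# Assumptions: QM1 is a hypothetical queue manager name, the error log is in
# its default location, and a 60-second poll is acceptable.
QMGR=${QMGR:-QM1}
ERRLOG=${ERRLOG:-/var/mqm/qmgrs/$QMGR/errors/AMQERR01.LOG}

count_amq9271() {
    # Print the number of lines starting with AMQ9271 (0 if the file is missing).
    # grep -c prints 0 but exits 1 when nothing matches, hence "|| true".
    if [ -f "$1" ]; then
        grep -c '^AMQ9271' "$1" || true
    else
        echo 0
    fi
}

watch_and_stop_trace() {
    baseline=$(count_amq9271 "$ERRLOG")
    while :; do
        sleep 60
        now=$(count_amq9271 "$ERRLOG")
        if [ "$now" -lt "$baseline" ]; then
            baseline=$now                 # log rolled over; start counting again
        elif [ "$now" -gt "$baseline" ]; then
            echo "AMQ9271 found in $ERRLOG; ending trace" >&2
            endmqtrc -m "$QMGR"
            return 0
        fi
    done
}

# Usage: start strmqtrc, then run watch_and_stop_trace (for example under
# nohup in the background), then format the .TRC files with dspmqtrc.
```

Ending the trace right after the error keeps the interesting part near the end of the trace files instead of buried under days of unrelated activity.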
|
bruce2359 |
Posted: Mon Sep 20, 2010 7:43 am Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
Quote: |
Aren't you tired of living the drama with me?) |
I'm amused, more than tired.
_________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
gbaddeley |
Posted: Mon Sep 20, 2010 3:39 pm Post subject: |
|
|
 Jedi Knight
Joined: 25 Mar 2003 Posts: 2538 Location: Melbourne, Australia
|
mqjeff wrote: |
Particularly on Unix, where the .TRC files are not in human readable formats, you run strmqtrc and recreate the issue. Then you run dspmqtrc to collect the trace in a human readable output. Then you endmqtrc to disable further tracing. |
The usual sequence is to end the trace and then use dspmqtrc to display the (now complete) trace files in human-readable format. Here's a funky command to put them all into one big text file:
cd /var/mqm/trace
ls *.TRC | xargs -i -t dspmqtrc {} >>mytrace.txt
_________________ Glenn |
|
mqjeff |
Posted: Mon Sep 20, 2010 4:29 pm Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
gbaddeley wrote: |
The usual sequence is to end the trace, and then use dspmqtrc to display the (now complete) trace files |
Yes, okay.
I spend most of my time with mqsichangetrace, mqsireadlog and mqsiformatlog. |
|
SAFraser |
Posted: Tue Sep 21, 2010 9:46 am Post subject: |
|
|
 Shaman
Joined: 22 Oct 2003 Posts: 742 Location: Austin, Texas, USA
|
We understand about trace.
These errors are random and intermittent. Running a long term trace (over several days) even with the trace limited to specific processes, and watching for a svrconn failure, doesn't seem very practical to us.
We have been unable to reproduce the error.
So I'm thinking that this new message code, AMQ9271, has not been a big problem for anyone else yet? |
|
mqjeff |
Posted: Tue Sep 21, 2010 10:43 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
SAFraser wrote: |
Running a long term trace (over several days) even with the trace limited to specific processes, and watching for a svrconn failure, doesn't seem very practical to us. |
I didn't comment on practicality. Yes, it has a performance impact. But if you want to get a solid view...
All of the AMQ messages you've mentioned are "remote" AMQ messages. Are you able to correlate messages in the MQ logs on the client end with messages on the server end?
Are you able to correlate timestamps of the AMQ9271 messages with events in the network stack?
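To line AMQ9271 entries up against network or firewall logs, it helps to pull out just the timestamp line that precedes each message in the error log. A minimal sketch, assuming each error-log entry begins with a line starting with a date such as 09/19/10 (the exact date format depends on locale and release). It uses plain awk rather than grep context options, so it does not depend on GNU grep. The log path in the usage comment is an assumption.

```shell
#!/bin/sh
# Sketch: print "<timestamp line> | <message line>" for every AMQ9271 entry.
# Assumes each error-log entry starts with a line beginning with a date
# such as 09/19/10; the date format depends on locale.
amq9271_times() {
    awk '
        /^[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]/ { stamp = $0 }
        /^AMQ9271/ { print stamp " | " $0 }
    ' "$@"
}

# Usage (QM1 is a hypothetical queue manager name; AMQERR01.LOG is the newest):
#   amq9271_times /var/mqm/qmgrs/QM1/errors/AMQERR0*.LOG
```

The Process(...) and Program(...) details on the timestamp line also show which process reported the timeout, which can be matched against the process IDs in any trace files.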
Have you done anything with TCP KeepAlive, etc.? Have you tweaked KAINT and HBINT on the CLNTCONN/SVRCONN in use? |
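Checking and changing HBINT and KAINT on the server-connection channel is done in MQSC. The sketch below only generates the commands, so they can be reviewed before being piped into runmqsc; the channel name APP.SVRCONN, the queue manager QM1 and the values are hypothetical. HBINT is in seconds. KAINT only matters when TCP keepalive is enabled (on distributed queue managers, KeepAlive=Yes in the TCP stanza of qm.ini), and the MQSC reference for KAINT says that on platforms other than z/OS the value is stored and forwarded but not acted on, so on Solaris the operating system's keepalive settings are what count. Verify that for your release.

```shell
#!/bin/sh
# Sketch: emit MQSC to show, change, and show again the heartbeat/keepalive
# settings of a server-connection channel. Names and values are hypothetical.
svrconn_hb_mqsc() {
    chl=$1    # channel name, e.g. APP.SVRCONN (hypothetical)
    hb=$2     # HBINT in seconds
    ka=$3     # KAINT: seconds or AUTO
    printf 'DISPLAY CHANNEL(%s) HBINT KAINT\n' "$chl"
    printf 'ALTER CHANNEL(%s) CHLTYPE(SVRCONN) HBINT(%s) KAINT(%s)\n' "$chl" "$hb" "$ka"
    printf 'DISPLAY CHANNEL(%s) HBINT KAINT\n' "$chl"
}

# Usage: review the output first, then feed it to the queue manager.
#   svrconn_hb_mqsc APP.SVRCONN 30 AUTO
#   svrconn_hb_mqsc APP.SVRCONN 30 AUTO | runmqsc QM1
```

Channel attribute changes generally take effect the next time a channel instance starts, so clients that are already connected keep the old values until they reconnect.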
|
gbaddeley |
Posted: Tue Sep 21, 2010 3:02 pm Post subject: Re: MQ7- AMQ9271- Channel timeout |
|
|
 Jedi Knight
Joined: 25 Mar 2003 Posts: 2538 Location: Melbourne, Australia
|
SAFraser wrote: |
...We still see the AMQ9208 and AMQ9209 errors. But now they are sometimes followed by AMQ9999 errors. This is not a new error in v7, but it is new to us.
THEN the AMQ9999 error is sometimes followed by an AMQ9271 error (channel timeout: 65-second timeout exceeded). This is a new error code in v7.
These AMQ9271 errors are much less frequent than the other disconnect errors. Actually, we thought they had stopped after we applied 7.0.1.3 (which fixes some channel problems for which we had seen some FDCs).
But we've just gotten a few more of the AMQ9271. I can't find anything anywhere that talks about these. Anyone have any experiences to share?
(Yes, I suppose we will open a PMR, too. I'm just so weary from the last one. They are going to say, "send us a trace". We've tried to reproduce the error and we can't.) |
http://www-01.ibm.com/support/docview.wss?uid=swg21403280 might be of help. It indicates that this is a comms problem rather than an MQ problem. |
|