Author |
Message
|
bigdavem |
Posted: Tue Nov 18, 2003 10:13 pm Post subject: TCP/IP return code 73 |
|
|
 Acolyte
Joined: 16 Sep 2001 Posts: 69 Location: Sydney, Australia
|
We have about 500 clients connecting via SVRCONN channels to a v5.2 CSD 06 qmgr running on AIX 5.1. We've been receiving this message in the MQ error logs 300-400 times per day for the last few weeks. The error occurs every couple of minutes, and it's a different IP address each time. Users are reporting no impact on their application, so it seems to be something that MQ is recovering from.
11/12/03 16:01:54
AMQ9208: Error on receive from host 10.49.19.51.
EXPLANATION:
An error occurred receiving data from 10.49.19.51 over TCP/IP. This may be due
to a communications failure.
ACTION:
The return code from the TCP/IP (read) call was 73 (X'49'). Record these values
and tell the systems administrator.
Does anyone know what might be causing this? |
|
|
 |
JasonE |
Posted: Wed Nov 19, 2003 2:24 am Post subject: |
|
|
Grand Master
Joined: 03 Nov 2003 Posts: 1220 Location: Hursley
|
73 == ECONNRESET, I think, so at a guess, does the client app start, run and end but never disconnect (MQDISC)? |
|
|
 |
bigdavem |
Posted: Wed Nov 19, 2003 2:53 am Post subject: |
|
|
 Acolyte
Joined: 16 Sep 2001 Posts: 69 Location: Sydney, Australia
|
Yeah, I thought of that, but I couldn't reproduce it myself no matter how many bizarre ways I tried to shut down the app. Also, it happens constantly throughout the day, once every 2 or 3 minutes. If it was a problem with the disconnect then I would expect there to be heaps at about 5pm as people are shutting down to go home, but that isn't the case.... |
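The time-of-day argument above can be checked against the error log itself. Here is a rough sketch that tallies AMQ9208 entries by hour, by host and by return code. It assumes every entry looks like the excerpt in the first post (a timestamp line, then the AMQ9208 line, then the ACTION text); `tally` is just a name made up for this example, not an MQ tool.

```python
import re
from collections import Counter

# Assumed layout, copied from the excerpt in the first post:
# "11/12/03 16:01:54" on one line, the AMQ9208 line right after it.
STAMP = re.compile(r"^\d{2}/\d{2}/\d{2} (\d{2}):\d{2}:\d{2}")
AMQ9208 = re.compile(r"^AMQ9208: Error on receive from host (\S+)")
RETURN_CODE = re.compile(r"call was (\d+)")


def tally(lines):
    """Count AMQ9208 entries per hour of day, per host and per return code."""
    by_hour, by_host, by_rc = Counter(), Counter(), Counter()
    hour = None
    in_entry = False
    for raw in lines:
        line = raw.strip()
        stamp = STAMP.match(line)
        if stamp:
            # A new log entry starts here; remember its hour.
            hour = stamp.group(1)
            in_entry = False
            continue
        hit = AMQ9208.match(line)
        if hit:
            in_entry = True
            # The message ends the address with a full stop, so drop it.
            by_host[hit.group(1).rstrip(".")] += 1
            if hour is not None:
                by_hour[hour] += 1
            continue
        code = RETURN_CODE.search(line)
        if code and in_entry:
            by_rc[int(code.group(1))] += 1
    return by_hour, by_host, by_rc
```

Feed it the open error log (for example `tally(open("AMQERR01.LOG"))` from the qmgr's errors directory). A flat spread across the hours would back up the point that this is not people shutting down at 5pm, and a handful of hosts or subnets dominating the host count would be something concrete to show the network group.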
|
|
 |
PeterPotkay |
Posted: Wed Nov 19, 2003 10:02 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
Did you try just pulling out the network cable from the back of the Client PC?
This would simulate a network outage, or blip, which is maybe what you are experiencing? _________________ Peter Potkay
Keep Calm and MQ On |
|
|
 |
bigdavem |
Posted: Wed Nov 19, 2003 4:02 pm Post subject: |
|
|
 Acolyte
Joined: 16 Sep 2001 Posts: 69 Location: Sydney, Australia
|
I hadn't tried that, but I did this morning. Still nothing in the logs for my IP address..... |
|
|
 |
bower5932 |
Posted: Thu Nov 20, 2003 5:59 am Post subject: |
|
|
 Jedi Knight
Joined: 27 Aug 2001 Posts: 3023 Location: Dallas, TX, USA
|
I once asked one of the developers about the connection reset error. Here is what I got back:
Quote: |
10054 is 'connection reset by peer'. In other words, the other end of the socket just closed the connection without actually going through the channel close protocol. This would happen, for example, if a client program just ended without issuing an MQDISC.
|
I'm guessing that you are seeing this error on your server? I'd suspect that either the clients aren't issuing the MQDISC or your users are getting impatient and ending the programs without allowing them to issue the MQDISC. |
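In case the two numbers look unrelated: the 10054 in the quote and the 73 in the AIX log are the same condition, "connection reset by peer". The number is a per-platform constant, not an MQ code. A small sketch to print what ECONNRESET is on whatever machine you run it on (the `QUOTED` table only repeats the two values mentioned in this thread):

```python
import errno
import os

# Values as quoted in this thread; they are platform constants, not MQ codes.
QUOTED = {"AIX (the error log)": 73, "Windows Winsock (the quote)": 10054}


def local_econnreset():
    """Return (number, message) for ECONNRESET on the machine running this."""
    return errno.ECONNRESET, os.strerror(errno.ECONNRESET)


if __name__ == "__main__":
    for platform, value in QUOTED.items():
        # MQ prints the code in decimal and in hex, e.g. 73 (X'49').
        print(f"{platform}: {value} (X'{value:X}')")
    print("this machine:", *local_econnreset())
```

So when reading an AMQ9208 from another platform, look the return code up in that platform's errno list instead of comparing the raw numbers.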
|
|
 |
bigdavem |
Posted: Thu Nov 20, 2003 1:27 pm Post subject: |
|
|
 Acolyte
Joined: 16 Sep 2001 Posts: 69 Location: Sydney, Australia
|
That's what I thought as well, but I've reviewed the code and it always calls MQDISC when shutting down, and I've tried to reproduce the error by shutting the app down in every way imaginable, with no luck. I guess there must be some circumstance where MQDISC is bypassed, but I can't find it.
I guess I'll pass it back to the programmers and tell them their code has a bug and see what they can find.... |
|
|
 |
mrlinux |
Posted: Thu Nov 20, 2003 5:09 pm Post subject: |
|
|
 Grand Master
Joined: 14 Feb 2002 Posts: 1261 Location: Detroit,MI USA
|
Well, I have seen this error caused by network issues more than anything else. Maybe someone from your network group could put a sniffer between the two and check it out. _________________ Jeff
IBM Certified Developer MQSeries
IBM Certified Specialist MQSeries
IBM Certified Solutions Expert MQSeries |
|
|
 |
JasonE |
Posted: Fri Nov 21, 2003 1:57 am Post subject: |
|
|
Grand Master
Joined: 03 Nov 2003 Posts: 1220 Location: Hursley
|
Don't forget that the error is only picked up when someone tries to do something on the socket. The receiver (server) side sits on a receive call 99% of the time, so if the connection dies silently it would only be woken up if keepalive is enabled (default 2 hours on most operating systems). If the client side does an MQGET, the get is processed on the server. If the connection died at that point, the error would be detected when the server tries to send back the reply. Have you tried terminating abnormally while the client is in an MQGET? |
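For anyone who wants to see ECONNRESET turn up on a blocked receive without MQ involved, here is a minimal loopback sketch. It is not how the MQ channel code works. It just forces one end to close abortively (SO_LINGER with a zero timeout, so the close sends a reset instead of a normal shutdown) while the other end sits in a receive. `provoke_reset` is a made-up name, and the `struct` layout for SO_LINGER assumes a POSIX-style system.

```python
import socket
import struct


def provoke_reset():
    """Return what the 'server' end sees after the peer closes abortively."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    server.settimeout(5)  # do not hang forever if no reset arrives

    # Linger on, timeout 0: close() discards the connection and sends RST.
    client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                      struct.pack("ii", 1, 0))
    client.close()

    try:
        # The server end is "sitting on a receive", like the SVRCONN side.
        return server.recv(1024)  # b"" here would mean an orderly close
    except ConnectionResetError as exc:
        return exc
    finally:
        server.close()
        listener.close()


if __name__ == "__main__":
    print(repr(provoke_reset()))
```

The receive only fails because a reset actually reached the server end. A connection that dies without anything arriving stays invisible to the receiver until keepalive or a send notices it, which is the point made above.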
|
|
 |
bigdavem |
Posted: Sun Nov 23, 2003 3:49 pm Post subject: |
|
|
 Acolyte
Joined: 16 Sep 2001 Posts: 69 Location: Sydney, Australia
|
OK, some progress. I tried that: I got the app into an MQGET with wait, then pulled the network cable from my laptop. Sure enough, the error message in question appeared in the log.
So I guess this leaves us with the likelihood that we have network errors? The network guys say they've investigated and can't find anything, but I suppose I need to go back to them with this.
Thanks for your help! |
|
|
 |
|