Author |
Message
|
jmucchiello |
Posted: Mon Nov 01, 2010 1:03 pm Post subject: TCP/IP recurring problem |
|
|
Newbie
Joined: 01 Nov 2010 Posts: 5
|
We have to install our MQ app at client locations, so we don't have complete control of the network. Frequently our app crashes (invalid memory access) when an idle MQ connection is used. MQ puts an event in the Windows event log that an attempt to send failed (10054) from within MQPUT. What I don't understand is why the application crashes inside the call to MQPUT. It seems to happen when there's a firewall terminating idle connections, so we turn on the heartbeat interval at installation sites that have this firewall rule. But what I want to know is how we can stop the call to MQPUT from crashing. It does not return a reason code; it causes an access violation and the program exits. This happens in Windows 2000 or 2003 Server with MQ 5.3 or 6.0.
All attempts to create a simple case of this fail to fail. Any ideas? |
|
|
 |
Vitor |
Posted: Mon Nov 01, 2010 3:29 pm Post subject: Re: TCP/IP recurring problem |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
jmucchiello wrote: |
It does not return a reason code; it causes an access violation and the program exits. |
It should return a 2009 or more likely a 2019. If you're getting access errors then this sounds like an issue with the error handling.
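For reference, the two codes mentioned here are MQRC_CONNECTION_BROKEN (2009) and MQRC_HOBJ_ERROR (2019). A minimal sketch of how a caller might sort them out once the put returns normally; the constants are redefined locally so the snippet compiles without the MQ headers, and `PutOutcome` and `classifyReason` are made-up names for illustration, not part of any IBM library:

```cpp
#include <cassert>

// Values as defined in IBM's cmqc.h, repeated here so the sketch
// has no dependency on the MQ headers.
const long kMqrcNone = 0;
const long kMqrcConnectionBroken = 2009;  // MQRC_CONNECTION_BROKEN
const long kMqrcHobjError = 2019;         // MQRC_HOBJ_ERROR

// Hypothetical classification used only in this sketch.
enum class PutOutcome { Ok, ConnectionLost, StaleHandle, OtherFailure };

// Map a reason code returned after a put onto a coarse outcome.
PutOutcome classifyReason(long reasonCode) {
    switch (reasonCode) {
        case kMqrcNone:             return PutOutcome::Ok;
        case kMqrcConnectionBroken: return PutOutcome::ConnectionLost;
        case kMqrcHobjError:        return PutOutcome::StaleHandle;
        default:                    return PutOutcome::OtherFailure;
    }
}
```

Both ConnectionLost and StaleHandle would normally mean "reconnect and reopen the queue" rather than "give up", which is why it helps to tell them apart from other failures.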
jmucchiello wrote: |
This happens in Windows 2000 or 2003 Server with MQ 5.3 or 6.0. |
You shouldn't be using v5.3 still, and you should be moving off of v6.
jmucchiello wrote: |
Any ideas? |
It's a seriously odd design to have an application open a connection then sit and wait long enough for the connection to go idle before issuing an MQPUT. If the delay between puts is that long you should consider using MQPUT1 in preference (which establishes its own connection). _________________ Honesty is the best policy.
Insanity is the best defence. |
|
|
 |
jmucchiello |
Posted: Tue Nov 02, 2010 5:18 am Post subject: Re: TCP/IP recurring problem |
|
|
Newbie
Joined: 01 Nov 2010 Posts: 5
|
Vitor wrote: |
jmucchiello wrote: |
It does not return a reason code; it causes an access violation and the program exits. |
It should return a 2009 or more likely a 2019. If you're getting access errors then this sounds like an issue with the error handling. |
This is the code. I forgot I had changed it from MQPUT to IBM's C++ library. That shouldn't make a big difference. When this error occurs, the call to our trace routine prints. Under normal circumstances this runs quietly with no reason codes other than 0.
Code: |
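try {
    // put() returns non-zero on success; the reason code is read straight after the call
    ret = m_clsQ.put(m_clsQMessage, putmessageOptions) != 0;
    reasonCode = m_clsQ.reasonCode();
} catch (...) {
    // Reached when the put raises an exception; no MQ reason code is recorded here
    ret = false;
    reasonCode = -1;
    PTRACE("Put message caused an exception...");
}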
|
Quote: |
You shouldn't be using v5.3 still, and you should be moving off of v6. |
Tell that to my ultra-conservative banking clients.
Quote: |
It's a seriously odd design to have an application open a connection then sit and wait long enough for the connection to go idle before issuing an MQPUT. If the delay between puts is that long you should consider using MQPUT1 in preference (which establishes its own connection). |
Again, this is because the design is modeled after the original SNA interface where you had to establish a connection and if it went down you had to inform the user. I do not have much leeway in restructuring the surrounding code. |
|
|
 |
Vitor |
Posted: Tue Nov 02, 2010 5:24 am Post subject: Re: TCP/IP recurring problem |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
jmucchiello wrote: |
This is the code. |
This code doesn't capture the actual reason code, but simply uses a -1. The actual code is more likely as I indicated.
jmucchiello wrote: |
Tell that to my ultra-conservative banking clients. |
Tell your ultra-conservative banking clients that v5.3 has been out of support for a while & v6 is going the same way. So if they get any problems with their systems and they can't do any money making activities they won't get any help from IBM at all. Unless they're wasting a lot of money paying for extended support contracts.
jmucchiello wrote: |
Again, this is because the design is modeled after the original SNA interface where you had to establish a connection and if it went down you had to inform the user. I do not have much leeway in restructuring the surrounding code. |
It's a shame you're using an SNA design over TCP/IP. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
|
 |
exerk |
Posted: Tue Nov 02, 2010 6:08 am Post subject: |
|
|
 Jedi Council
Joined: 02 Nov 2006 Posts: 6339
|
Winsock error 10054 is "Connection reset by peer" so, as Vitor suggests, you're probably picking up the underlying error below the WMQ one. Also I echo my master's view of getting off back-level, and soon to be back-level, WMQ versions. To convince them you could try opening a general-request PMR and showing them the response. _________________ It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys. |
|
|
 |
jmucchiello |
Posted: Tue Nov 02, 2010 7:49 am Post subject: Re: TCP/IP recurring problem |
|
|
Newbie
Joined: 01 Nov 2010 Posts: 5
|
Vitor wrote: |
This code doesn't capture the actual reason code, but simply uses a -1. The actual code is more likely as I indicated. |
I once had a version that checked inside the exception handler and it reported ReasonCode as 0. But that's not important. Inside the exception handler I don't care about the reason code. There should not be an exception when making a call into the MQ library. I want the thing to not throw the exception. I only find out the TCP/IP error because MQ puts it into the event log.
Quote: |
Tell your ultra-conservative banking clients that v5.3 has been out of support for a while & v6 is going the same way. So if they get any problems with their systems and they can't do any money making activities they won't get any help from IBM at all. Unless they're wasting a lot of money paying for extended support contracts. |
Unfortunately, I'm not the guy who makes policy on what we support. They don't call IBM when there's an MQ issue. They call us and want to know why our stuff doesn't work.
Quote: |
It's a shame you're using an SNA design over TCP/IP. |
Tell me about it. I would rewrite it in a heartbeat if I could. Believe me. When the conversion came up I was all for throwing out all the code involved and doing it fresh. The usual rule that old code is crufty because it contains a lot of bug fixes just did not apply. It all could have gone away. |
|
|
 |
jmucchiello |
Posted: Tue Nov 02, 2010 7:52 am Post subject: |
|
|
Newbie
Joined: 01 Nov 2010 Posts: 5
|
exerk wrote: |
Winsock error 10054 is "Connection reset by peer" so, as Vitor suggests, you're probably picking up the underlying error below the WMQ one. Also I echo my master's view of getting off back-level, and soon to be back-level, WMQ versions. To convince them you could try opening a general-request PMR and showing them the response. |
Is there a known date for end of life (or whatever IBM calls it) for MQ 6? A link to IBM's website would be good. I doubt my boss knows about that.
EDIT: Never mind. I found it: http://www-01.ibm.com/software/support/lifecycle/index_w.html#FN_PT |
|
|
 |
Vitor |
Posted: Tue Nov 02, 2010 7:59 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
|
|
 |
mqjeff |
Posted: Tue Nov 02, 2010 8:02 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
|
|
 |
Vitor |
Posted: Tue Nov 02, 2010 8:10 am Post subject: Re: TCP/IP recurring problem |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
jmucchiello wrote: |
I want the thing to not throw the exception. I only find out the TCP/IP error because MQ puts it into the event log. |
So if it doesn't throw an exception when the network fails under it what's it supposed to do?
jmucchiello wrote: |
They don't call IBM when there's an MQ issue. They call us and want to know why our stuff doesn't work. |
And when your stuff doesn't work because an OS or network patch has been applied which is incompatible with v5.3 & it's throwing FDC files like mad what's the plan? _________________ Honesty is the best policy.
Insanity is the best defence. |
|
|
 |
Vitor |
Posted: Tue Nov 02, 2010 8:13 am Post subject: Re: TCP/IP recurring problem |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
jmucchiello wrote: |
Inside the exception handler I don't care about the reason code. |
This is also a shame. If you cared you could interrogate the code, determine it's a network issue potentially caused by a transient problem or firewall (the WMQ reason code would describe this) and then attempt to handle the problem and retry. _________________ Honesty is the best policy.
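A rough sketch of that interrogate-and-retry idea, assuming the put returns a reason code instead of faulting. `PutHooks`, `tryPut`, `reconnect` and `putWithRetry` are hypothetical names invented for this illustration; in a real application the callbacks would wrap the actual put and the reconnect/reopen logic:

```cpp
#include <cassert>
#include <functional>

// Reason code values from IBM's cmqc.h, repeated here so the sketch
// compiles without the MQ headers.
const long kReasonConnectionBroken = 2009;  // MQRC_CONNECTION_BROKEN
const long kReasonHobjError = 2019;         // MQRC_HOBJ_ERROR

// Hypothetical callbacks: tryPut returns the MQ reason code (0 on success);
// reconnect returns true if a fresh connection and queue handle were obtained.
struct PutHooks {
    std::function<long()> tryPut;
    std::function<bool()> reconnect;
};

// Returns the reason code of the last attempt (0 means the message was put).
long putWithRetry(const PutHooks& hooks, int maxAttempts) {
    long reason = -1;  // stays -1 only if no attempt was made
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        reason = hooks.tryPut();
        if (reason == 0) return 0;
        bool connectionFailure = reason == kReasonConnectionBroken ||
                                 reason == kReasonHobjError;
        if (!connectionFailure) return reason;   // not a network problem: report it
        if (!hooks.reconnect()) return reason;   // could not recover: report it
    }
    return reason;
}
```

This only helps when the library hands back a reason code; it does nothing for a put that dies with an access violation before returning.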
Insanity is the best defence. |
|
|
 |
jmucchiello |
Posted: Tue Nov 02, 2010 8:18 am Post subject: Re: TCP/IP recurring problem |
|
|
Newbie
Joined: 01 Nov 2010 Posts: 5
|
Vitor wrote: |
jmucchiello wrote: |
I want the thing to not throw the exception. I only find out the TCP/IP error because MQ puts it into the event log. |
So if it doesn't throw an exception when the network fails under it what's it supposed to do? |
MQ is not throwing the exception. The OS is throwing an Access Violation Exception. After an access violation, I can't leave the process running because I don't know if there are dangling memory allocations, lost resources or just plain screwed up internal status flags that will cause further damage.
The library should not cause an exception just because a socket died. It should return cleanly with a meaningful error code so my code can take reasonable action.
Quote: |
And when your stuff doesn't work because an OS or network patch has been applied which is incompatible with v5.3 & it's throwing FDC files like mad what's the plan? |
Not my problem. I'm not support. I'm just the guy in development screaming the sky is falling. I have no desire to support outdated software. |
|
|
 |
Vitor |
Posted: Tue Nov 02, 2010 8:23 am Post subject: Re: TCP/IP recurring problem |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
jmucchiello wrote: |
The library should not cause an exception just because a socket died. It should return cleanly with a meaningful error code so my code can take reasonable action. |
The penny's just dropped my end.
It's an old WMQ library (if these people are as ultra-conservative as you say I bet it's an old version of v6). There's a Windows patch (or patches) it doesn't sit well with, which is causing your access violation rather than the reason code I've been expecting.
The sky is not falling; it's already in chunks around you.
If you are on a current version of v6 you can raise a PMR & protest the access violation (which I agree shouldn't happen). If you're not you can still raise a PMR but the 1st thing you're likely to be told is "apply the latest maintenance".
If this happens on v5.3 accept that a chunk of sky has hit you on the head. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
|
 |
|