Author |
Message
|
wilso132 |
Posted: Wed May 19, 2010 7:11 am Post subject: *Occasional* AMQ9202/AMQ9208 |
|
|
Newbie
Joined: 12 May 2010 Posts: 8
|
We have several C# applications running as web services and as Windows services that are experiencing this particular error across our entire environment, but only very occasionally. All of our servers are Windows 2003 and we're currently using the 7.0.0.0 client.
Here's a sample of the errors we get:
Code: |
----- amqxfdcp.c : 818 --------------------------------------------------------
5/18/2010 12:45:43 - Process(7364.10) User(SYSTEM) Program(OrderService.exe)
AMQ9202: Remote host 'appt1 (111.11.111.11) (1414)' not available, retry
later.
EXPLANATION:
The attempt to allocate a conversation using TCP/IP to host 'appt1 (111.11.111.11) (1414)' was not successful. However the error may be a
transitory one and it may be possible to successfully allocate a TCP/IP
conversation later.
ACTION:
Try the connection again later. If the failure persists, record the error
values and contact your systems administrator. The return code from TCP/IP is
10061 (X'274D'). The reason for the failure may be that this host cannot reach
the destination host. It may also be possible that the listening program at
host 'appt1 (111.11.111.11) (1414)' was not running. If this is the
case, perform the relevant operations to start the TCP/IP listening program,
and try again.
----- amqccita.c : 1288 -------------------------------------------------------
5/18/2010 17:08:45 - Process(7364.14) User(SYSTEM) Program(OrderService.exe)
AMQ9208: Error on receive from host firemq (22.222.22.222).
EXPLANATION:
An error occurred receiving data from firemq (22.222.22.222) over TCP/IP. This
may be due to a communications failure.
ACTION:
The return code from the TCP/IP (recv) call was 10054 (X'2746'). Record these
values and tell the systems administrator.
----- amqccita.c : 3336 ------------------------------------------------------- |
From all the reading I've done, it would make sense if this error were persistent, but it's not. We may run 500-1500 transactions through without issue, then see one transaction fail and leave an entry in the logs. It seems to point to some type of intermittent network issue. Does that seem to be the case, or does anyone have other ideas? If I can provide any more details, please let me know; thanks for the help! |
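Since the AMQ9202 text itself says to try the connection again later, one way to picture a retry around the connect step is the sketch below. It is only an illustration in Python and not the MQ client API: `connect_with_retry` and its parameters are hypothetical names, and the `connect` callable stands in for whatever the application actually does to open its connection.

```python
import time


def connect_with_retry(connect, attempts=3, delay=0.5, sleep=time.sleep):
    """Call connect() and retry a few times if the connection is refused.

    connect  -- hypothetical zero-argument callable that opens the connection
    attempts -- total number of tries before giving up
    delay    -- base pause in seconds; grows linearly with each failed try
    sleep    -- injectable so the pause can be skipped in tests
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionRefusedError:
            # Refused is the 10061 case from the log: worth retrying,
            # but re-raise once the attempts are exhausted.
            if attempt == attempts:
                raise
            sleep(delay * attempt)
```

Against a plain TCP listener it could be exercised as `connect_with_retry(lambda: socket.create_connection(("appt1", 1414), timeout=5))`. A retry like this only hides the symptom, so the failed attempts would still be worth logging while the root cause is chased.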
|
Vitor |
Posted: Wed May 19, 2010 7:19 am Post subject: Re: *Occasional* AMQ9202/AMQ9208 |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
wilso132 wrote: |
It seems to point to some type of intermittent network issue. Does that seem to be the case, or does anyone have other ideas? |
It does seem to be the case, and the advice given here applies:
wilso132 wrote: |
Code: |
If the failure persists, record the error
values and contact your systems administrator. |
|
Certainly the network is the place to start. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
wilso132 |
Posted: Wed May 19, 2010 7:32 am Post subject: |
|
|
Newbie
Joined: 12 May 2010 Posts: 8
|
I was already running Wireshark hoping to figure something out; thanks for confirming my suspicions. |
|
Vitor |
Posted: Wed May 19, 2010 7:42 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
I'd focus on the 10061, which indicates a connection was refused (in a "hang on a minute, I'm busy" way rather than a firewall "get lost"), rather than the 10054, which simply indicates that something bad happened (like a 10061) and the connection was reset to recover from it.
FWIW, if any of the Windows 2003 servers are on VMware, that's not a smoking gun, but it is a firearm of some kind IMHO. |
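For anyone wading through these logs, the two return codes can be pulled out of the error text and checked against their hex forms with something like the following. This is a rough sketch: `extract_return_codes` and the lookup table are hypothetical helpers, and the symbolic names are the standard Winsock ones for 10061 and 10054 rather than anything MQ prints.

```python
import re

# Winsock error numbers that appear in the log excerpt in the first post.
WINSOCK_NAMES = {
    10054: "WSAECONNRESET",
    10061: "WSAECONNREFUSED",
}

# Matches e.g. "10061 (X'274D')": decimal code followed by its hex form.
CODE_PATTERN = re.compile(r"(\d+)\s+\(X'([0-9A-Fa-f]+)'\)")


def extract_return_codes(log_text):
    """Return (decimal code, symbolic name) pairs found in a log excerpt."""
    results = []
    for decimal_text, hex_text in CODE_PATTERN.findall(log_text):
        code = int(decimal_text)
        # Skip anything whose decimal and hex forms disagree.
        if code != int(hex_text, 16):
            continue
        results.append((code, WINSOCK_NAMES.get(code, "unknown")))
    return results
```

Codes not in the table come back as "unknown", so the table can be extended as other return codes turn up.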
|
wilso132 |
Posted: Wed May 19, 2010 10:48 am Post subject: |
|
|
Newbie
Joined: 12 May 2010 Posts: 8
|
No occurrence of 10061 with Wireshark running, but we did have a 10054. It looks like the issue is that we send an MQ packet (MQBACK) that expects a reply (MQBACK_REPLY) that it immediately determines it didn't/won't receive. I'm having a slip put in on the MQ side soon, but I believe MQ is actively sending back a reset packet instead of the MQBACK_REPLY. Why? I'm not sure, but I do know that the RST reset reason was "TSH", which sounds like it might possibly be a configuration issue with the transmission segment header? |
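When reading a capture like this, the reset shows up as the RST bit in the TCP header's flags byte. A small sketch of decoding that byte follows; the function names are made up for illustration, and it covers only the six classic flag bits. It says nothing about the MQ-level "TSH" reason shown by the dissector.

```python
# The six classic TCP flag bits, lowest bit first.
TCP_FLAGS = [
    (0x01, "FIN"),
    (0x02, "SYN"),
    (0x04, "RST"),
    (0x08, "PSH"),
    (0x10, "ACK"),
    (0x20, "URG"),
]


def decode_tcp_flags(flags_byte):
    """Return the names of the flag bits set in a TCP flags byte."""
    return [name for bit, name in TCP_FLAGS if flags_byte & bit]


def is_reset(flags_byte):
    """True if the segment carries the RST flag."""
    return bool(flags_byte & 0x04)
```

A segment with flags `0x14`, for example, decodes as RST plus ACK, which is the usual shape of a reset sent in answer to data on an existing connection.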
|
Vitor |
Posted: Wed May 19, 2010 11:01 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
wilso132 wrote: |
which sounds like it might possibly be a configuration issue with the transmission segment header? |
Sounds more like a PMR to me. |
|
wilso132 |
Posted: Wed May 19, 2010 11:07 am Post subject: |
|
|
Newbie
Joined: 12 May 2010 Posts: 8
|
Definition please?
Edit: Never mind, figured it out. I'd be glad to hand this over to someone else to deal with. |
|
JosephGramig |
Posted: Wed May 19, 2010 12:08 pm Post subject: |
|
|
 Grand Master
Joined: 09 Feb 2006 Posts: 1244 Location: Gold Coast of Florida, USA
|
First, I would put 7.0.1.1 client on and then go from there. We were getting those when the client was on one network and the server was on another. When we moved the server to the same network, the problem went away. Something in between the client and server can cause that problem. |
|
wilso132 |
Posted: Wed May 19, 2010 12:33 pm Post subject: |
|
|
Newbie
Joined: 12 May 2010 Posts: 8
|
JosephGramig wrote: |
First, I would put 7.0.1.1 client on and then go from there. |
Thanks for the advice; I plan on giving this a try as soon as we get permissions to put it on a UAT box. |
|
mvic |
Posted: Wed May 19, 2010 1:04 pm Post subject: |
|
|
 Jedi
Joined: 09 Mar 2004 Posts: 2080
|
Vitor wrote: |
I'd focus on the 10061 |
ECONNREFUSED tends to be: nothing listening on the port, no available sockets, etc. (i.e. something at the OS level) rather than the network.
But that's from rather limited experience, I admit.
Could even be a firewall rule, though this would be visible as a permanent problem rather than a temporary one. I wonder if wilso132 sees 10061 errors a lot, or only sometimes.
Running 7.0.0.0 is also an inadvisable thing. In fact, running n.0 of anything is inadvisable if stability is your goal.
(I see 7.0.1.2 is out now, meaning 7.0.0.0 is about 5 fix packs behind the best-so-far). |
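One cheap way to test the "nothing listening on the port" theory is to probe the listener port from the client box at intervals and record what the OS returns. The sketch below is an assumption-laden illustration: `probe_port` is a hypothetical helper, and it only opens a bare TCP connection without speaking any MQ protocol.

```python
import socket


def probe_port(host, port, timeout=3.0):
    """Try a TCP connect; return 0 on success or the OS error number.

    Name-resolution failures are not covered by connect_ex and can
    still raise an exception.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port))
```

Run on a schedule against something like `probe_port("appt1", 1414)`, a mostly-zero result with the occasional 10061 would line up with the intermittent AMQ9202 entries.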
|
wilso132 |
Posted: Wed May 19, 2010 1:12 pm Post subject: |
|
|
Newbie
Joined: 12 May 2010 Posts: 8
|
mvic wrote: |
10061 errors a lot, or only sometimes.
Running 7.0.0.0 is also an inadvisable thing. In fact, running n.0 of anything is inadvisable if stability is your goal.
(I see 7.0.1.2 is out now, meaning 7.0.0.0 is about 5 fix packs behind the best-so-far). |
Only sometimes. It seems like it could truly be a networking issue, as I just spoke with our guys on the other side of the transaction and they're getting a send 73 TCP/IP error. They're starting traces on the server side tonight, so hopefully that will shed more light on the error.
Trust me, the 7.0.0.0 wasn't my choice, and I wish an upgrade across our environment was straightforward. Even if I can prove a newer version of the client fixes the issue, it'll still be a few weeks' worth of paperwork/processing to get it deployed everywhere. |
|
mvic |
Posted: Wed May 19, 2010 1:20 pm Post subject: |
|
|
 Jedi
Joined: 09 Mar 2004 Posts: 2080
|
wilso132 wrote: |
they're getting a send 73 tcp/ip error. |
73 is ECONNRESET on AIX. It tends to suggest a connection being cancelled by a firewall, or maybe a client coming to an end via Ctrl+C. But ECONNRESET is a different error from ECONNREFUSED: the former tends to suggest the connection ran but then was halted, whereas the latter is a failure to get going in the first place.
Quote: |
Trust me, the 7.0.0.0 wasn't my choice and I wish an upgrade across our environment was straightforward. If I can prove a newer version of the client fixes the issue, it'll still be a few weeks' worth of paperwork/processing to get it deployed everywhere. |
OK, understood.
If anyone wants a quick look at what has been fixed since 7.0.0.0 this is the place to go: http://www.ibm.com/support/docview.wss?rs=171&uid=swg27014224 |
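Because the same condition carries different numbers on each platform, it can help to normalise the codes before comparing notes between the client and server teams. Here is a minimal sketch of that idea; `describe` and both tables are hypothetical, they list only the three values mentioned in this thread, and the Windows entries use the generic names rather than the WSA-prefixed ones.

```python
# (platform, error number) -> symbolic name. Only values mentioned in this
# thread are listed; the table is illustrative, not exhaustive.
KNOWN_CODES = {
    ("windows", 10061): "ECONNREFUSED",
    ("windows", 10054): "ECONNRESET",
    ("aix", 73): "ECONNRESET",
}

# What each condition says about where in the connection's life it failed.
PHASE = {
    "ECONNREFUSED": "connection never established",
    "ECONNRESET": "established connection was dropped",
}


def describe(platform, code):
    """Return a short description for a platform-specific error number."""
    name = KNOWN_CODES.get((platform.lower(), code))
    if name is None:
        return "unknown code"
    return f"{name}: {PHASE[name]}"
```

Seen this way, the client's 10054 and the server's 73 are the same event viewed from both ends, while the 10061 is a separate failure at connect time.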
|
|