sdr channel retrying - went running after stop/start |
Author |
Message
|
scmq2758 |
Posted: Fri Sep 24, 2004 12:04 am Post subject: sdr channel retrying - went running after stop/start |
|
|
Newbie
Joined: 23 Sep 2004 Posts: 3 Location: Manchester, UK
|
We have had a sender channel from an MQ 5.3 queue manager on NT back to a 5.2 queue manager on AIX go into RETRYING state. We think this may have been caused by a network glitch, as the network was very slow around the time it happened. The channel started retrying around 3 weeks ago and this has only just been noticed. We could ping the AIX box from the NT server OK and a TRACERT also worked. The channel was using a dotted IP address, so it was not a DNS issue.
Against the advice I have read in the manuals, I stopped and then started the channel (it was such a long time since the problem occurred that I couldn't believe the underlying problem still existed) and the channel immediately went RUNNING. I am confused, as this goes against what I believed about MQ, i.e. that a channel will always reconnect after a failure through the use of the short and long retries, and that you should never have to take manual action as I had to do.
Has anyone seen this, or can anyone come up with an explanation of how this type of incident can occur where the channel doesn't reconnect itself?
Thanks, _________________ Steve
IBM Certified System Administrator Websphere MQ V5.3 |
|
JasonE |
Posted: Fri Sep 24, 2004 12:30 am Post subject: |
|
|
Grand Master
Joined: 03 Nov 2003 Posts: 1220 Location: Hursley
|
Not really... What were the error messages describing why the retry was failing? (Look in the qmgr amqerr* log files on both sides.)
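For anyone following along, here is a minimal Python sketch of that log check. It assumes the error logs are plain text files named AMQERR01.LOG, AMQERR02.LOG and so on in a directory you pass in; the function name `find_channel_entries` and its arguments are made up for illustration, and the directory location is not something stated in this thread.

```python
import re
from pathlib import Path

# Matches WebSphere MQ message identifiers such as AMQ9208.
AMQ_ID = re.compile(r"\bAMQ\d{4}\b")


def find_channel_entries(log_dir, channel_name):
    """Return (file, line number, last AMQ id seen, text) for lines naming the channel.

    log_dir and channel_name are supplied by the caller (hypothetical values);
    the AMQ id is the most recent one seen above the matching line, if any.
    """
    hits = []
    for log_file in sorted(Path(log_dir).glob("AMQERR*.LOG")):
        last_id = None
        lines = log_file.read_text(errors="replace").splitlines()
        for number, line in enumerate(lines, start=1):
            match = AMQ_ID.search(line)
            if match:
                last_id = match.group(0)
            if channel_name in line:
                hits.append((log_file.name, number, last_id, line.strip()))
    return hits
```

Running it against the error log directories on both the NT and AIX sides with the same channel name gives a quick side-by-side of what each end recorded, including any gap where retries should have been logged.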
|
scmq2758 |
Posted: Fri Sep 24, 2004 1:31 am Post subject: |
|
|
Newbie
Joined: 23 Sep 2004 Posts: 3 Location: Manchester, UK
|
The error logs on the receiving end on AIX have no mention of any errors for the receiver channel. The associated sender channel from AIX back to NT is working OK and continuing to send messages through to the NT box, but nothing was coming back.
On the NT server there was a mention of TCP/IP error code 10054 (which is a time-out or connection reset, I believe) about three weeks ago, when the problem seems to have occurred for the sender channel, but after that there is nothing to show the retry attempts. Normally I would expect to see the channel retrying every 20 minutes and producing an error message in the log each time, but I don't see anything after the 10054 error some weeks ago. I didn't look at the channel status to see the number of long retries left and to confirm that this was going down, but the lack of messages might suggest that the retries were not actually happening even though the status was RETRYING?
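As a quick reference for the error code mentioned above, here is a small Python lookup of a few Winsock codes. The numeric values are the standard WSA* constants; the table is a hand-copied subset, the descriptions are paraphrased, and the helper name `describe_winsock_error` is made up for this sketch.

```python
# Hand-copied subset of Winsock error codes (not a complete list).
WINSOCK_ERRORS = {
    10053: ("WSAECONNABORTED", "connection aborted by software on the local host"),
    10054: ("WSAECONNRESET", "connection reset by the remote peer"),
    10060: ("WSAETIMEDOUT", "connection timed out"),
    10061: ("WSAECONNREFUSED", "connection refused by the target machine"),
}


def describe_winsock_error(code):
    """Return a one-line description for a Winsock error code."""
    name, meaning = WINSOCK_ERRORS.get(code, ("UNKNOWN", "not in this table"))
    return f"{code} {name}: {meaning}"


print(describe_winsock_error(10054))
```

So 10054 is the connection reset case; a time-out would normally show up as 10060 instead.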
|
JasonE |
Posted: Fri Sep 24, 2004 3:17 am Post subject: |
|
|
Grand Master
Joined: 03 Nov 2003 Posts: 1220 Location: Hursley
|
Well, if you run out of short and long retries, the channel should go into STOPPED, so that probably didn't happen. As you say, you could have been in a long retry, but for that I would have expected the interval to be set to 3 weeks!
10054 means the connection was dropped from 'the other end' (or at least, the Windows box just got told the socket was dropped; we can't see why).
I suspect it's one of those you'll have to watch out for, and if it happens again, look at the full channel status output and see what retry should be going on.
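To put rough numbers on the retry budget, here is a Python sketch of the arithmetic. It assumes the commonly shipped sender channel defaults (SHORTRTY 10, SHORTTMR 60 seconds, LONGRTY 999999999, LONGTMR 1200 seconds); the thread itself only confirms the 20-minute long interval. The model is deliberately simplified: every attempt fails at once and the channel waits the full interval before each attempt. The function name `retry_state_after` is made up for this sketch.

```python
def retry_state_after(elapsed_seconds, shortrty=10, shorttmr=60,
                      longrty=999_999_999, longtmr=1200):
    """Estimate the retries left after elapsed_seconds of continuous failure.

    Defaults are assumed values, not taken from the channel in this thread.
    Both timer intervals must be positive.
    """
    # Short retries are used first, one per shorttmr seconds.
    short_used = min(shortrty, elapsed_seconds // shorttmr)
    remaining = elapsed_seconds - short_used * shorttmr

    # Long retries only begin once every short retry has been used.
    long_used = 0
    if short_used == shortrty:
        long_used = min(longrty, remaining // longtmr)

    return {
        "short_retries_left": shortrty - short_used,
        "long_retries_left": longrty - long_used,
        "exhausted": short_used == shortrty and long_used == longrty,
    }


three_weeks = 21 * 24 * 3600
print(retry_state_after(three_weeks))
```

Under those assumptions, three weeks of failures uses all 10 short retries and about 1,511 long retries, which is nowhere near exhausting the long retry count, so the channel would still be expected to show RETRYING and not STOPPED.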
|
scmq2758 |
Posted: Mon Sep 27, 2004 3:25 am Post subject: |
|
|
Newbie
Joined: 23 Sep 2004 Posts: 3 Location: Manchester, UK
|
This sdr channel retrying problem re-occurred today.
What happened is that the Applications people were testing a change and set the Windows system date forward to 01/01/2005. Then, without stopping and restarting the queue manager, they sent a message. The error log just shows the remote end disconnecting with no error code and the sdr channel going into retry, but the critical thing is that the channel is 'in doubt', so the short retry count never decreases from 10. The channel was resolved and stopped, the NT server was rebooted back to today's correct date and time, and then the channel restarted OK and went RUNNING.
I guess it has something to do with the date/time mismatch between the last message sent and the clock suddenly going forward, with the sdr and rcvr pair getting out of sync somehow. I don't know exactly what the mechanism is yet and am continuing to look into it, but now that they know it is related to the date settings they can prevent this from happening in future.
Steve.
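One way to spot this condition sooner is to compare two channel status snapshots taken more than one retry interval apart. The Python sketch below parses KEY(VALUE) pairs, the style in which runmqsc prints attributes, and checks whether the retry counters moved. The sample strings are hand-written for illustration and are not captured output; the channel name DEMO.TO.AIX and both function names are hypothetical.

```python
import re

# KEY(VALUE) pairs such as STATUS(RETRYING) or SHORTRTS(10).
ATTRIBUTE = re.compile(r"([A-Z]+)\(([^)]*)\)")


def parse_attributes(text):
    """Turn 'STATUS(RETRYING) SHORTRTS(10)' style text into a dict."""
    return {key: value.strip() for key, value in ATTRIBUTE.findall(text)}


def check_retry_progress(earlier_text, later_text):
    """Compare two snapshots and report whether the retry counters are frozen."""
    earlier = parse_attributes(earlier_text)
    later = parse_attributes(later_text)
    counters_frozen = (
        "SHORTRTS" in later
        and earlier.get("SHORTRTS") == later.get("SHORTRTS")
        and earlier.get("LONGRTS") == later.get("LONGRTS")
    )
    return {
        "retrying": later.get("STATUS") == "RETRYING",
        "in_doubt": later.get("INDOUBT") == "YES",
        "counters_frozen": counters_frozen,
    }


# Hand-written sample snapshots (illustrative only).
first = "CHANNEL(DEMO.TO.AIX) STATUS(RETRYING) INDOUBT(YES) SHORTRTS(10) LONGRTS(999999999)"
second = "CHANNEL(DEMO.TO.AIX) STATUS(RETRYING) INDOUBT(YES) SHORTRTS(10) LONGRTS(999999999)"
print(check_retry_progress(first, second))
```

A channel that reports RETRYING and in doubt with frozen counters matches what was seen here, and is a cue to investigate and resolve it by hand instead of waiting for the retry timers.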
|