sdr channel retrying - went running after stop/start
scmq2758
Posted: Fri Sep 24, 2004 12:04 am    Post subject: sdr channel retrying - went running after stop/start

Newbie

Joined: 23 Sep 2004
Posts: 3
Location: Manchester, UK

We have had a sender channel from an MQ 5.3 queue manager on NT back to a 5.2 queue manager on AIX go into retrying mode. We think this might have been due to a network glitch, as the network was very slow around the time this happened. Anyway, the channel started retrying around 3 weeks ago and this has only just been noticed. We could ping the AIX box from the NT server OK and also do a TRACERT OK. The channel was using a dotted IP address, so it was not a DNS issue.

Against the advice I have read in the manuals, I stopped and then started the channel (as it was such a long time since the problem occurred and I couldn't believe that the underlying problem still existed) and the channel immediately went RUNNING. I am confused, as this goes against what I believed about MQ, i.e. that a channel will always reconnect after a failure through the use of the short and long retries, and that you should never have to take manual action as I had to do.

Has anyone seen this, or can anyone come up with an explanation of how this type of incident can occur, where the channel doesn't reconnect itself?

Thanks,
_________________
Steve
IBM Certified System Administrator WebSphere MQ V5.3
JasonE
Posted: Fri Sep 24, 2004 12:30 am

Grand Master

Joined: 03 Nov 2003
Posts: 1220
Location: Hursley

Not really... What were the error messages describing why the retry was failing? (Look in the qmgr amqerr* log files on both sides.)
scmq2758
Posted: Fri Sep 24, 2004 1:31 am

Newbie

Joined: 23 Sep 2004
Posts: 3
Location: Manchester, UK

The receiving end on AIX has no mention of any errors for the receiver channel in the logs. The associated sender channel back to NT is working OK and continuing to send messages through to the NT box, but nothing was coming back.
On the NT server there was a TCP/IP error code 10054 (a time-out or connection reset, I believe) about three weeks ago, when the problem seems to have occurred for the sender channel, but after that there is nothing to show the retry attempts. Normally I would expect to see the channel retry every 20 minutes and write an error message to the log each time, but I don't see anything after the 10054 error. I didn't look at the channel status to check the number of long retries left and confirm that it was going down, but the lack of messages might suggest that the retries were not actually happening even though the status was RETRYING?
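To put a number on "I would expect to see this retrying every 20 minutes", here is a rough sketch of the retry arithmetic. It assumes the channel uses the commonly documented default retry attributes (SHORTRTY 10, SHORTTMR 60 seconds, LONGRTY 999999999, LONGTMR 1200 seconds); the actual values on this channel are not given in the thread, and the function name `expected_attempts` is made up for illustration.

```python
# Assumed default retry attributes for a sender channel; check the real
# channel definition, since the thread does not state the values in use.
SHORTRTY = 10           # number of short retries
SHORTTMR = 60           # seconds between short retries
LONGRTY = 999_999_999   # number of long retries
LONGTMR = 1200          # seconds between long retries (20 minutes)


def expected_attempts(elapsed_seconds, shortrty=SHORTRTY, shorttmr=SHORTTMR,
                      longrty=LONGRTY, longtmr=LONGTMR):
    """Return (short_attempts, long_attempts) expected after elapsed_seconds."""
    short_phase = shortrty * shorttmr
    if elapsed_seconds <= short_phase:
        # Still inside the short retry phase.
        return elapsed_seconds // shorttmr, 0
    # Short retries exhausted; the remainder of the time is long retries.
    long_attempts = min(longrty, (elapsed_seconds - short_phase) // longtmr)
    return shortrty, long_attempts


three_weeks = 21 * 24 * 60 * 60
short_done, long_done = expected_attempts(three_weeks)
print(short_done, long_done)
```

Under those assumptions, three weeks of normal retrying would mean all 10 short retries plus roughly 1,500 long retries, so an error log with no retry messages at all is consistent with the retries not actually being attempted.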
JasonE
Posted: Fri Sep 24, 2004 3:17 am

Grand Master

Joined: 03 Nov 2003
Posts: 1220
Location: Hursley

Well, if you run out of short and long retries, the channel should go into STOPPED, so that probably didn't happen. As you say, you could have been in a long retry, but I wouldn't have expected the long retry interval to be set to 3 weeks!

10054 means the connection was dropped from 'the other end' (or at least, the Windows box just got told the socket was dropped; we can't see why).

I suspect it's one of those you'll have to watch out for, and if it happens again, look at the full channel status output and see what retry should be going on.
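As a rough illustration of what "look at the full channel status output" could mean in practice, the sketch below pulls the retry-related attributes out of the text that a `DISPLAY CHSTATUS(...) ALL` command prints in runmqsc. The sample output, including the channel name `NT.TO.AIX` and every value in it, is invented for the example; only the attribute keywords (STATUS, INDOUBT, SHORTRTS, LONGRTS) are real channel status fields.

```python
import re

# Hypothetical runmqsc output; the channel name and values are made up.
SAMPLE = """\
AMQ8417: Display Channel Status details.
   CHANNEL(NT.TO.AIX)                      CHLTYPE(SDR)
   INDOUBT(YES)                            LONGRTS(999999999)
   SHORTRTS(10)                            STATUS(RETRYING)
"""

# Matches KEYWORD(value) pairs as printed by runmqsc.
ATTR = re.compile(r"(\w+)\(([^)]*)\)")


def parse_chstatus(text):
    """Collect KEYWORD(value) pairs from DISPLAY CHSTATUS output into a dict."""
    return {key: value.strip() for key, value in ATTR.findall(text)}


status = parse_chstatus(SAMPLE)
for key in ("STATUS", "INDOUBT", "SHORTRTS", "LONGRTS"):
    print(key, status.get(key))
```

SHORTRTS and LONGRTS show the retries remaining, so capturing them a couple of times while the channel is RETRYING shows whether the counts are actually going down.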
scmq2758
Posted: Mon Sep 27, 2004 3:25 am

Newbie

Joined: 23 Sep 2004
Posts: 3
Location: Manchester, UK

This sdr channel retrying problem has recurred today.
What happened is that the applications people were testing a change and set the Windows system date forward to 01/01/2005. Then, without stopping and restarting the queue manager, they sent a message. The error log just shows the remote end disconnecting with no error code and the sdr channel going into retry, but the critical thing is that it is 'in doubt', so the short retries count never decreases from 10. The channel was resolved and stopped, the NT server was rebooted back to today's correct date/time, and then the channel restarted OK and went running.

I guess it is something to do with the date/time mismatch: the clock suddenly jumps forward relative to the last message sent, and the sdr and rcvr pair get out of sync somehow. I don't know exactly what the mechanism is yet and am continuing to investigate, but now that they know it is related to the date settings they can prevent it from happening in future.
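For anyone wanting to catch this symptom sooner, here is a minimal sketch of the check described above: compare two channel status snapshots and flag a channel that is RETRYING while its retry counters have not moved. The function name `retry_looks_stuck` and the dict layout are assumptions made for illustration, and the snapshots would need to be taken further apart than the retry interval for the comparison to mean anything. It says nothing about why the retries stall.

```python
def retry_looks_stuck(before, after):
    """Return True if a RETRYING channel shows no retry progress.

    Both arguments are dicts of channel status attributes as strings, e.g.
    {"STATUS": "RETRYING", "SHORTRTS": "10", "LONGRTS": "999999999"}.
    """
    if after.get("STATUS") != "RETRYING":
        return False
    # A channel that is really retrying should use up SHORTRTS, then LONGRTS.
    return all(before.get(k) == after.get(k) for k in ("SHORTRTS", "LONGRTS"))


# Hypothetical snapshots matching the symptom in this thread.
first = {"STATUS": "RETRYING", "INDOUBT": "YES",
         "SHORTRTS": "10", "LONGRTS": "999999999"}
later = dict(first)

# A healthy retry, for contrast: one short retry has been used up.
healthy = dict(first, SHORTRTS="9")

print(retry_looks_stuck(first, later))
print(retry_looks_stuck(first, healthy))
```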

Steve.