
MQSeries.net Forum Index » Clustering » Got a strange one on a cluster channel

Got a strange one on a cluster channel
kevinf2349
Posted: Mon Dec 11, 2006 2:03 pm    Post subject: Got a strange one on a cluster channel

Grand Master

Joined: 28 Feb 2003
Posts: 1311
Location: USA

We had a strange error today on one of our cluster channels. The environment is:

Windows XP
MQ 5.3 CSD11

z/OS
MQ 5.3.1 PTF UK10847 LVL 002.00A

A cluster sender channel from Windows encountered an error:

Code:
12/11/2006  10:22:04
AMQ9213: A communications error for TCP/IP occurred.

EXPLANATION:
An unexpected error occurred in communications.
ACTION:
The return code from the TCP/IP(recv) [TIMEOUT] 360 seconds call was 0 (X'0').
Record these values and tell the systems administrator.


The Windows end went into retrying and the z/OS end still showed the channel as running.

All attempts to restart the channel failed and we ended up recycling the CHIN on z/OS (this had to be MVS cancelled too). Once that was done all was well.
I have searched the IBM website and found nothing out there that seems to match. All other channels to and from the Windows box were fine. I am almost sure that this is a network 'fart' but without any corroborating evidence I can't prove it.

Anyone come across this before? How did you get out of it without recycling the CHIN address space?
jefflowrey
Posted: Mon Dec 11, 2006 2:13 pm

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

You don't have any FDCs (or mainframe equivalent) on either side?
_________________
I am *not* the model of the modern major general.
kevinf2349
Posted: Mon Dec 11, 2006 2:39 pm

Grand Master

Joined: 28 Feb 2003
Posts: 1311
Location: USA

Nothing on the mainframe end at all.

Don't have access on the server side but I will ask someone to check for me.



Update: No FDCs generated on the server either. I checked myself cos I never trust those Windows guys!


Last edited by kevinf2349 on Mon Dec 11, 2006 4:14 pm; edited 1 time in total
fjb_saper
Posted: Mon Dec 11, 2006 3:48 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

Those network hiccups were more frequent when we were running 2.1 on the MF and 5.3 CSD06 or lower on Unix.

Exactly the same symptom. The MF thinks the channel is up but the distributed side is in retry... and in our case there was no clustering involved...

We have had very few of those since we moved the MF to a different location and set the MF version to 5.3.1....

Enjoy
_________________
MQ & Broker admin
Nigelg
Posted: Tue Dec 12, 2006 1:40 am

Grand Master

Joined: 02 Aug 2004
Posts: 1046

Not so very strange...

The initial AMQ9213 is just reporting that the SDR MCA was expecting a response from its partner within 360 seconds, and no response was received.

The Windows channel will then go into RETRYING, as you say. The z/OS RCVR will continue to wait for data to arrive. It relies on TCP to tell it that the partner has gone away, and if TCP does not do so the status will remain RUNNING.

This is purely a network problem, nothing to do with the installed WMQ versions on either end.
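
The mismatch described above (one side times out waiting, the other side sees nothing wrong) can be reproduced with plain sockets. The following is a minimal sketch in Python, purely as an analogy: it is not MQ code, the function name `wait_for_reply` is made up for illustration, and the silent listener merely stands in for a partner whose replies never arrive.

```python
import socket

def wait_for_reply(timeout_seconds):
    """Return 'TIMEOUT' if no reply arrives in time, 'REPLY' otherwise."""
    # The listener stands in for the receiving end: it accepts the
    # connection but never sends anything back.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    sender = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sender.connect(listener.getsockname())
    receiver, _ = listener.accept()

    # The sender gives up after a fixed wait (360 seconds in the log above).
    sender.settimeout(timeout_seconds)
    try:
        sender.recv(1024)
        outcome = "REPLY"
    except socket.timeout:
        # The receiver's socket is still open at this point; nothing on
        # its side has reported an error.
        outcome = "TIMEOUT"
    finally:
        sender.close()
        receiver.close()
        listener.close()
    return outcome
```

Calling `wait_for_reply(0.2)` returns after roughly 0.2 seconds with the timeout outcome, while the accepting side has seen no error at all during that wait.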
_________________
MQSeries.net helps those who help themselves..
jefflowrey
Posted: Tue Dec 12, 2006 2:32 am

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

Nigelg wrote:
This is purely a network problem, nothing to do with the installed WMQ versions on either end.


Then why did restarting CHIN fix it?
_________________
I am *not* the model of the modern major general.
Nigelg
Posted: Tue Dec 12, 2006 3:52 am

Grand Master

Joined: 02 Aug 2004
Posts: 1046

Because it stopped the RCVR.
_________________
MQSeries.net helps those who help themselves..
kevinf2349
Posted: Tue Dec 12, 2006 5:48 am

Grand Master

Joined: 28 Feb 2003
Posts: 1311
Location: USA

But I stopped the RCVR manually and restarted it and still got the same error.

Also, why wouldn't the CHIN stop gracefully? It had to be cancelled out. I truly believe it is a network issue, but when an MQ component needs to be cancelled to fix it, this doesn't sit well with blaming the network.

Is there some kind of preferred procedure for stopping cluster channels that I missed?
fjb_saper
Posted: Tue Dec 12, 2006 3:02 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

kevinf2349 wrote:
But I stopped the RCVR manually and restarted it and still got the same error.

Also, why wouldn't the CHIN stop gracefully? It had to be cancelled out. I truly believe it is a network issue, but when an MQ component needs to be cancelled to fix it, this doesn't sit well with blaming the network.

Is there some kind of preferred procedure for stopping cluster channels that I missed?

Did you stop the RCVR channel with mode=force, checking its status, and even with "terminate" if needed?
That's what it used to take us to stop the RCVR.
We did not have to touch the CHIN at all. Just make sure the RCVR channel is in status stopped before you restart it...

Of course you'd have to restart the RCVR, but the next retry would then connect. This still does not protect you from an additional out-of-sequence error on the channel...
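
The procedure above can be written down as an MQSC command sequence. This is a rough sketch under assumptions: the helper `recovery_commands` and the channel name `TO.ZOSQM` are hypothetical, the commands are only printed and not issued, and on z/OS each one would need the queue manager's command prefix in front. Whether MODE(TERMINATE) behaves any differently from MODE(FORCE) depends on the platform and version, so check the MQSC reference for your level.

```python
def recovery_commands(channel):
    """Return the MQSC commands for the stop/check/restart sequence, in order."""
    return [
        # Force the receiver to stop, then confirm its status.
        f"STOP CHANNEL({channel}) MODE(FORCE)",
        f"DISPLAY CHSTATUS({channel})",
        # Only needed if the status is still not STOPPED after the forced stop.
        f"STOP CHANNEL({channel}) MODE(TERMINATE)",
        f"DISPLAY CHSTATUS({channel})",
        # Restart once the channel really shows as STOPPED.
        f"START CHANNEL({channel})",
    ]

# Hypothetical channel name, for illustration only.
for command in recovery_commands("TO.ZOSQM"):
    print(command)
```

Each DISPLAY CHSTATUS is there so the next step is only taken once the status has been looked at, which is the point made above about confirming the channel is stopped before restarting it.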


_________________
MQ & Broker admin
HubertKleinmanns
Posted: Tue Dec 19, 2006 2:49 am

Shaman

Joined: 24 Feb 2004
Posts: 732
Location: Germany

Do you use some "AdoptMCA..." attributes?

Also "KeepAlive" may be helpful.

The "AdoptMCA..." attributes allow the queue manager to accept a new channel MCA in place of one that is still shown as running. "KeepAlive" tells the operating system to check an IP connection (keeps it alive) and to close it when the connection partner is no longer available.

See also the document "Intercommunication" for more details.
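
At the socket level, "KeepAlive" corresponds to the standard SO_KEEPALIVE option. The following is a minimal Python sketch of what that option looks like on an ordinary TCP socket; it is not MQ configuration, the function name `make_keepalive_socket` is made up for illustration, and how often the operating system actually probes is a system-level TCP setting that is not shown here.

```python
import socket

def make_keepalive_socket():
    """Create a TCP socket with keepalive probing switched on."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the OS to probe an idle connection and report an error on the
    # socket if the partner no longer answers.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock

sock = make_keepalive_socket()
# A non-zero value means the option is enabled on this socket.
keepalive_enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

With the option set, a receiver blocked on a dead connection eventually gets an error from TCP instead of waiting indefinitely, which is the behaviour the RCVR was missing in this thread.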
_________________
Regards
Hubert