MQSeries.net :: View topic - Got a strange one on a cluster channel

Got a strange one on a cluster channel

kevinf2349

Posted: Mon Dec 11, 2006 2:03 pm Post subject: Got a strange one on a cluster channel

Grand Master

Joined: 28 Feb 2003
Posts: 1311
Location: USA

We had a strange error today on one of our cluster channels. The environment is:

Windows XP
MQ 5.3 CSD11

z/OS
MQ 5.3.1 PTF UK10847 LVL 002.00A

A cluster sender channel from Windows encountered an error:

Code:

12/11/2006 10:22:04
AMQ9213: A communications error for TCP/IP occurred.

EXPLANATION:
An unexpected error occurred in communications.
ACTION:
The return code from the TCP/IP(recv) [TIMEOUT] 360 seconds call was 0 (X'0').
Record these values and tell the systems administrator.

The Windows end went into retrying and the z/OS end still showed the channel as running.

All attempts to restart the channel failed and we ended up recycling the CHIN on z/OS (this had to be MVS cancelled too). Once that was done all was well.
I have searched the IBM website and found nothing out there that seems to match. All other channels to and from the Windows box were fine. I am almost sure that this is a network 'fart' but without any corroborating evidence I can't prove it.

Anyone come across this before? How did you get out of it without recycling the CHIN address space?

jefflowrey

Posted: Mon Dec 11, 2006 2:13 pm Post subject:

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

You don't have any FDCs (or mainframe equivalent) on either side?
_________________
I am *not* the model of the modern major general.

kevinf2349

Posted: Mon Dec 11, 2006 2:39 pm Post subject:

Grand Master

Joined: 28 Feb 2003
Posts: 1311
Location: USA

Nothing on the mainframe end at all.

Don't have access on the server side but I will ask someone to check for me.

Update: No FDCs were generated on the server either. I checked myself cos I never trust those Windows guys!

Last edited by kevinf2349 on Mon Dec 11, 2006 4:14 pm; edited 1 time in total

fjb_saper

Posted: Mon Dec 11, 2006 3:48 pm Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20763
Location: LI,NY

Those network hiccups were more frequent when we were running 2.1 on the MF and 5.3 CSD06 or lower on Unix.

Exactly the same symptom. The MF thinks the channel is up but the distributed side is in retry... and in our case there was no clustering involved...

We have had very few of those since we moved the MF to a different location and set the MF version to 5.3.1....

Enjoy

_________________
MQ & Broker admin

Nigelg

Posted: Tue Dec 12, 2006 1:40 am Post subject:

Grand Master

Joined: 02 Aug 2004
Posts: 1046

Not so very strange...

The initial AMQ9213 is just reporting that the SDR MCA was expecting a response from its partner within 360 seconds, and no response was received.

The Windows channel will then go into RETRYING, as you say. The z/OS RCVR will continue to wait for data to arrive. It relies on TCP to tell it that the partner has gone away, and if TCP does not do so the status will remain RUNNING.
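As a rough illustration of the sender's side of this (not MQ code, just plain sockets), here is a minimal Python sketch. The names `silent_partner`, `wait_for_reply` and `demo` are made up for the example, and a partner that simply never answers stands in for the dropped network path. The waiting side gives up after its timeout, while the other end keeps its connection open and notices nothing.

```python
import socket
import threading


def silent_partner(listener, hold):
    """Accept one connection and never reply (hypothetical stand-in for the RCVR)."""
    conn, _ = listener.accept()
    hold.wait()   # keep the connection open until the demo is over
    conn.close()


def wait_for_reply(port, timeout_s):
    """Send a request and wait up to timeout_s seconds for an answer."""
    with socket.create_connection(("127.0.0.1", port)) as s:
        s.settimeout(timeout_s)
        s.sendall(b"confirm?")
        try:
            data = s.recv(1024)
            return "reply" if data else "closed"
        except socket.timeout:
            # Comparable in spirit to the recv timeout reported by AMQ9213.
            return "timeout"


def demo(timeout_s=0.5):
    """Run the silent partner on a free local port and report what the waiter saw."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    hold = threading.Event()
    worker = threading.Thread(target=silent_partner, args=(listener, hold))
    worker.start()
    try:
        return wait_for_reply(port, timeout_s)
    finally:
        hold.set()
        worker.join()
        listener.close()


if __name__ == "__main__":
    print(demo())
```

In the real case the timeout was 360 seconds rather than half a second, but the shape is the same: one side times out and retries, the other side still believes the connection is good.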

This is purely a network problem, nothing to do with the installed WMQ versions on either end.
_________________
MQSeries.net helps those who help themselves..

jefflowrey

Posted: Tue Dec 12, 2006 2:32 am Post subject:

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

Nigelg wrote:

This is purely a network problem, nothing to do with the installed WMQ versions on either end.

Then why did restarting CHIN fix it?
_________________
I am *not* the model of the modern major general.

Nigelg

Posted: Tue Dec 12, 2006 3:52 am Post subject:

Grand Master

Joined: 02 Aug 2004
Posts: 1046

Because it stopped the RCVR.
_________________
MQSeries.net helps those who help themselves..

kevinf2349

Posted: Tue Dec 12, 2006 5:48 am Post subject:

Grand Master

Joined: 28 Feb 2003
Posts: 1311
Location: USA

But I stopped the RCVR manually and restarted it and still got the same error.

Also, why wouldn't the CHIN stop gracefully? It had to be cancelled out. I truly believe it is a network issue, but when an MQ component needs to be cancelled to fix it, this doesn't sit well with blaming the network.

Is there some kind of preferred procedure for stopping cluster channels that I missed?

fjb_saper

Posted: Tue Dec 12, 2006 3:02 pm Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20763
Location: LI,NY

kevinf2349 wrote:

Did you stop the RCVR channel with mode=force, checking its status, and even with "terminate" if needed?
That's what it used to take us to stop the RCVR.
We did not have to touch the CHIN at all. Just make sure the RCVR channel is in status stopped before you restart it...

Of course you'd have to restart the RCVR but the next retry would then connect. This still does not protect you from an additional out of sequence on the channel...
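The sequence described above could be sketched like this. It is only a small Python helper that picks the next MQSC command from the status you last saw; the function name `next_command` and the channel name `TO.QM1` are hypothetical, and the commands still have to be entered by hand through whatever MQSC interface you use on z/OS.

```python
def next_command(channel, status):
    """Return the next MQSC command to issue for a stuck receiver channel.

    status is the channel status last seen in DISPLAY CHSTATUS output.
    """
    status = status.strip().upper()
    if status == "STOPPED":
        # Only restart once the channel really shows as stopped.
        return f"START CHANNEL({channel})"
    if status in ("RUNNING", "STOPPING"):
        # The receiver still thinks it is connected, so force it down.
        return f"STOP CHANNEL({channel}) MODE(FORCE)"
    # Anything else: look at the status again before acting.
    return f"DISPLAY CHSTATUS({channel})"


if __name__ == "__main__":
    for seen in ("RUNNING", "STOPPING", "STOPPED"):
        print(seen, "->", next_command("TO.QM1", seen))
```

The point of the design is the check in the middle: do not issue the start until the status has actually changed to stopped.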

_________________
MQ & Broker admin

HubertKleinmanns

Posted: Tue Dec 19, 2006 2:49 am Post subject:

Shaman

Joined: 24 Feb 2004
Posts: 732
Location: Germany

Do you use some "AdoptMCA..." attributes?

Also "KeepAlive" may be helpful.

The "AdoptMCA..." attributes allow a new channel MCA to be accepted in place of an existing one. "KeepAlive" tells the operating system to check an idle IP connection periodically and to close it when the connection partner is no longer available.
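To show what "KeepAlive" means at the socket level, here is a minimal Python sketch. It is an assumption-laden illustration of the operating system option only, not of how WMQ itself is configured; `enable_keepalive` is a made-up helper name, and the idle-time tuning is applied only on platforms where Python exposes `TCP_KEEPIDLE`.

```python
import socket


def enable_keepalive(sock, idle_s=None):
    """Turn on TCP keepalive probes for sock and report whether it is now set."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Optional tuning: seconds of idle time before the first probe.
    # TCP_KEEPIDLE is not available on every platform, so guard it.
    if idle_s is not None and hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("keepalive on:", enable_keepalive(s, idle_s=60))
    s.close()
```

With the option set, the TCP stack probes a quiet connection and reports an error to the waiting program when the partner stops answering, which is what lets a receiver find out that its partner has gone away.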

See also the document "Intercommunication" for more details.
_________________
Regards
Hubert
