Author |
Message
|
kevinf2349 |
Posted: Mon Dec 11, 2006 2:03 pm Post subject: Got a strange one on a cluster channel |
|
|
 Grand Master
Joined: 28 Feb 2003 Posts: 1311 Location: USA
|
We had a strange error today on one of our cluster channels. The environment is:
Windows XP: MQ 5.3 CSD11
z/OS: MQ 5.3.1 PTF UK10847 LVL 002.00A
A cluster sender channel from Windows encountered an error:
Code: |
12/11/2006 10:22:04
AMQ9213: A communications error for TCP/IP occurred.
EXPLANATION:
An unexpected error occurred in communications.
ACTION:
The return code from the TCP/IP(recv) [TIMEOUT] 360 seconds call was 0 (X'0').
Record these values and tell the systems administrator. |
The Windows end went into retrying and the z/OS end still showed the channel as running.
All attempts to restart the channel failed and we ended up recycling the CHIN on z/OS (this had to be MVS cancelled too). Once that was done all was well.
I have searched the IBM website and found nothing out there that seems to match. All other channels to and from the Windows box were fine. I am almost sure that this is a network 'fart' but without any corroborating evidence I can't prove it.
Anyone come across this before? How did you get out of it without recycling the CHIN address space?  |
|
jefflowrey |
Posted: Mon Dec 11, 2006 2:13 pm Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
You don't have any FDCs (or mainframe equivalent) on either side? _________________ I am *not* the model of the modern major general. |
|
kevinf2349 |
Posted: Mon Dec 11, 2006 2:39 pm Post subject: |
|
|
 Grand Master
Joined: 28 Feb 2003 Posts: 1311 Location: USA
|
Nothing on the mainframe end at all.
Don't have access on the server side but I will ask someone to check for me.
Update: no FDCs were generated on the server either. I checked myself cos I never trust those Windows guys!
Last edited by kevinf2349 on Mon Dec 11, 2006 4:14 pm; edited 1 time in total |
|
fjb_saper |
Posted: Mon Dec 11, 2006 3:48 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Those network hiccups were more frequent when we were running 2.1 on the MF and 5.3 CSD06 or lower on Unix.
Exactly the same symptom. The MF thinks the channel is up but the distributed side is in retry... and in our case there was no clustering involved...
We have had very few of those since we moved the MF to a different location and set the MF version to 5.3.1....
Enjoy  _________________ MQ & Broker admin |
|
Nigelg |
Posted: Tue Dec 12, 2006 1:40 am Post subject: |
|
|
Grand Master
Joined: 02 Aug 2004 Posts: 1046
|
Not so very strange...
The initial AMQ9213 is just reporting that the SDR MCA was expecting a response from its partner within 360 seconds, and no response was received.
The Windows channel will then go into RETRYING, as you say. The z/OS RCVR will continue to wait for data to arrive - it is reliant on TCP to tell it that the partner has gone away, and if TCP does not do so the status will remain RUNNING.
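A toy model of that asymmetry, in plain Python sockets rather than anything MQ-specific. The function names and the status strings returned here are illustrative assumptions, not MQ internals: one side gives up after a receive timeout, while the other side keeps waiting until TCP actually reports a close.

```python
import socket


def sender_wait_for_reply(sock: socket.socket, timeout_s: float) -> str:
    """Model the sender side: wait for a reply, give up after timeout_s."""
    sock.settimeout(timeout_s)
    try:
        data = sock.recv(1024)
    except socket.timeout:
        # No reply in time, comparable to the recv timeout reported in AMQ9213.
        return "RETRYING"
    # An empty read means the peer closed the connection.
    return "RUNNING" if data else "RETRYING"


def receiver_poll(sock: socket.socket, poll_s: float) -> str:
    """Model the receiver side: it only learns the partner is gone if TCP says so."""
    sock.settimeout(poll_s)
    try:
        data = sock.recv(1024)
    except socket.timeout:
        # Nothing arrived and no close was seen, so it simply keeps waiting.
        return "RUNNING"
    # An empty read is TCP reporting that the partner closed.
    return "RUNNING" if data else "STOPPED"


if __name__ == "__main__":
    sender, receiver = socket.socketpair()
    # Neither end sends or closes: this stands in for a path that silently dropped.
    print(sender_wait_for_reply(sender, 0.2))
    print(receiver_poll(receiver, 0.2))
    # Only once the close is actually delivered does the receiver notice.
    sender.close()
    print(receiver_poll(receiver, 0.2))
    receiver.close()
```

In the sketch the receiver changes state only after the close reaches it; if the network swallows that notification, it has no reason to leave its waiting state.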
This is purely a network problem, nothing to do with the installed WMQ versions on either end. _________________ MQSeries.net helps those who help themselves.. |
|
jefflowrey |
Posted: Tue Dec 12, 2006 2:32 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
Nigelg wrote: |
This is purely a network problem, nothing to do with the installed WMQ versions on either end. |
Then why did restarting CHIN fix it? _________________ I am *not* the model of the modern major general. |
|
Nigelg |
Posted: Tue Dec 12, 2006 3:52 am Post subject: |
|
|
Grand Master
Joined: 02 Aug 2004 Posts: 1046
|
Because it stopped the RCVR. _________________ MQSeries.net helps those who help themselves.. |
|
kevinf2349 |
Posted: Tue Dec 12, 2006 5:48 am Post subject: |
|
|
 Grand Master
Joined: 28 Feb 2003 Posts: 1311 Location: USA
|
But I stopped the RCVR manually and restarted it and still got the same error.
Also why wouldn't the CHIN stop gracefully? It had to be cancelled out. I truly believe it is a network issue but when an MQ component needs to be cancelled to fix it this doesn't sit well with blaming the network.
Is there some kind of preferred procedure for stopping cluster channels that I missed? |
|
fjb_saper |
Posted: Tue Dec 12, 2006 3:02 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
kevinf2349 wrote: |
But I stopped the RCVR manually and restarted it and still got the same error.
Also why wouldn't the CHIN stop gracefully? It had to be cancelled out. I truly believe it is a network issue but when an MQ component needs to be cancelled to fix it this doesn't sit well with blaming the network.
Is there some kind of preferred procedure for stopping cluster channels that I missed? |
Did you stop the RCVR channel with mode=force, check its status, and even use "terminate" if needed?
That's what it used to take us to stop the RCVR.
We did not have to touch the CHIN at all. Just make sure the RCVR channel is in status stopped before you restart it...
Of course you'd have to restart the RCVR but the next retry would then connect. This still does not protect you from an additional out-of-sequence condition on the channel...
 _________________ MQ & Broker admin |
|
HubertKleinmanns |
Posted: Tue Dec 19, 2006 2:49 am Post subject: |
|
|
 Shaman
Joined: 24 Feb 2004 Posts: 732 Location: Germany
|
Do you use some "AdoptMCA..." attributes?
Also "KeepAlive" may be helpful.
The "AdoptMCA..." attributes allow a new channel MCA to be accepted in place of an existing one. "KeepAlive" tells the operating system to check an idle IP connection periodically and to close it when the connection partner is no longer available.
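For readers unfamiliar with what "KeepAlive" asks of the operating system, here is a minimal Python sketch of the underlying socket option. It is not MQ configuration (MQ has its own settings for this, covered in the Intercommunication manual), and the helper name `enable_keepalive` plus the timing values are assumptions chosen for illustration.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 60,
                     interval_s: int = 10, probes: int = 5) -> bool:
    """Ask the OS to probe an idle TCP connection and drop it if the peer is gone."""
    # SO_KEEPALIVE is the portable switch that turns keepalive probing on.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options are platform-specific, so set them only where exposed.
    tuning = (("TCP_KEEPIDLE", idle_s),
              ("TCP_KEEPINTVL", interval_s),
              ("TCP_KEEPCNT", probes))
    for name, value in tuning:
        opt = getattr(socket, name, None)
        if opt is None:
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
        except OSError:
            # Some platforms expose the constant but reject the call; keep OS defaults.
            pass
    # Report whether the OS now has keepalive switched on for this socket.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print(enable_keepalive(s))
    s.close()
```

With keepalive on, a receiver blocked waiting for data eventually gets an error from TCP when the probes go unanswered, instead of waiting indefinitely on a dead connection.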
See also the document "Intercommunication" for more details. _________________ Regards
Hubert |
|