Author |
Message
|
zpat |
Posted: Mon Mar 04, 2019 6:13 am Post subject: Where is TCP timeout value set on z/OS |
|
|
 Jedi Council
Joined: 19 May 2001 Posts: 5866 Location: UK
|
On a sender channel between two z/OS QMs (QMR1 at v7.1, QMR2 at v8.0) - we see a timeout from TCP.
My question is where is the value for this timeout configured?
Quote: |
08.08.27 S0026645 +CSQX259E !QMR1 CSQXRCTL Connection timed out, 016
016 channel QMR1.QMR2
016 connection (nn.nn.nn.nn)
016 (queue manager QMR2)
016 TRPTYPE=TCP
08.08.27 S0026645 +CSQX599E !QMR1 CSQXRCTL Channel QMR1.QMR2 ended abnormally |
Also any suggestions as to possible cause would be helpful. The two QMs are located in different organisations. _________________ Well, I don't think there is any question about it. It can only be attributable to human error. This sort of thing has cropped up before, and it has always been due to human error. |
|
|
 |
Vitor |
Posted: Mon Mar 04, 2019 6:37 am Post subject: Re: Where is TCP timeout value set on z/OS |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
zpat wrote: |
My question is where is the value for this timeout configured? |
Ask your system programmers. There are a number of ways TCP can be set up in z/OS, from a standard stack to VIPA, and it's going to be both site specific & inaccessible to normal mortals in most RACF configurations. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
|
 |
hughson |
Posted: Mon Mar 04, 2019 3:24 pm Post subject: Re: Where is TCP timeout value set on z/OS |
|
|
 Padawan
Joined: 09 May 2013 Posts: 1959 Location: Bay of Plenty, New Zealand
|
zpat wrote: |
My question is where is the value for this timeout configured? |
As with all IBM MQ error messages (or in fact error messages from any software product) it is always worth having a read about the error in the manual.
Knowledge Center wrote: |
CSQX259E: csect-name Connection timed out, channel channel-name connection conn-id (queue manager qmgr-name) TRPTYPE=trptype
Explanation
The connection conn-id timed out. The associated channel is channel-name and the associated remote queue manager is qmgr-name; in some cases the names cannot be determined and so are shown as '????'. trptype shows the communications system used:
TCP TCP/IP
LU62 APPC/MVS
Probable causes are:
- A communications failure.
- For a message channel, if the Receive Timeout function is being used (as set by the RCVTIME, RCVTTYPE, and RCVTMIN queue manager attributes) and no response was received from the partner within this time.
- For an MQI channel, if the Client Idle function is being used (as set by the DISCINT server-connection channel attribute) and the client application did not issue an MQI call within this time.
Severity
8
System action
The channel stops.
System programmer response
For a message channel, check the remote end to see why the time out occurred. Note that, if retry values are set, the remote end will restart automatically. If necessary, set the receive wait time for the queue manager to be higher.
For an MQI channel, check that the client application behaviour is correct. If so, set the disconnect interval for the channel to be higher. |
So you are timing out on a TCP receive call, but on a sender channel (as evidenced by the error being reported on queue manager QMR1, the sending end of channel QMR1.QMR2, which is connecting to queue manager QMR2).
You may be thinking that a sender channel shouldn't be in a TCP receive call, but do remember that after the sender channel has sent a batch of messages, it then waits for the receiver channel to acknowledge that batch of messages before starting the next batch. It will be this receive call that has timed out (or possibly waiting for the acknowledgment of a heartbeat flow if your channel is currently idle).
This suggests some slow down on the queue manager QMR2 end of the channel. Any problems there?
Please also issue the following commands to find out the values of the various settings involved.
Code: |
DISPLAY QMGR RCVTIME RCVTTYPE RCVTMIN
DISPLAY CHANNEL(QMR1.QMR2) HBINT |
Cheers,
Morag _________________ Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software |
|
|
 |
zpat |
Posted: Tue Mar 05, 2019 12:25 am Post subject: |
|
|
 Jedi Council
Joined: 19 May 2001 Posts: 5866 Location: UK
|
RCVTIME 2
RCVTTYPE MULTIPLY
RCVTMIN 300
HBINT 60
The timeout seems like 5 mins. Can MQ change that?
It's a shared channel on QSG through a VIPA.
I think that the wait time would be 2 x 60, i.e. 120 secs, but the RCVTMIN of 300 secs is applied. So we could reduce that, but it's QM wide. _________________ Well, I don't think there is any question about it. It can only be attributable to human error. This sort of thing has cropped up before, and it has always been due to human error.
Last edited by zpat on Tue Mar 05, 2019 12:35 am; edited 1 time in total |
|
|
 |
hughson |
Posted: Tue Mar 05, 2019 12:33 am Post subject: |
|
|
 Padawan
Joined: 09 May 2013 Posts: 1959 Location: Bay of Plenty, New Zealand
|
Yes, that looks like your receive timeout should be 300 seconds.
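A minimal Python sketch of the arithmetic behind that 300 second figure. It is an illustration only, not IBM code: the function name is hypothetical, `hbint` is assumed to be the negotiated heartbeat interval, and RCVTMIN is assumed to act as a floor only when RCVTTYPE is MULTIPLY or ADD (as the product documentation describes it).

```python
def receive_timeout_seconds(rcvtime, rcvttype, rcvtmin, hbint):
    """Approximate the TCP receive wait (seconds) from the queue manager attributes.

    Hypothetical helper: assumes hbint is the negotiated heartbeat interval
    and that RCVTMIN only applies to the HBINT-relative types.
    """
    if rcvttype == "EQUAL":
        # RCVTIME is taken as an absolute number of seconds.
        return rcvtime
    if rcvttype == "MULTIPLY":
        relative = rcvtime * hbint
    elif rcvttype == "ADD":
        relative = rcvtime + hbint
    else:
        raise ValueError(f"unknown RCVTTYPE: {rcvttype}")
    # RCVTMIN is the floor for the HBINT-relative result.
    return max(relative, rcvtmin)


# Values posted above: RCVTIME 2, RCVTTYPE MULTIPLY, RCVTMIN 300, HBINT 60
print(receive_timeout_seconds(2, "MULTIPLY", 300, 60))  # 2 x 60 = 120, raised to 300
```

With these settings the 2 x 60 = 120 second result is below RCVTMIN, so the floor of 300 seconds is what takes effect, which matches the observed 5 minutes.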
hughson wrote: |
This suggests some slow down on the queue manager QMR2 end of the channel. Any problems there? |
You didn't answer this?
Cheers,
Morag _________________ Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software |
|
|
 |
zpat |
Posted: Tue Mar 05, 2019 12:36 am Post subject: |
|
|
 Jedi Council
Joined: 19 May 2001 Posts: 5866 Location: UK
|
They claim not...... _________________ Well, I don't think there is any question about it. It can only be attributable to human error. This sort of thing has cropped up before, and it has always been due to human error. |
|
|
 |
hughson |
Posted: Tue Mar 05, 2019 12:39 am Post subject: |
|
|
 Padawan
Joined: 09 May 2013 Posts: 1959 Location: Bay of Plenty, New Zealand
|
zpat wrote: |
I think that the wait time would be 2 x 60, ie 120 secs but RCVTMIN of 300 secs is applied. So we could reduce that but it's QM wide. |
I wouldn't reduce anything until you figure out what is causing your timeout. Clearly your current timeout is not long enough for whatever is going on.
I would be inclined to turn on MONCHL(HIGH) and take a look at the numbers in NETTIME for your round trips.
Cheers,
Morag _________________ Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software |
|
|
 |
zpat |
Posted: Tue Mar 05, 2019 1:03 am Post subject: |
|
|
 Jedi Council
Joined: 19 May 2001 Posts: 5866 Location: UK
|
Nettime is usually very stable at around 30,000 microseconds.
When these very occasional timeouts occur (roughly once every 2 weeks), nettime rises just before the timeout. _________________ Well, I don't think there is any question about it. It can only be attributable to human error. This sort of thing has cropped up before, and it has always been due to human error. |
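If the NETTIME values are being collected at intervals, a rise like this could be flagged automatically. The following Python sketch is only one possible approach: the function name, the sample values, and the "3 times the median" threshold are all assumptions made for illustration, and how the samples are gathered from the channel status is left out.

```python
from statistics import median


def flag_nettime_spikes(samples_us, factor=3.0):
    """Return (index, value) pairs for samples well above the typical value.

    samples_us: NETTIME readings in microseconds (hypothetical data).
    factor: how many times the median counts as a spike (an assumption).
    """
    baseline = median(samples_us)
    return [(i, v) for i, v in enumerate(samples_us) if v > factor * baseline]


# Made-up readings: stable around 30,000 microseconds, then a sharp rise.
readings = [30000, 31000, 29500, 30200, 250000]
print(flag_nettime_spikes(readings))
```

Using the median as the baseline keeps a single large spike from dragging the "normal" value upwards, which a plain average would do.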
|
|
 |
hughson |
Posted: Tue Mar 05, 2019 6:53 am Post subject: |
|
|
 Padawan
Joined: 09 May 2013 Posts: 1959 Location: Bay of Plenty, New Zealand
|
So that suggests a network issue, rather than a problem at queue manager QMR2. What does your network team say about it? _________________ Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software |
|
|
 |
bruce2359 |
Posted: Tue Mar 05, 2019 7:22 am Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
Wait! Wait. I know the answer to this one: "... there are no network problems, and nothing has changed." _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
|
 |
hughson |
Posted: Tue Mar 05, 2019 7:27 am Post subject: |
|
|
 Padawan
Joined: 09 May 2013 Posts: 1959 Location: Bay of Plenty, New Zealand
|
|
|
 |
zpat |
Posted: Tue Mar 05, 2019 11:18 am Post subject: |
|
|
 Jedi Council
Joined: 19 May 2001 Posts: 5866 Location: UK
|
That's the one.
Apparently no network or firewall issues either side.
The channel is fairly high volume (around 300 messages per sec) and works fine for weeks on end then gets a timeout, causing a service outage.
That's why I am looking into reducing the time it takes to timeout so that the outage is shorter (I know that won't fix the network).
There is only one sender channel, I don't suppose dividing the traffic up over two channels would make it any faster or more reliable, although it might result in a 50% outage?
Of course there is the other viewpoint, the more channels, the more chance one will get a network glitch (a bit like having more engines on a plane). _________________ Well, I don't think there is any question about it. It can only be attributable to human error. This sort of thing has cropped up before, and it has always been due to human error. |
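To put rough numbers on the outage, here is a small Python sketch of how many messages could build up while the sender is still waiting for the timeout to fire. It assumes a constant arrival rate of about 300 messages per second (the figure quoted above) and that nothing is delivered during the wait; the function name is hypothetical, and 120 seconds is the 2 x 60 value that would apply only if RCVTMIN were lowered to 120 or below.

```python
def backlog_during_wait(msgs_per_sec, wait_seconds):
    """Estimate messages accumulated while the channel waits to time out.

    Assumes a constant arrival rate and no delivery during the wait.
    """
    return msgs_per_sec * wait_seconds


for wait in (300, 120):
    print(f"{wait} s wait: about {backlog_during_wait(300, wait)} messages queued")
```

On those assumptions the backlog is about 90,000 messages for a 300 second wait and about 36,000 for 120 seconds, before counting the time the channel takes to retry and drain the queue.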
|
|
 |
bruce2359 |
Posted: Tue Mar 05, 2019 12:17 pm Post subject: Re: Where is TCP timeout value set on z/OS |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
zpat wrote: |
On a sender channel between two z/OS QMs (QMR1 at v7.1, QMR2 at v8.0) - we see a timeout from TCP. |
Backing up a bit... please describe the network configuration.
Are these two qmgrs on the same z/OS instance?
Are these z/OS instances in the same physical z box?
Over what type of channel are they communicating? HiperSockets? CTC? Copper cat 5 or 6? What type of adapters?
What z/OS releases at both ends of the channel? _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
Last edited by bruce2359 on Tue Mar 05, 2019 12:44 pm; edited 1 time in total |
|
|
 |
bruce2359 |
Posted: Tue Mar 05, 2019 12:43 pm Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
Moved to mainframe forum. _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
|
 |
|