hughson
Posted: Tue Mar 05, 2019 7:47 pm Post subject: Re: Where is TCP timeout value set on z/OS
bruce2359 wrote:
zpat wrote:
On a sender channel between two z/OS QMs (QMR1 at v7.1, QMR2 at v8.0), we see a timeout from TCP.
Backing up a bit... please describe the network configuration.
Are these two qmgrs on the same z/OS instance?
Are these z/OS instances in the same physical z box?
Over what type of channel are they communicating? HiperSockets? CTC? Copper cat 5 or 6? What type of adapters?
What z/OS releases are at both ends of the channel?
He did say this:
zpat wrote:
The two QMs are located in different organisations.
bruce2359
Posted: Tue Mar 05, 2019 8:34 pm
Oops. Missed that. So much for my investment in a speed-reading course.
What are the CHINUT settings at both ends?
zpat
Posted: Wed Mar 06, 2019 12:08 am
As mentioned, they are in different organisations.
Ours is z/OS 2.2. The issue is likely with the external network, but we can't prove it.
CHINUT?
fjb_saper
Posted: Wed Mar 06, 2019 3:46 am
zpat wrote:
As mentioned, they are in different organisations.
Ours is z/OS 2.2. The issue is likely with the external network, but we can't prove it.
CHINUT?
I figure he meant Channel INIT, or CHINIT...
bruce2359
Posted: Wed Mar 06, 2019 5:28 am
Yes, CHINIT, the channel initiator address space.
bruce2359
Posted: Wed Mar 06, 2019 12:57 pm
What are the adapters and dispatchers values?
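(For reference, the adapter and dispatcher counts can be checked with MQSC on z/OS. A minimal sketch follows; the values shown are purely illustrative, and changes made with ALTER QMGR are only picked up at the next channel initiator restart:

DISPLAY CHINIT
DISPLAY QMGR CHIADAPS CHIDISPS
ALTER QMGR CHIDISPS(20) CHIADAPS(30)

DISPLAY CHINIT reports how many dispatchers and adapter subtasks are actually running, while DISPLAY QMGR shows the configured CHIDISPS and CHIADAPS values.)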
hughson
Posted: Wed Mar 06, 2019 1:23 pm
bruce2359 wrote:
What are the adapters and dispatchers values?
Are you thinking that the network slowdown is caused by having too few dispatchers?
We've already ruled out a commit slowdown, since NETTIME is seen to increase just before the timeout is seen, so I don't think the number of adapters is at fault.
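(As a sketch of how that breakdown can be seen at the sending end, with a hypothetical channel name, DISPLAY CHSTATUS reports the network, exit and transmission-queue components separately:

DISPLAY CHSTATUS(QMR1.TO.QMR2) NETTIME EXITTIME XQTIME

NETTIME, EXITTIME and XQTIME are each shown as a pair of indicators in microseconds, one short-term and one longer-term, so a NETTIME that climbs while XQTIME and EXITTIME stay flat points at the network rather than at MQ processing.)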
bruce2359
Posted: Wed Mar 06, 2019 5:14 pm
hughson wrote:
bruce2359 wrote:
What are the adapters and dispatchers values?
Are you thinking that the network slowdown is caused by having too few dispatchers?
We've already ruled out a commit slowdown, since NETTIME is seen to increase just before the timeout is seen, so I don't think the number of adapters is at fault.
It's a possibility. I've seen small test-system values accidentally percolate into production. Dispatchers face the network. Adapters face inward to support MQI calls.
Generally, 300 msgs/sec is not a very heavy load for z/OS MQ. I'm always suspicious of firewalls.
What else is going on in the entire network at the time of the failure? Is someone FTPing huge files? Streaming video?
Is EREP reporting anything for the NIC cards? Is RMF reporting anything?
zpat
Posted: Thu Mar 07, 2019 6:18 am
60 dispatchers started.
There are some big FTPs on the same adapter, but not over the same external network link. There is no obvious correlation with the time of restart.
There are no apparent hardware errors, and the timeouts have happened on this channel, which has a CHLDISP of SHARED on both sides of the QSG; the QSG members are on different sites and hardware.
bruce2359
Posted: Thu Mar 07, 2019 7:50 am
I'm guessing that your replies re EREP and RMF are about your end of the channel. Do you have access to SYSLOG, or a helpful sysprog, at the other end?
zpat
Posted: Tue Mar 12, 2019 6:01 am
The other end can't see any issues.
We've now been seeing relatively high network latency on this link recently, without actual timeouts.
We can't seem to find the cause of this latency, which shows up in the sender channel NETTIME value.
The network guys can't see any issues either, but the NETTIME values are almost 10 times higher than usual.
Could anything in the z/OS TCP stack cause delays? It seems unlikely to me.
Restarting the channel restored normal latency.
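(A sketch of how that shows up in channel status, with the channel name hypothetical and the figures made up to match the behaviour described here: NETTIME is reported as two values in microseconds, a short-term indicator followed by a longer-term one, so

DISPLAY CHSTATUS(QMR1.TO.QMR2) NETTIME

on a degraded link might return something like NETTIME(250000,30000), i.e. roughly 250 ms recently against a longer-term figure of around 30 ms.)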
hughson
Posted: Tue Mar 12, 2019 4:25 pm
zpat wrote:
Restarting the channel restored normal latency.
The fact that closing the old socket and making a new one returned the network to normal latency suggests that the socket had perhaps gone into re-transmission mode. Perhaps a router in the network has been having issues.
Just a guess though.
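(If retransmission is suspected, the channel's underlying connection can be inspected from the z/OS console. This is a sketch only: the TCP/IP procedure name and partner address below are placeholders, and the exact command form and report fields vary by Communications Server level:

D TCPIP,TCPIP,NETSTAT,ALL,IPADDR=203.0.113.10

The Netstat ALL report for that connection includes round-trip time and retransmission counters; a retransmit count that climbs while NETTIME is high would support the retransmission theory, while a clean report points back towards the far end or the wider network.)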
zpat
Posted: Wed Mar 13, 2019 12:01 pm
That's what I have been trying to convince the network team of.
So, just to be totally clear, what things can cause NETTIME to increase?
The z/OS TCP software layer (causes?)
The internal network (VIPA sysplex adapter)
Our firewall
The virtual circuit from the telecom company
The firewall at the 3rd party
The network inside the 3rd party
z/OS TCP at the 3rd party
Are all of these possible?
None of these are MQ itself - can we rule out z/OS MQ on the two QMs as a cause of the latency measured by NETTIME? Is there any point in taking an MQ trace?
Sorry to be pedantic, but what exactly is NETTIME measuring?
When working normally it is around 30 millisecs; when slow it is consistently up at 250 millisecs, which leads to delays as the channel can't process the messages fast enough.
After the stop/start it has been running fine, but this slowdown has happened occasionally, so it will probably recur unless we can find the root cause.
Thanks.
hughson
Posted: Wed Mar 13, 2019 2:19 pm
zpat wrote:
What exactly is NETTIME measuring?
An MQ channel, when it is doing a round trip (end of batch or a heartbeat), remembers the time it sent the "Request for confirmation" flow, and when it gets back the "Acknowledgement" flow it takes the time again. Inside the "Acknowledgement" flow is the amount of time that the partner end spent doing the MQCMIT (if it was an end of batch), and this value is removed from the time taken to do the round trip.
So NETTIME is as close to measuring only the time spent in the network as it can be (from the perspective of MQ, the owner of the socket).
Its intent was to give MQ administrators some ammunition when talking to the network team, to point out to them that there was a problem on the network.
Cheers,
Morag
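(A worked illustration of that description, using made-up numbers: if the sender records the time when it sends the end-of-batch "Request for confirmation", the "Acknowledgement" arrives 42 ms later, and the acknowledgement reports that the remote end spent 12 ms in the MQCMIT, then the time attributed to the network is roughly 42 ms minus 12 ms, i.e. about 30 ms, recorded as 30000 since NETTIME is kept in microseconds.)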
zpat
Posted: Thu Mar 14, 2019 1:22 pm
Thanks. Looking at the KC, I see this:
Quote:
The NETTIME value is the amount of time, displayed in microseconds, taken to send an end of batch request to the remote end of the channel and receive a response minus the time to process the end of batch request. This value can be large for either of the following reasons:
The network is slow.
A slow network can affect the time it takes to complete a batch. The measurements that result in the indicators for the NETTIME field are measured at the end of a batch. However, the first batch affected by a slowdown in the network is not indicated with a change in the NETTIME value because it is measured at the end of the batch.
Requests are queued at the remote end, for example a channel can be retrying a put, or a put request may be slow due to page set I/O. Once any queued requests have completed, the duration of the end of batch request is measured. So if you get a large NETTIME value, check for unusual processing at the remote end.
I am confused by the last paragraph, which suggests that MQ processing delays at the remote end are included in NETTIME.