MQSeries.net Forum Index » General IBM MQ Support » TIMEOUT 360

zrux
Posted: Mon Jul 29, 2019 7:32 am    Post subject: TIMEOUT 360

Apprentice

Joined: 21 May 2006
Posts: 37
Location: UK

This is the scenario:

VM1 hosts two queue managers (QM1A, QM2A),
sending to

VM2, which hosts one queue manager (QM1B).


We want messages from both queue managers on VM1 (QM1A, QM2A) to flow to QM1B on VM2 at the same time.


We are able to send messages from QM1A -> QM1B (using SDR channel QM1A.QM1B).

We are able to send messages from QM2A -> QM1B (using SDR channel QM2A.QM1B).

But as soon as we try to start both the channels QM1A.QM1B and QM2A.QM1B, the messages stop flowing and pile up on their XMITQs, and we get the following error. HBINT is set to 300.

Subsequent restarts of the channels don't allow the messages to flow until the channel is "resolved".

The channel is set to "Trigger" and "Disconnect interval" is set to 0; the rest of the channel values are defaults.
The system is not under heavy load; all critical parameters (e.g. CPU, memory, I/O) are fine.
Any idea what could be going wrong here ..? Network issue?

…………………………………………………….
AMQ9259: Connection timed out from host 'xxxx(yyyy)'.

EXPLANATION:
A connection from host 'xxxx(yyyy)' over TCP/IP timed out.
ACTION:
The select() [TIMEOUT] 360 seconds call timed out. Check to see why data was
not received in the expected time. Correct the problem. Reconnect the channel,
or wait for a retrying channel to reconnect itself.
Vitor
Posted: Mon Jul 29, 2019 7:45 am    Post subject: Re: TIMEOUT 360

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

zrux wrote:
Any idea what could be going wrong here ..? Network issue?


Unlikely, if resolving the channels fixes it.

Post the channel definitions (obfuscating any sensitive stuff like the real IP addresses).

Are you absolutely certain you have 2 separate channels, with 1 sender on each of QM1A & QM2A and 2 receiver channels on QM1B? Because it sounds a lot like the sender channels are tripping over each other's sequence numbers somehow.
_________________
Honesty is the best policy.
Insanity is the best defence.
hughson
Posted: Mon Jul 29, 2019 3:00 pm    Post subject: Re: TIMEOUT 360

Padawan

Joined: 09 May 2013
Posts: 1914
Location: Bay of Plenty, New Zealand

zrux wrote:
Subsequent restarts of the channels don't allow the messages to flow until the channel is "resolved".

What prompted you to RESOLVE the channel? Was there another error message that led you to issue that command? Can you also show us the actual command you issued?

Cheers,
Morag
_________________
Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software
gbaddeley
Posted: Mon Jul 29, 2019 3:34 pm    Post subject: Re: TIMEOUT 360

Jedi

Joined: 25 Mar 2003
Posts: 2494
Location: Melbourne, Australia

zrux wrote:
...but as soon as we try to start both the channels QM1A.QM1B and QM2A.QM1B, the messages stop flowing and pile up on their XMITQs, and we get the following error. HBINT is set to 300...

AMQ9259: Connection timed out from host 'xxxx(yyyy)'.
EXPLANATION:
A connection from host 'xxxx(yyyy)' over TCP/IP timed out.
ACTION:
The select() [TIMEOUT] 360 seconds call timed out. Check to see why data was not received in the expected time. Correct the problem. Reconnect the channel, or wait for a retrying channel to reconnect itself.

That's quite unusual. select() is a low-level OS function that will be waiting for TCP data to be received. If it times out, there may be a network issue, or an issue in the TCP stack or OS on either side. Are there any other errors in the log, or in /var/mqm/errors?
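
As a rough illustration of that behaviour (a minimal sketch, not MQ's actual code), the Python below waits on a socket with select() and a timeout. The helper name wait_for_data, the local socket pair, and the 0.2 second timeout are all assumptions made for the demo; the channel in the error message waited 360 seconds.

```python
import select
import socket


def wait_for_data(sock, timeout_seconds):
    """Return True if the socket became readable, False if select() timed out."""
    readable, _, _ = select.select([sock], [], [], timeout_seconds)
    return bool(readable)


def demo():
    # A connected pair of local sockets stands in for the two channel ends
    # (an assumption for illustration only).
    left, right = socket.socketpair()
    try:
        silent = wait_for_data(right, 0.2)   # nothing has been sent yet
        left.sendall(b"heartbeat")
        active = wait_for_data(right, 0.2)   # data is now waiting
        return silent, active
    finally:
        left.close()
        right.close()


if __name__ == "__main__":
    print(demo())
```

When nothing arrives within the timeout, select() returns an empty readable list, which is the condition AMQ9259 reports.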
_________________
Glenn
rujova
Posted: Tue Nov 02, 2021 2:30 pm

Novice

Joined: 07 Jan 2015
Posts: 13

Hello there!

I know this is an old thread, but I have read other threads on the same topic and did not find a workable solution.

We have the same scenario between a Windows server, an AIX server, and an IBM i system.

In the AIX and IBM i logs (BACKEND) a timeout is recorded, while the Windows log (FRONTEND), which is the source of the communication, records that the connection was closed.

Code:

11/02/21 15:45:49 - Process(#) User(X) Program(amqrmppa)
                    Host(BACKEND) Installation(Installation1)
                    VRMF(9.1.0.6) QMgr(BACKEND)
                    Time(2021-11-02T21:45:49.000Z)
                    RemoteHost(X.X.X.X)
                    CommentInsert1(X.X.X.X)
                    CommentInsert2(TCP/IP)
                    CommentInsert3(select() [TIMEOUT] 360 seconds)

AMQ9259E: Connection timed out from host 'X.X.X.X'.

EXPLANATION:
A connection from host 'X.X.X.X' over TCP/IP timed out.
ACTION:
The select() [TIMEOUT] 360 seconds call timed out. Check to see why data was
not received in the expected time. Correct the problem. Reconnect the channel,
or wait for a retrying channel to reconnect itself.


Code:

11/2/2021 15:42:24 - Process(#) User(X) Program(runmqchl.exe)
                      Host(FRONTEND) Installation(Installation1)
                      VRMF(9.1.0.6) QMgr(FRONTEND)
                      Time(2021-11-02T21:42:24.305Z)
                      RemoteHost(X.X.X.X)
                      CommentInsert1(X.X.X.X)
                      CommentInsert2(TCP/IP)
                      CommentInsert3(TO.BACKEND)
                     
AMQ9209E: Connection to host 'X.X.X.X' for channel
'TO.BACKEND' closed.

EXPLANATION:
An error occurred receiving data from 'X.X.X.X' over TCP/IP.  The
connection to the remote host has unexpectedly terminated.

The channel name is 'TO.BACKEND'; in some cases it cannot be determined
and so is shown as '????'.
ACTION:
Tell the systems administrator.


I was monitoring the cluster sender channel's SHORTRMT and LONGRMT values, since they do not refresh once the channel manages to establish communication. The KC documentation indicates that these counters are not reset until a message is successfully delivered. I was hoping that HBINT and KALIVE could act on them, but that's not how it works.

We are planning to increase SHORTRTY, which is currently 10 retries at 60-second intervals. HBINT is set to 300 seconds and KALIVE is set to Auto, which gives us 360 seconds. However, I don't feel comfortable with this change, as it doesn't fix the root cause of the disconnections. The networking team assures me that there are no firewall settings that cut connections when the channels are idle. Our DISCINT is set to 0, to bypass some audit rules to eliminate idle network accesses.
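
For reference, the 360 seconds in the AMQ9259 text lines up with HBINT plus 60 seconds. A minimal sketch of that arithmetic follows. It assumes the commonly documented rule for the channel receive timeout on distributed platforms: twice HBINT when the negotiated HBINT is under 60 seconds, otherwise HBINT plus 60. The function name is hypothetical, and the rule should be checked against the IBM MQ documentation for your version.

```python
def receive_timeout_seconds(hbint):
    """Estimate the receive timeout for a negotiated HBINT (in seconds).

    Assumed rule: 2 * HBINT below 60 seconds, HBINT + 60 otherwise.
    HBINT values of 0 or less (heartbeats disabled) are not modelled here.
    """
    if hbint <= 0:
        raise ValueError("HBINT must be positive for this estimate")
    if hbint < 60:
        return 2 * hbint
    return hbint + 60


if __name__ == "__main__":
    for hbint in (20, 300):
        print(hbint, receive_timeout_seconds(hbint))
```

With HBINT at 300, the estimate is 360 seconds, the value shown in the log.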

Once the channel begins to receive load, the problem goes away, but the connection with the alternate geographic site begins to fail.
_________________
Looking Forward,

Rujova
hughson
Posted: Tue Nov 02, 2021 4:49 pm

Padawan

Joined: 09 May 2013
Posts: 1914
Location: Bay of Plenty, New Zealand

Retry timer lengths (SHORTTMR and LONGTMR) and retry counts (SHORTRTY and LONGRTY) only control what happens between periods of connectivity. They will not change the timeout you are seeing, only the pattern of the attempts to re-establish the lost connection.
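
To picture the reconnection pattern these attributes produce, here is a small sketch. SHORTRTY 10 and SHORTTMR 60 come from the post above. The LONGRTY and LONGTMR values are assumed to be the product defaults (999999999 and 1200), and retry_delays is a hypothetical helper, not an MQ API.

```python
from itertools import islice


def retry_delays(shortrty, shorttmr, longrty, longtmr):
    """Yield the wait in seconds before each reconnection attempt."""
    for _ in range(shortrty):   # short retries come first
        yield shorttmr
    for _ in range(longrty):    # then the long retries
        yield longtmr


if __name__ == "__main__":
    # LONGRTY and LONGTMR are assumed defaults, not values from the thread.
    schedule = retry_delays(10, 60, 999_999_999, 1200)
    print(list(islice(schedule, 12)))
```

The first twelve waits are ten of 60 seconds followed by two of 1200 seconds. Raising SHORTRTY only lengthens the first run; it has no bearing on the 360 second timeout.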

I see from your error messages (which are in reverse chronological order) that QMgr(FRONTEND) has a TCP/IP network failure at 15:42:24, and then at 15:45:49, 3 minutes and 25 seconds later, QMgr(BACKEND) times out waiting for data. So QMgr(BACKEND) is operating exactly as expected: we can see the socket is closed, and after waiting for a time, it wakes up (because the TCP/IP stack didn't wake it up) and realises it hasn't got any data.
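
The gap can be checked from the Time(...) fields in the two log entries quoted earlier in the thread. A quick sketch, where gap_seconds is a hypothetical helper and the format string is an assumption based on how those fields appear in the logs:

```python
from datetime import datetime

# Assumed format of the Time(...) field in the error log entries.
LOG_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def gap_seconds(earlier, later):
    """Return the number of seconds between two log timestamps."""
    start = datetime.strptime(earlier, LOG_TIME_FORMAT)
    end = datetime.strptime(later, LOG_TIME_FORMAT)
    return (end - start).total_seconds()


frontend_closed = "2021-11-02T21:42:24.305Z"    # AMQ9209E on FRONTEND
backend_timed_out = "2021-11-02T21:45:49.000Z"  # AMQ9259E on BACKEND

gap = gap_seconds(frontend_closed, backend_timed_out)
minutes, seconds = divmod(round(gap), 60)
print(f"{gap:.3f} s, about {minutes} min {seconds} s")
```

This gives roughly 3 minutes 25 seconds, in line with the figure quoted from the log headers.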

The problem should be investigated on the QMgr(FRONTEND) machine. What caused the TCP/IP failure to recv() data? If this happens regularly, your network people should be able to detect the cause.

Whether it happens regularly or not, your channels' retry settings will get the socket back up and running again.

Cheers,
Morag
_________________
Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software