TIMEOUT 360 |
zrux |
Posted: Mon Jul 29, 2019 7:32 am Post subject: TIMEOUT 360 |
|
|
Apprentice
Joined: 21 May 2006 Posts: 41 Location: UK
|
This is the scenario:
VM1 hosts two queue managers (QM1A, QM2A)
sending to
VM2, which hosts one queue manager (QM1B).
We want messages to flow from both queue managers on VM1 (QM1A, QM2A) to QM1B on VM2 at the same time.
We are able to send messages from QM1A -> QM1B (using SDR channel QM1A.QM1B).
We are able to send messages from QM2A -> QM1B (using SDR channel QM2A.QM1B).
But as soon as we try to start both channels, QM1A.QM1B and QM2A.QM1B, the messages stop flowing and pile up on their XMITQs, and we get the following error. HBINT is set to 300.
Subsequent restart of the channels doesn't allow the messages to flow, till the channel is "resolved" .
The channels are set to "Trigger" and the "Disconnect interval" is set to 0; all other channel attributes are at their defaults.
The system is not under heavy load; all critical parameters (CPU, memory, I/O) look fine.
Any idea what could be going wrong here ..? Network issue?
…………………………………………………….
AMQ9259: Connection timed out from host 'xxxx(yyyy)'.
EXPLANATION:
A connection from host 'xxxx(yyyy)' over TCP/IP timed out.
ACTION:
The select() [TIMEOUT] 360 seconds call timed out. Check to see why data was
not received in the expected time. Correct the problem. Reconnect the channel,
or wait for a retrying channel to reconnect itself. |
|
Vitor |
Posted: Mon Jul 29, 2019 7:45 am Post subject: Re: TIMEOUT 360 |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
zrux wrote: |
Any idea what could be going wrong here ..? Network issue? |
Unlikely, if resolving the channels fixes it.
Post the channel definitions (obfuscating any sensitive stuff like the real IP addresses).
Are you absolutely certain you have 2 separate channels, with 1 sender on each of QM1A & QM2A and 2 receiver channels on QM1B? Because it sounds a lot like the sender channels are tripping over each other's sequence numbers somehow. _________________ Honesty is the best policy.
Insanity is the best defence. |
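The pairing check suggested above (one sender per sending queue manager, each with its own receiver on QM1B) can be done mechanically before posting full definitions. The following is a minimal Python sketch under stated assumptions: the dictionaries stand in for channel names you would collect yourself from each queue manager, the function name and data layout are invented for this illustration, and `TO.QM1B` is a hypothetical shared name used only to show a failing case.

```python
def check_channel_pairs(senders, receivers):
    """Return a list of problems found in a sender/receiver channel layout.

    senders:   {queue manager name: [SDR channel names defined there]}
    receivers: [RCVR channel names defined on the target queue manager]
    """
    problems = []
    seen = {}  # channel name -> first queue manager that defines it as SDR
    for qmgr, channels in senders.items():
        for name in channels:
            if name in seen:
                problems.append(
                    f"{name} is defined as SDR on both {seen[name]} and {qmgr}"
                )
            else:
                seen[name] = qmgr
            if name not in receivers:
                problems.append(
                    f"{name} on {qmgr} has no RCVR of the same name on the target"
                )
    return problems


# Layout described in the original post: two distinct channel pairs.
good = check_channel_pairs(
    {"QM1A": ["QM1A.QM1B"], "QM2A": ["QM2A.QM1B"]},
    ["QM1A.QM1B", "QM2A.QM1B"],
)

# Hypothetical bad layout: both senders share one channel name, so they
# would also share one receiver and its sequence numbers.
shared = check_channel_pairs(
    {"QM1A": ["TO.QM1B"], "QM2A": ["TO.QM1B"]},
    ["TO.QM1B"],
)

print(good)
print(shared)
```

An empty list means the names line up; it says nothing about CONNAME, ports, or sequence number state, which still need to be checked in the real definitions.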
|
hughson |
Posted: Mon Jul 29, 2019 3:00 pm Post subject: Re: TIMEOUT 360 |
|
|
 Padawan
Joined: 09 May 2013 Posts: 1959 Location: Bay of Plenty, New Zealand
|
zrux wrote: |
Subsequent restart of the channels doesn't allow the messages to flow, till the channel is "resolved" . |
What prompted you to RESOLVE the channel? Is there another error message that led you to issue that command? Can you also show us the actual command you issued?
Cheers,
Morag _________________ Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software |
|
gbaddeley |
Posted: Mon Jul 29, 2019 3:34 pm Post subject: Re: TIMEOUT 360 |
|
|
 Jedi Knight
Joined: 25 Mar 2003 Posts: 2538 Location: Melbourne, Australia
|
zrux wrote: |
...but as soon as we try to start both channels, QM1A.QM1B and QM2A.QM1B, the messages stop flowing and pile up on their XMITQs, and we get the following error. HBINT is set to 300...
AMQ9259: Connection timed out from host 'xxxx(yyyy)'.
EXPLANATION:
A connection from host 'xxxx(yyyy)' over TCP/IP timed out.
ACTION:
The select() [TIMEOUT] 360 seconds call timed out. Check to see why data was not received in the expected time. Correct the problem. Reconnect the channel, or wait for a retrying channel to reconnect itself. |
That's quite unusual. select() is a low-level OS call that waits for TCP data to be received. If it times out, there may be a network issue, or an issue in the TCP stack or OS on either side. Are there any other errors in the queue manager error log, or in /var/mqm/errors? _________________ Glenn |
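To make concrete what a select() timeout means, here is a small Python sketch. It is not how MQ is implemented; it only shows the general OS behaviour described above, using a local socket pair as a stand-in for the two ends of a channel and a 0.2 second wait in place of 360 seconds. The function name is made up for this example.

```python
import select
import socket


def wait_for_data(sock, timeout_seconds):
    """Return True if data is ready to read, False if select() timed out."""
    readable, _, _ = select.select([sock], [], [], timeout_seconds)
    return bool(readable)


# A connected pair of local sockets stands in for the two channel ends.
left, right = socket.socketpair()
try:
    # Nothing has been sent yet, so the wait expires without data. This is
    # the situation AMQ9259 reports, only with a much shorter timeout.
    idle_result = wait_for_data(right, 0.2)

    # Once the peer sends something, select() returns before the timeout.
    left.sendall(b"heartbeat")
    busy_result = wait_for_data(right, 0.2)
finally:
    left.close()
    right.close()

print(idle_result, busy_result)
```

The point of the sketch is that the timeout itself is only a symptom: the waiting side is behaving normally, and the question is why the other side sent nothing in that window.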
|
rujova |
Posted: Tue Nov 02, 2021 2:30 pm Post subject: |
|
|
 Novice
Joined: 07 Jan 2015 Posts: 13
|
Hello there!
I know this is an old thread, but I have read other threads on the same topic and did not find a workable solution.
We have the same scenario between a Windows server, an AIX server, and an IBM i system.
The AIX and IBM i logs (BACKEND) record a timeout, while the Windows log (FRONTEND), which is the source of the communication, records that the connection was closed.
Code: |
11/02/21 15:45:49 - Process(#) User(X) Program(amqrmppa)
Host(BACKEND) Installation(Installation1)
VRMF(9.1.0.6) QMgr(BACKEND)
Time(2021-11-02T21:45:49.000Z)
RemoteHost(X.X.X.X)
CommentInsert1(X.X.X.X)
CommentInsert2(TCP/IP)
CommentInsert3(select() [TIMEOUT] 360 seconds)
AMQ9259E: Connection timed out from host 'X.X.X.X'.
EXPLANATION:
A connection from host 'X.X.X.X' over TCP/IP timed out.
ACTION:
The select() [TIMEOUT] 360 seconds call timed out. Check to see why data was
not received in the expected time. Correct the problem. Reconnect the channel,
or wait for a retrying channel to reconnect itself.
|
Code: |
11/2/2021 15:42:24 - Process(#) User(X) Program(runmqchl.exe)
Host(FRONTEND) Installation(Installation1)
VRMF(9.1.0.6) QMgr(FRONTEND)
Time(2021-11-02T21:42:24.305Z)
RemoteHost(X.X.X.X)
CommentInsert1(X.X.X.X)
CommentInsert2(TCP/IP)
CommentInsert3(TO.BACKEND)
AMQ9209E: Connection to host 'X.X.X.X' for channel
'TO.BACKEND' closed.
EXPLANATION:
An error occurred receiving data from 'X.X.X.X' over TCP/IP. The
connection to the remote host has unexpectedly terminated.
The channel name is 'TO.BACKEND'; in some cases it cannot be determined
and so is shown as '????'.
ACTION:
Tell the systems administrator.
|
I was monitoring the cluster sender channel's SHORTRMT and LONGRMT values, since they do not refresh as soon as the channel manages to re-establish communication. The KC documentation indicates that these counters are not reset until a message is successfully delivered. I was hoping that HBINT and KALIVE could act on them, but that's not how it works.
I am planning to increase SHORTRTY, which is currently 10 attempts at 60-second intervals. HBINT is set to 300 seconds and KALIVE is set to Auto, which gives us 360 seconds, but I don't feel comfortable with this change, as it doesn't fix the root cause of the disconnections. The networking team assures me that there are no firewall settings that cut the connection when the channels are idle. Our DISCINT is set to 0, to bypass some audit rules to eliminate idle network accesses.
Once the channel begins to receive load, the problem goes away, but the connection with the alternate geographic site then begins to fail. _________________ Looking Forward,
Rujova |
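The 360 seconds in the message lines up with HBINT 300 plus 60. A rough Python sketch of that arithmetic follows. The rule it encodes (HBINT + 60 seconds when HBINT is 60 or more, twice HBINT when it is lower) is an assumption based on how the IBM MQ documentation is commonly read for the receiving end of a channel; the function name is invented here, and the rule should be verified against the documentation for your version before relying on it.

```python
def receive_timeout_seconds(hbint):
    """Estimate how long the receiving end waits for data before timing out.

    Assumed rule (verify for your MQ version):
      HBINT >= 60 -> HBINT + 60 seconds
      HBINT <  60 -> 2 * HBINT seconds
    An HBINT of 0 means no heartbeats, so no timeout is derived here.
    """
    if hbint < 0:
        raise ValueError("HBINT cannot be negative")
    if hbint == 0:
        return None
    if hbint < 60:
        return 2 * hbint
    return hbint + 60


# HBINT from this thread: 300 + 60 = 360, matching "select() [TIMEOUT] 360".
print(receive_timeout_seconds(300))
```

Under this assumption, changing SHORTRTY would leave the 360 figure untouched, since it only depends on the negotiated heartbeat interval.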
|
hughson |
Posted: Tue Nov 02, 2021 4:49 pm Post subject: |
|
|
 Padawan
Joined: 09 May 2013 Posts: 1959 Location: Bay of Plenty, New Zealand
|
The retry timer lengths (SHORTTMR and LONGTMR) and retry counts (SHORTRTY and LONGRTY) only control what happens between periods of connectivity. They will not change the timeout you are seeing, only the pattern of the attempts to reconnect the lost connection.
I see from your error messages (which are in reverse chronological order) that QMgr(FRONTEND) has a TCP/IP network failure at 15:42:24, and then at 15:45:49, 3 minutes and 25 seconds later, QMgr(BACKEND) times out waiting for data. So QMgr(BACKEND) is operating exactly as expected: we can see the socket is closed, and after waiting for a time, it wakes up (because the TCP/IP stack didn't wake it up) and realises it hasn't got any data.
The problem should be investigated on the QMgr(FRONTEND) machine. What caused the TCP/IP failure to recv() data? If this happens regularly, your network people should be able to detect the cause.
Whether it happens regularly or not, your channels' retry settings will get the socket back up and running again.
Cheers,
Morag |
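The gap between the two log entries can be checked directly from their Time(...) fields. This is a small Python sketch using only the two timestamps quoted in the logs above; the function and variable names are made up for the example, and the gap is rounded to whole seconds.

```python
from datetime import datetime

# Format of the Time(...) field in the error log entries quoted above.
LOG_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def gap_between(earlier, later):
    """Return the gap between two Time(...) values as (minutes, seconds)."""
    delta = datetime.strptime(later, LOG_TIME_FORMAT) - datetime.strptime(
        earlier, LOG_TIME_FORMAT
    )
    return divmod(round(delta.total_seconds()), 60)


frontend_closed = "2021-11-02T21:42:24.305Z"    # AMQ9209E on FRONTEND
backend_timed_out = "2021-11-02T21:45:49.000Z"  # AMQ9259E on BACKEND

minutes, seconds = gap_between(frontend_closed, backend_timed_out)
print(f"{minutes} min {seconds} s")
```

Comparing the UTC Time(...) fields rather than the local timestamps at the top of each entry avoids mistakes when the two machines are in different time zones.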
|