MQSeries.net Forum Index » General IBM MQ Support » Why these channel instances are not dropping

Why these channel instances are not dropping
vicks_mq
Posted: Wed May 15, 2019 7:28 am

Disciple

Joined: 03 Oct 2017
Posts: 162

vicks_mq wrote:
I just restarted the connecting application at 9:49 EST and got my 1st TIMEOUT error at 05/15/19 10:28:08 AM, the 2nd at 05/15/19 10:30:22, the 3rd at 05/15/19 10:33:09 and the 4th at 05/15/19 10:35:22, and now there has been no TIMEOUT for the last 25 minutes.
I will check again.
The TIMEOUT errors started again at 11:14:10 AM and continued for about 2 minutes, until 11:16 AM, and it has been quiet again since then (15 minutes have passed).
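
For anyone following along, the gaps between those timestamps can be worked out with a short Python sketch. It assumes the restart happened at exactly 09:49:00 and that all the times come from the same clock on the same day; neither is stated in the post, so treat the first gap as approximate. The function name is made up for illustration.

```python
from datetime import datetime

# Times taken from the post; the ":00" seconds on the restart are an assumption.
restart = "09:49:00"
timeouts = ["10:28:08", "10:30:22", "10:33:09", "10:35:22", "11:14:10"]


def gaps(times):
    """Return the elapsed time between each pair of consecutive timestamps."""
    parsed = [datetime.strptime(t, "%H:%M:%S") for t in times]
    return [str(later - earlier) for earlier, later in zip(parsed, parsed[1:])]


print(gaps([restart] + timeouts))
```

The first and last gaps come out at roughly 39 minutes and the three in between at 2 to 3 minutes, which is the pattern discussed in the replies below.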
hughson
Posted: Wed May 15, 2019 9:25 am

Padawan

Joined: 09 May 2013
Posts: 1914
Location: Bay of Plenty, New Zealand

vicks_mq wrote:
hughson wrote:

Then it is not DISCINT that is dropping the connections. Sorry - I was under the impression that the connections were dropping "after a few hours" before.

This sounds very much like a firewall is dropping the connection after 40 minutes, although not due to inactivity since you know that heartbeat flows are going across. The timeout message you report is going to be the sender of the heartbeat waiting for the answer back from the heartbeat flow and not getting anything because the socket is no longer there. The 65 second timeout suggests this because other receive-wait (select) calls would use the negotiated heartbeat value (plus a bit) which you have told us is 300.

Cheers,
Morag


I have a question: if the firewall is dropping the connection in 65 seconds, then all the connections should drop after 65 seconds of inactivity. Why does the connection take 40 minutes to drop, and why is it pretty random after that? Sometimes one connection drops after 40 minutes and the next one 2-3 minutes later, then there is a wait of another 35-40 minutes, and so on.

Sorry I have not explained the 65 second timeout very well. The Channel is sending a heartbeat (after an idle time of the HBINT - for you 300 seconds), and is then waiting 65 seconds for the answer which never comes. It is not 65 seconds of inactivity, it is within the 65 seconds where it is waiting for the answer.
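
As a rough illustration of that timing (not MQ's actual implementation), the Python sketch below simply adds the two intervals together. The function name and the example start time are made up; the 300 and 65 second values are the ones quoted in this thread.

```python
from datetime import datetime, timedelta

HBINT_SECONDS = 300         # negotiated heartbeat interval quoted in the thread
RESPONSE_WAIT_SECONDS = 65  # wait for the heartbeat answer, as seen in the timeout message


def heartbeat_timeline(last_flow, hbint=HBINT_SECONDS, wait=RESPONSE_WAIT_SECONDS):
    """Return (heartbeat_sent, timeout_reported) for an idle channel.

    Hypothetical helper: it models only the idle period followed by the
    wait for an answer that never arrives.
    """
    heartbeat_sent = last_flow + timedelta(seconds=hbint)
    timeout_reported = heartbeat_sent + timedelta(seconds=wait)
    return heartbeat_sent, timeout_reported


# Made-up example: the last data crossed the channel at 12:00:00.
sent, timed_out = heartbeat_timeline(datetime(2019, 5, 15, 12, 0, 0))
print(sent.time(), timed_out.time())
```

Under these assumptions the timeout is reported 365 seconds after the last flow, not 65 seconds after it.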

You seem to be able to recreate this at will, so TCP/IP level diagnostics should be able to determine why it is being dropped.

I would agree with the suggestions to talk to your network firewall/router team.

Cheers,
Morag
_________________
Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software
bruce2359
Posted: Wed May 15, 2019 9:46 am

Poobah

Joined: 05 Jan 2008
Posts: 9392
Location: US: west coast, almost. Otherwise, enroute.

MQ offers a suite of configuration options to recover MQ channels from transient network errors: heartbeat interval, disconnect interval, and the short and long retry counts and retry timers. MQ cannot correct bad network configuration.

Search Google for "Keeping MQ channels up and running."
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
vicks_mq
Posted: Wed May 15, 2019 10:13 am

Disciple

Joined: 03 Oct 2017
Posts: 162

hughson wrote:
vicks_mq wrote:
hughson wrote:

Then it is not DISCINT that is dropping the connections. Sorry - I was under the impression that the connections were dropping "after a few hours" before.

This sounds very much like a firewall is dropping the connection after 40 minutes, although not due to inactivity since you know that heartbeat flows are going across. The timeout message you report is going to be the sender of the heartbeat waiting for the answer back from the heartbeat flow and not getting anything because the socket is no longer there. The 65 second timeout suggests this because other receive-wait (select) calls would use the negotiated heartbeat value (plus a bit) which you have told us is 300.

Cheers,
Morag


I have a question: if the firewall is dropping the connection in 65 seconds, then all the connections should drop after 65 seconds of inactivity. Why does the connection take 40 minutes to drop, and why is it pretty random after that? Sometimes one connection drops after 40 minutes and the next one 2-3 minutes later, then there is a wait of another 35-40 minutes, and so on.

Sorry I have not explained the 65 second timeout very well. The Channel is sending a heartbeat (after an idle time of the HBINT - for you 300 seconds), and is then waiting 65 seconds for the answer which never comes. It is not 65 seconds of inactivity, it is within the 65 seconds where it is waiting for the answer.

You seem to be able to recreate this at will, so TCP/IP level diagnostics should be able to determine why it is being dropped.

I would agree with the suggestions to talk to your network firewall/router team.

Cheers,
Morag


I have to confess that we don't have an in-house firewall/router team, and engaging the one we do have is harder than moving mountains.

I noticed one more behaviour: whenever the channel TIMEOUT error occurs, all the channel instances that started together go down, even though for some of those instances the LSTMSGTI is current.
I was running MQSC in a loop every minute and got this output.
5724-H72 (C) Copyright IBM Corp. 1994, 2015.
Starting MQSC for queue manager CHAMAN.


1 : DIS Chs(LONDON.TO.PARIS) chstati lstmsgti BYTSSENT BYTSRCVD JOBNAME
AMQ8417I: Display Channel Status details.
CHANNEL(LONDON.TO.PARIS) CHLTYPE(SVRCONN)
BYTSRCVD(413708) BYTSSENT(121184)
CHSTATI(12.53.25) CONNAME(10.111.222.333)
CURRENT JOBNAME(001560110000D2B4)
LSTMSGTI(13.38.45) STATUS(RUNNING)
SUBSTATE(RECEIVE)
AMQ8417I: Display Channel Status details.
CHANNEL(LONDON.TO.PARIS) CHLTYPE(SVRCONN)
BYTSRCVD(1608) BYTSSENT(1604)
CHSTATI(12.53.31) CONNAME(10.111.222.333)
CURRENT JOBNAME(001638190000D26F)
LSTMSGTI(12.53.31) STATUS(RUNNING)
SUBSTATE(RECEIVE)
AMQ8417I: Display Channel Status details.
CHANNEL(LONDON.TO.PARIS) CHLTYPE(SVRCONN)
BYTSRCVD(413860) BYTSSENT(121228)
CHSTATI(12.53.25) CONNAME(10.111.222.333)
CURRENT JOBNAME(001560110000D2B3)
LSTMSGTI(13.38.45) STATUS(RUNNING)
SUBSTATE(RECEIVE)
AMQ8417I: Display Channel Status details.
CHANNEL(LONDON.TO.PARIS) CHLTYPE(SVRCONN)
BYTSRCVD(413860) BYTSSENT(121228)
CHSTATI(12.53.25) CONNAME(10.111.222.333)
CURRENT JOBNAME(001638190000D269)
LSTMSGTI(13.38.45) STATUS(RUNNING)
SUBSTATE(RECEIVE)
AMQ8417I: Display Channel Status details.
CHANNEL(LONDON.TO.PARIS) CHLTYPE(SVRCONN)
BYTSRCVD(413708) BYTSSENT(121184)
CHSTATI(12.53.25) CONNAME(10.111.222.333)
CURRENT JOBNAME(001638190000D26C)
LSTMSGTI(13.38.45) STATUS(RUNNING)
SUBSTATE(RECEIVE)
AMQ8417I: Display Channel Status details.
CHANNEL(LONDON.TO.PARIS) CHLTYPE(SVRCONN)
BYTSRCVD(1608) BYTSSENT(1604)
CHSTATI(12.53.25) CONNAME(10.111.222.333)
CURRENT JOBNAME(001560110000D2B2)
LSTMSGTI(12.53.25) STATUS(RUNNING)
SUBSTATE(RECEIVE)
AMQ8417I: Display Channel Status details.
CHANNEL(LONDON.TO.PARIS) CHLTYPE(SVRCONN)
BYTSRCVD(413860) BYTSSENT(121228)
CHSTATI(12.53.25) CONNAME(10.111.222.333)
CURRENT JOBNAME(001560110000D2B6)
LSTMSGTI(13.38.46) STATUS(RUNNING)
SUBSTATE(RECEIVE)
AMQ8417I: Display Channel Status details.
CHANNEL(LONDON.TO.PARIS) CHLTYPE(SVRCONN)
BYTSRCVD(413860) BYTSSENT(121228)
CHSTATI(12.53.25) CONNAME(10.111.222.333)
CURRENT JOBNAME(001638190000D26B)
LSTMSGTI(13.38.46) STATUS(RUNNING)
SUBSTATE(RECEIVE)
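
In case it helps anyone reading the output above, here is a small Python sketch that parses DIS CHS output of this shape and groups the instances by start time (CHSTATI). The function names are made up, and the regex assumes nothing beyond the KEY(value) layout shown above; the sample text is the first two instances from that output.

```python
import re
from collections import defaultdict

# Matches attributes printed as KEY(value), e.g. CHSTATI(12.53.25).
ATTR = re.compile(r"([A-Z]+)\(([^)]*)\)")

SAMPLE = """\
AMQ8417I: Display Channel Status details.
CHANNEL(LONDON.TO.PARIS) CHLTYPE(SVRCONN)
BYTSRCVD(413708) BYTSSENT(121184)
CHSTATI(12.53.25) CONNAME(10.111.222.333)
CURRENT JOBNAME(001560110000D2B4)
LSTMSGTI(13.38.45) STATUS(RUNNING)
SUBSTATE(RECEIVE)
AMQ8417I: Display Channel Status details.
CHANNEL(LONDON.TO.PARIS) CHLTYPE(SVRCONN)
BYTSRCVD(1608) BYTSSENT(1604)
CHSTATI(12.53.31) CONNAME(10.111.222.333)
CURRENT JOBNAME(001638190000D26F)
LSTMSGTI(12.53.31) STATUS(RUNNING)
SUBSTATE(RECEIVE)
"""


def parse_chstatus(text):
    """Split the output into one dict of attributes per channel instance."""
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("AMQ8417I"):
            current = {}          # each AMQ8417I line starts a new instance
            records.append(current)
        elif current is not None:
            current.update(ATTR.findall(line))
    return records


def group_by_start(records):
    """Map each CHSTATI value to the JOBNAMEs of the instances started then."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["CHSTATI"]].append(rec["JOBNAME"])
    return dict(groups)


records = parse_chstatus(SAMPLE)
print(group_by_start(records))
```

Run against a full sample, this makes it easier to see which instances started together and to compare their LSTMSGTI values.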
Vitor
Posted: Wed May 15, 2019 10:31 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

vicks_mq wrote:
engaging the one we do have is harder than moving mountains


In house teams are no easier.

I concur with the expressed opinion; this isn't an MQ issue, this is a network issue. Move the mountain or live with the problem, knowing that most of us have shared your pain at some point, and will share it again in the future.
_________________
Honesty is the best policy.
Insanity is the best defence.
hughson
Posted: Wed May 15, 2019 11:16 am

Padawan

Joined: 09 May 2013
Posts: 1914
Location: Bay of Plenty, New Zealand

vicks_mq wrote:
I noticed one more behaviour: whenever the channel TIMEOUT error occurs, all the channel instances that started together go down, even though for some of those instances the LSTMSGTI is current.

Your output shows the channel instances all coming from the same IP address. Already we have said that inactivity is not the trigger since you have heartbeats clearly flowing, and the pattern is sometimes not very long at all. Is it only ever connections from the shown IP address that are failing? If yes, what is it about this network compared to other networks that are not suffering?

Cheers,
Morag
vicks_mq
Posted: Wed May 15, 2019 11:37 am

Disciple

Joined: 03 Oct 2017
Posts: 162

hughson wrote:
vicks_mq wrote:
I noticed one more behaviour: whenever the channel TIMEOUT error occurs, all the channel instances that started together go down, even though for some of those instances the LSTMSGTI is current.

Your output shows the channel instances all coming from the same IP address. Already we have said that inactivity is not the trigger since you have heartbeats clearly flowing, and the pattern is sometimes not very long at all. Is it only ever connections from the shown IP address that are failing? If yes, what is it about this network compared to other networks that are not suffering?

Cheers,
Morag

Hi Morag, we migrated our applications to a new data center a few months back, but they were still connected to the old MQ. We have now also migrated MQ to the new data center, onto an MQ appliance, and now we are having this issue. The application runs on 4 servers (4 different IP addresses, and all of them fail, albeit at different times). They all use the same SVRCONN channel and connect to the same QMGR/HOST.

So we have changed 2 things here: one is the network, and the 2nd is MQ, which is now an MQ appliance.
I just verified with the network firewall team, and they said that the firewall timeout is the same in both DCs (1800 seconds) and that they are not seeing any packets getting dropped.
bruce2359
Posted: Wed May 15, 2019 11:47 am

Poobah

Joined: 05 Jan 2008
Posts: 9392
Location: US: west coast, almost. Otherwise, enroute.

vicks_mq wrote:
I just verified with the network firewall team, and they said that the firewall timeout is the same in both DCs (1800 seconds) and that they are not seeing any packets getting dropped.

Is that all the network folks looked at/for - dropped packets?

If I had a dollar for every time I was lied to by the network team ...


Last edited by bruce2359 on Wed May 15, 2019 2:21 pm; edited 1 time in total
Vitor
Posted: Wed May 15, 2019 11:51 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

vicks_mq wrote:
I just verified with the network firewall team, and they said that the firewall timeout is the same in both DCs (1800 seconds) and that they are not seeing any packets getting dropped.


Hah!!

I was once told some channels were retrying because of a "software bug" despite the channel logs being full of "connection refused" messages. Reversing the change they'd made the night before we started seeing the problem "in a network segment that's nothing to do with your traffic" miraculously fixed the software bug.......


Vitor
Posted: Wed May 15, 2019 11:52 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

bruce2359 wrote:
Is that all the network folks looked at/for - dropped packets?




What about actual connection issues? Firewall logs? Everything else that can go wrong with a network?

bruce2359 wrote:
If I had a dollar for every time I was lied to by the network team ...


.....you'd be on the beach next to the rest of us.
fjb_saper
Posted: Wed May 15, 2019 12:26 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20695
Location: LI,NY

I had a recent problem with the network where I was unable to create a connection for a different product (not MQ). The network folks on both sides of the fence assured me there was no firewall stopping me and telnet worked perfectly fine to the port...

So I got them all together looking at my stuff while I was trying to connect...
15 mins later the connection was working and they still assured me they hadn't done a thing....
_________________
MQ & Broker admin
vicks_mq
Posted: Wed May 15, 2019 2:05 pm

Disciple

Joined: 03 Oct 2017
Posts: 162

Thank you all. I will engage the network team to get more details and will be back with them. I hope they find something that can resolve the issue.
vicks_mq
Posted: Thu May 16, 2019 4:47 am

Disciple

Joined: 03 Oct 2017
Posts: 162

hughson wrote:

Then it is not DISCINT that is dropping the connections. Sorry - I was under the impression that the connections were dropping "after a few hours" before.

This sounds very much like a firewall is dropping the connection after 40 minutes, although not due to inactivity since you know that heartbeat flows are going across. The timeout message you report is going to be the sender of the heartbeat waiting for the answer back from the heartbeat flow and not getting anything because the socket is no longer there. The 65 second timeout suggests this because other receive-wait (select) calls would use the negotiated heartbeat value (plus a bit) which you have told us is 300.

Cheers,
Morag

Hi Morag, I forgot to mention that although the heartbeat flows are going across every 5 minutes, and I am seeing the corresponding BYTSSENT value increase by 28 for all the channel instances that are dropping, their LSTMSGTI has not changed for the last 30-40 minutes. I know HBINT and LSTMSGTI are not related, but I just want to mention that the SVRCONN instances that are dropping are the ones whose LSTMSGTI has not been updated for the last 30-40 minutes.
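
That observation can be expressed as a simple check between two samples of the same instance (matched on JOBNAME). This is only a sketch: the function name is made up, the second sample below is hypothetical, and the 28-byte figure is the per-heartbeat growth reported in this post.

```python
HEARTBEAT_BYTES = 28  # BYTSSENT growth per heartbeat, as reported in this thread


def heartbeat_only(before, after):
    """True if only heartbeats appear to have flowed between two samples.

    Hypothetical helper: 'before' and 'after' are dicts holding the BYTSSENT
    and LSTMSGTI values from two DIS CHS samples of one channel instance.
    """
    sent_delta = int(after["BYTSSENT"]) - int(before["BYTSSENT"])
    same_last_msg = after["LSTMSGTI"] == before["LSTMSGTI"]
    return same_last_msg and sent_delta > 0 and sent_delta % HEARTBEAT_BYTES == 0


earlier = {"BYTSSENT": "1604", "LSTMSGTI": "12.53.31"}  # values seen in the earlier MQSC output
later = {"BYTSSENT": "1632", "LSTMSGTI": "12.53.31"}    # hypothetical next sample, one heartbeat later
print(heartbeat_only(earlier, later))
```

An instance that keeps returning True for 30-40 minutes would match the instances described here as the ones that drop.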
Vitor
Posted: Thu May 16, 2019 4:55 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

fjb_saper wrote:
15 mins later the connection was working and they still assured me they hadn't done a thing....


Isn't it weird how network problems go away shortly after you manage to get the network people involved (often at gunpoint), yet not one of them has ever done a thing to fix them?
hughson
Posted: Thu May 16, 2019 8:42 am

Padawan

Joined: 09 May 2013
Posts: 1914
Location: Bay of Plenty, New Zealand

vicks_mq wrote:
I noticed one more behaviour: whenever the channel TIMEOUT error occurs, all the channel instances that started together go down, even though for some of those instances the LSTMSGTI is current.


vicks_mq wrote:
Hi Morag, I forgot to mention that although the heartbeat flows are going across every 5 minutes, and I am seeing the corresponding BYTSSENT value increase by 28 for all the channel instances that are dropping, their LSTMSGTI has not changed for the last 30-40 minutes. I know HBINT and LSTMSGTI are not related, but I just want to mention that the SVRCONN instances that are dropping are the ones whose LSTMSGTI has not been updated for the last 30-40 minutes.


These two statements from you seem contradictory at first reading. Perhaps there is more information behind them? For example, are applications making more than one connection, and then, when one connection that has not made an API call for 40 minutes (so its LSTMSGTI is 40 minutes old) is dropped, the application ends all the other connections it made at the same time? Could that be the pattern of your applications? While I still think your networking team should be assisting in the diagnosis here, it would be interesting to understand more about the client end of the application rather than just the set of disparate SVRCONNs that are seen on the queue manager.

Cheers,
Morag