Author |
Message
|
MABeatty1978 |
Posted: Thu Oct 19, 2017 11:08 am Post subject: MQCONN takes 20 seconds on client |
|
|
Acolyte
Joined: 17 Jul 2014 Posts: 54
|
I have clients that are putting messages to clustered queues. On the server, things work just fine. But messages coming from a client are taking exactly 20 seconds to do an MQCONN. This happens most of the time. Every once in a while, the MQCONN does not succeeded and the application itself dies with no return code coming from the MQCONN.
I have this same software running in several locations, and this one client/server setup is the only place it is happening.
Any ideas would be appreciated. |
|
Back to top |
|
 |
PaulClarke |
Posted: Thu Oct 19, 2017 1:20 pm Post subject: |
|
|
 Grand Master
Joined: 17 Nov 2005 Posts: 1002 Location: New Zealand
|
Any chance you have set this client up with a list of IP addresses in the connection name? 20 seconds is about the time it takes TCP to return a failed connect() to an unavailable IP address.
Cheers,
Paul. _________________ Paul Clarke
MQGem Software
www.mqgem.com |
|
Back to top |
|
 |
bruce2359 |
Posted: Thu Oct 19, 2017 1:33 pm Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
PaulClarke wrote: |
Any chance you have set this client up with a list of IP addresses in the connection name? 20 seconds is about the time it takes TCP to return a failed connect() to an unavailable IP address.
Cheers,
Paul. |
A client channel definition table (CCDT)? _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
Back to top |
|
 |
PaulClarke |
Posted: Thu Oct 19, 2017 1:34 pm Post subject: |
|
|
 Grand Master
Joined: 17 Nov 2005 Posts: 1002 Location: New Zealand
|
Sorry Bruce....I don't know what you mean by that. _________________ Paul Clarke
MQGem Software
www.mqgem.com |
|
Back to top |
|
 |
bruce2359 |
Posted: Thu Oct 19, 2017 3:23 pm Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
It was for the OP to give him/her a thing to look for. _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
Back to top |
|
 |
PaulClarke |
Posted: Thu Oct 19, 2017 8:07 pm Post subject: |
|
|
 Grand Master
Joined: 17 Nov 2005 Posts: 1002 Location: New Zealand
|
Ah OK, fair enough. I was confused. Since you quoted me I was assuming that the statement was directed at me somehow. _________________ Paul Clarke
MQGem Software
www.mqgem.com |
|
Back to top |
|
 |
MABeatty1978 |
Posted: Fri Oct 20, 2017 4:41 am Post subject: |
|
|
Acolyte
Joined: 17 Jul 2014 Posts: 54
|
PaulClarke wrote: |
Any chance you have set this client up with a list of IP addresses in the connection name? 20 seconds is about the time it takes TCP to return a failed connect() to an unavailable IP address.
Cheers,
Paul. |
How would I check that? I didn't explicitly set anything in a CCDT
$MQSERVER = SYSTEM.CLNT.SVRCONN/TCP/primary(2414)
from /etc/hosts
10.50.120.201 primary.company.com primary
I've confirmed the server is 10.50.120.201
On the server:
3 : dis CHANNEL(SYSTEM.CLNT.SVRCONN)
AMQ8414: Display Channel details.
CHANNEL(SYSTEM.CLNT.SVRCONN) CHLTYPE(SVRCONN)
ALTDATE(2015-03-26) ALTTIME(23.50.29)
COMPHDR(NONE) COMPMSG(NONE)
DESCR(Client communication channel) HBINT(300)
KAINT(AUTO) MAXINST(999999999)
MAXINSTC(999999999) MAXMSGL(4194304)
MCAUSER( ) MONCHL(QMGR)
RCVDATA( ) RCVEXIT( )
SCYDATA( ) SCYEXIT( )
SENDDATA( ) SENDEXIT( )
SHARECNV(10) SSLCAUTH(REQUIRED)
SSLCIPH( ) SSLPEER( )
TRPTYPE(TCP) |
|
Back to top |
|
 |
fjb_saper |
Posted: Fri Oct 20, 2017 7:41 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
20 seconds for an MQCONN is not too uncommon. This is why there is connection pooling.
The pooling is implemented in JMS at the server level and in regular MQ there is a way to use connection pooling. Read up in the infocenter.  _________________ MQ & Broker admin |
|
Back to top |
|
 |
PaulClarke |
Posted: Fri Oct 20, 2017 9:29 am Post subject: |
|
|
 Grand Master
Joined: 17 Nov 2005 Posts: 1002 Location: New Zealand
|
Ok, so it doesn't look like a host name list is the issue.
I'm not sure I agree that 20 seconds for a connect is not uncommon. I would say it was pretty high. I guess the question is what are the two machines and where are they? If you use another mechanism to connect between the machines, like FTP or Telnet, what is the connect time? In other words, is the delay in the network, firewall etc or is it in MQ for some reason.
What actually is your client application? If you use something simple like AMQSPUTC do you still see a 20 second delay ?
Cheers,
Paul. _________________ Paul Clarke
MQGem Software
www.mqgem.com |
|
Back to top |
|
 |
MABeatty1978 |
Posted: Fri Oct 20, 2017 9:40 am Post subject: |
|
|
Acolyte
Joined: 17 Jul 2014 Posts: 54
|
PaulClarke wrote: |
Ok, so it doesn't look like a host name list is the issue.
I'm not sure I agree that 20 seconds for a connect is not uncommon. I would say it was pretty high. I guess the question is what are the two machines and where are they? If you use another mechanism to connect between the machines, like FTP or Telnet, what is the connect time? In other words, is the delay in the network, firewall etc or is it in MQ for some reason.
What actually is your client application? If you use something simple like AMQSPUTC do you still see a 20 second delay ?
Cheers,
Paul. |
The 20 seconds a red flag for me because it happens every single time, 20 seconds, on the nose. I put some debug logging in my code, it writes to the log one line before the CONN and one line after the CONN and its always without exception 20 seconds:
Fri Oct 20 07:04:22 2017 - INFO:
Function: msg_send
Destination Queue: POS.BASDB.UP.Q
QMGR Name: PROD_POS1970_QM
Fri Oct 20 07:04:22 2017 - INFO:
MQCONN Started
Fri Oct 20 07:04:42 2017 - INFO:
MQCONN Ended
Fri Oct 20 07:04:42 2017 - INFO:
Message Sent:
Fri Oct 20 07:04:22 2017 - INFO:
Function: msg_send
Destination Queue: POS.BASDB.UP.Q
QMGR Name: PROD_POS1970_QM
Fri Oct 20 07:04:22 2017 - INFO:
MQCONN Started
Fri Oct 20 07:04:42 2017 - INFO:
MQCONN Ended
Fri Oct 20 07:04:42 2017 - INFO:
Message Sent:
I think its obviously a network issue... but I have no idea what to look at to fix it. |
|
Back to top |
|
 |
PaulClarke |
Posted: Fri Oct 20, 2017 9:49 am Post subject: |
|
|
 Grand Master
Joined: 17 Nov 2005 Posts: 1002 Location: New Zealand
|
Do you see the same delay if you just do a simple PING or trace route? As I said before, do you see the same delay if you just use a simple AMQSPUTC ?
Essentially, I think you should try different environments to see if you can see the one that causes the delay.
If it is present in all then I would be looking at the firewall. If it is all MQ only then I would be looking at the MQ Server. If it was just your application then I would be looking at it...etc etc _________________ Paul Clarke
MQGem Software
www.mqgem.com |
|
Back to top |
|
 |
MABeatty1978 |
Posted: Fri Oct 20, 2017 10:06 am Post subject: |
|
|
Acolyte
Joined: 17 Jul 2014 Posts: 54
|
PaulClarke wrote: |
Do you see the same delay if you just do a simple PING or trace route? As I said before, do you see the same delay if you just use a simple AMQSPUTC ?
Essentially, I think you should try different environments to see if you can see the one that causes the delay.
If it is present in all then I would be looking at the firewall. If it is all MQ only then I would be looking at the MQ Server. If it was just your application then I would be looking at it...etc etc |
Ping shows no dealy, telnet shows no delay, trace route shows no delay.
AMQSPUTC DOES have the same delay. So the server is suspect? |
|
Back to top |
|
 |
PaulClarke |
Posted: Fri Oct 20, 2017 10:32 am Post subject: |
|
|
 Grand Master
Joined: 17 Nov 2005 Posts: 1002 Location: New Zealand
|
Well it certainly looks suspicious. However, we have at least narrowed it down to the MQ environment somehow. What I would do next is trace it.
Take an MQ Trace of the client. And look where the 20 seconds is being spent. It might be, for example, gethostbyname(). However, if it is the connect() call then I would trace the server end and see if I can see what part of MQ is taking 20 seconds.
The beauty of such a large delay is that it should show up easily. It is trying to find 20 milliseconds which is hard.
Cheers,
Paul. _________________ Paul Clarke
MQGem Software
www.mqgem.com |
|
Back to top |
|
 |
MABeatty1978 |
Posted: Fri Oct 20, 2017 10:56 am Post subject: |
|
|
Acolyte
Joined: 17 Jul 2014 Posts: 54
|
PaulClarke wrote: |
Well it certainly looks suspicious. However, we have at least narrowed it down to the MQ environment somehow. What I would do next is trace it.
Take an MQ Trace of the client. And look where the 20 seconds is being spent. It might be, for example, gethostbyname(). However, if it is the connect() call then I would trace the server end and see if I can see what part of MQ is taking 20 seconds.
The beauty of such a large delay is that it should show up easily. It is trying to find 20 milliseconds which is hard.
Cheers,
Paul. |
Thank you, that was helpful. The trace made it easy to find where the problem is.... now to figure out WHAT the problem is. Any ideas, a quick Googling didn't return much.
14:48:20.353398 28788.1 RSESS:000001 getaddrinfo(): AF_INET & AI_NUMERICHOST: rc=-2 errno=2
*14:48:40.375810 28788.1 RSESS:000001 getaddrinfo(): AF_INET6: rc=-3 errno=110 |
|
Back to top |
|
 |
MABeatty1978 |
Posted: Fri Oct 20, 2017 11:19 am Post subject: |
|
|
Acolyte
Joined: 17 Jul 2014 Posts: 54
|
Thank you Paul for your help. I found the issue (still don't know the why).
14:48:20.353374 28788.1 RSESS:000001 Hostname: 'primary'
14:48:20.353376 28788.1 RSESS:000001 ------{ xcsIsEnvironment
14:48:20.353377 28788.1 RSESS:000001 xcsIsEnvironment[AMQ_NO_IPV6] = FALSE
14:48:20.353378 28788.1 RSESS:000001 ------} xcsIsEnvironment rc=OK
14:48:20.353398 28788.1 RSESS:000001 getaddrinfo(): AF_INET & AI_NUMERICHOST: rc=-2 errno=2
*14:48:40.375810 28788.1 RSESS:000001 getaddrinfo(): AF_INET6: rc=-3 errno=110
Host name "primary" is failing to resolve.
I was able to fix the issue (temporarily) by updating the MQSERVER
$MQSERVER = SYSTEM.CLNT.SVRCONN/TCP/primary(2414)
to
$MQSERVER = SYSTEM.CLNT.SVRCONN/TCP/10.50.120.201(2414)
by using the IP address, the MQCONN is almost immediate.
Now to figure out why it won't resolve....
Thanks again Paul. |
|
Back to top |
|
 |
|