|  | 
 
  
    | RSS Feed - WebSphere MQ Support | RSS Feed - Message Broker Support |  
 
  
	|    |  |  
  
	| MQClient - MQPUT hangs  - random occurance (c code) | « View previous topic :: View next topic » |  
  	| 
		
		
		  | Author | Message |  
		  | hthrane | 
			  
				|  Posted: Wed Feb 20, 2019 5:58 am    Post subject: MQClient - MQPUT hangs  - random occurance (c code) |   |  |  
		  | Newbie
 
 
 Joined: 20 Feb 2019Posts: 3
 
 
 | 
			  
				| MQclient version 8.0.0.9 - installed on xLinux, used by application code [c code] used to have communication to backend MQServer [zOS, MQServer version 8.0] 
 Client application
 - code build as non-SSL version originally, the non-SSL version has run with MQclient version 8.0.0.4 (I think it was)
 - now converted to use SSL - so shifted from MQCONN to MQCONNX
 - new MQCONNX version running on SAME xLinux box as the prevous MQCONN version did.
 - users uses the application on/off during normal workday [so in effect application is really only in use 10 hours per day , during that they there can be breaks everyting between 5 minutes and 3 hours where the application is not used - this has been the way the application has ben used before we converted to MQCONNX and is still the work pattern]
 - application runs linux service (aka running non stop)
 - application function
 start - perform MQCONNX, perform MQOPEN, MQPUT("start connect to MVS"), MQCLOSE, MQOPEN, MQGET("connected to MVS"), MQCLOSE
 application does based on user input
 MQOPEN
 MQPUT (msg)
 MQCLOSE
 MQOPEN
 MQGET(msg)
 MQCLOSE
 loop logic build in to support
 - MQ Return codes
 MQGET treats MQ 2033 (no msg as okay msg)
 All other NOT_OK msg. gets the application to do MQDISC/MQCMT and then restart connection to Backend (MQCONNX..)
 - MQ connection to backend is only used by 2 threads
 - Application uses unique SVRCONN  [application is to run on multiple servers but each server will have own SVRCONN to prevent conflicts]
 
 Issue  :
 MQPUT hangs in some cases for 5-15 minutes before the MQPUT call gets an MQ Return code.
 This does not happen in 90% of the cases [where the MQ return code on the MQPUT call is returned within 1-60 seconds , no matter if is is OK or NOT_OK return code]
 In the case where there is the 5-15 minutes delay/lay/hang on the MQPUT all it always ends with NOT_OK
 MQPUT "hang" always happens after some time of inactivity  , so far it looks like that if the z/OS backend has identified the connection has timeout[but the linux-application has not] then MQPUT from the linux-application quickly report MQ NOT_OK , but if both the z/OS and the Linux application believes that the connection is there then MQPUT can hang
 >> after 5-15 minutes when the application has finally gotten the MQ retun code NOT_OK then the application does MQDISC and re-establish the connection to the z/OS MQ mgr and then all MQPUT, MQGET etc...works without issues for hours ]
 Info on SVRCONN on the z/OS backend
 - if it times out it reports : +CSQX259E :ECP2 CSQXRESP Connection timed out...
 - it has HBINT 300 , KAINT=AUTO , Shared conversations= 10
 >> z/OS started to recognizes timeout after we started to use MQCONNX - before when using MQCONN then timeout was once per month [or less], after we have switched to MQCONNX then we see timeout daily , but with random pattern - sometimes after 2 hours, sometimes after 1 hour. MQ on z/OS has not been upgraded during our switch form MQCONN to MQCONNX. SVRCONN defs on z/OS side has only been updated to have SSL cert info [HBINT, KAINT, SharedConverstation as they were in the past]
 
 mqclient.ini on xLinux has
 TCP:
 KeepAlive = Yes
 Connect_Timeout=60
 (orignally we did not have those TCP parms in the mqclient.ini, but we noticed that after we put them in then it seemed that mqput "hang" did not occur as often as before)
 
 The application on the linux side does not define any of the MQCNO_RECONN* parms as we do not want automatick mq-reconnect. We want our application code to take the decision on the MQ reconnect.
 
 Questions
 1. Should mqclient.ini contain the "DefRecon=DISABLED" ?
 > Based on documentation then MQCNO_RECONN* default settings in any MQCNO_Versions 9 and above is that automatic mq reconnect is not active , unless you define your code to use the some of the MQCNO_RECONN* parms.
 2.Some documentation indicates that SVRCONN from a client side can be impacted by "MQReconnectTimeOut" parm in mqclient.ini but I have not been able to find full documentation on "MQReconnectTimeOut" . Can you advice where to find info ?
 3. As we can see that MQPUT never hangs if z/OS has had timeout the connection then any recommendation to force more often timeout from the MVS is most welcome.
 
 Best Regards  - and in advance thanks for any input/advice on this
 Helle
 |  |  
		  | Back to top |  |  
		  |  |  
		  | hughson | 
			  
				|  Posted: Wed Feb 20, 2019 4:33 pm    Post subject: |   |  |  
		  |  Padawan
 
 
 Joined: 09 May 2013Posts: 1967
 Location: Bay of Plenty, New Zealand
 
 | 
			  
				| I'm wondering whether the HBINT value in use has changed given that you are now supplying the MQCD whereas previously you were getting a set of values deemed as appropriate for the MQSERVER environment variable. 
 When the connection is live you can issue DISPLAY CHSTATUS to see the negotiated value of HBINT to see if it is as you expect (for example not zero).
 
 The channel will need some data flowing over it to keep it open from a firewall perspective. Heart beats achieve that but only if negotiated HBINT is not zero. KAINT on the SVRCONN will only be applicable if TCPKEEP(YES) is set on z/OS queue manager.
 
 If you're not specifying the MQCNO_RECONNECT options then you are not using client reconnect.
 
 Cheers,
 Morag
 _________________
 Morag Hughson @MoragHughson
 IBM MQ Technical Education Specialist
 Get your IBM MQ training here!
 MQGem Software
 |  |  
		  | Back to top |  |  
		  |  |  
		  | gbaddeley | 
			  
				|  Posted: Wed Feb 20, 2019 4:42 pm    Post subject: |   |  |  
		  |  Jedi Knight
 
 
 Joined: 25 Mar 2003Posts: 2538
 Location: Melbourne, Australia
 
 | 
			  
				| Try running MQ trace on the client xLinux. It will show the timing and what is happening inside MQ. _________________
 Glenn
 |  |  
		  | Back to top |  |  
		  |  |  
		  | hthrane | 
			  
				|  Posted: Thu Feb 21, 2019 8:32 am    Post subject: |   |  |  
		  | Newbie
 
 
 Joined: 20 Feb 2019Posts: 3
 
 
 | 
			  
				| 1) Our z/OS runs with TCKKEEP(YES).
 I still have not figured out how to control HBINT from a client point of view - so if some link to some good documentation on that subject could be shared then I would appreciate it.
 The DISPLAY CHSTATUS  - I manage to get it done at the same time we had the issue (or just after) such that I could see that the channel instance really reflected. A couple of surprises
 HBINT = 0
 but also that that the connection stops working then we reach
 Transmissions - Sent  . . . : 500           Received  . . : 500
 > HBINT = 0 can explain the unexpected timeout  . I will follow up on that.
 > that the connections starts the "lag" when transmissions qty reaches 500/500 is interesting in my mind. I will see if I can find any parms that controls max transmissions in active MQCONNX connection or if it somewhat a SSL transmission limitation.
 
 Thanks for the confirm that if I don't use MQCNO_RECONNECT  options then client does not use reconnect.
 
 2)
 Have requested an MQ trace on the client xLinux, we will run trace for 24 hours to be sure we have hopefully more than one error situation to look at .
 I am still surprised that /var/mqm/errors/AMQERR0x.LOG files does not contain other than AMQ9209 when we have this problem .
 |  |  
		  | Back to top |  |  
		  |  |  
		  | hughson | 
			  
				|  Posted: Thu Feb 21, 2019 10:27 am    Post subject: |   |  |  
		  |  Padawan
 
 
 Joined: 09 May 2013Posts: 1967
 Location: Bay of Plenty, New Zealand
 
 | 
			  
				| 
   
	| hthrane wrote: |  
	| I still have not figured out how to control HBINT from a client point of view - so if some link to some good documentation on that subject could be shared then I would appreciate it. The DISPLAY CHSTATUS  - I manage to get it done at the same time we had the issue (or just after) such that I could see that the channel instance really reflected. A couple of surprises
 HBINT = 0
 |  I got the impression that your use of MQCONNX was because you were filling in channel details on the MQCD structure that is part of this API call. Is that the case? If so, then the heartbeat interval field is just another field on the MQCD that you can fill in. If you supply a snippet of your code we can advise further.
 
 Also, what cipher spec are you using?
 
 Cheers,
 Morag
 _________________
 Morag Hughson @MoragHughson
 IBM MQ Technical Education Specialist
 Get your IBM MQ training here!
 MQGem Software
 |  |  
		  | Back to top |  |  
		  |  |  
		  | hthrane | 
			  
				|  Posted: Fri Feb 22, 2019 8:24 am    Post subject: MQ Trace gave a pointer to what is wrong |   |  |  
		  | Newbie
 
 
 Joined: 20 Feb 2019Posts: 3
 
 
 | 
			  
				| MQ trace  for the time 10.34 where the problem started and until 10.50 when the application "hanged" in the MQPUT action +10:34:02.936796    13283.1      RSESS:00001e      Sending 533 bytes
 +10:34:02.936796    13283.1      RSESS:00001e        0x0000:  17030302 106b70c5 45ab1794 14a22931  |.....kp.E.....)1|
 10:34:02.936796    13283.1      RSESS:00001e        0x0010:  454a8031 1a0a8d14 aef87a9c 8c8c8dbf  |EJ.1. ....z.....|
 .... (the lines representing the sending of 533 bytes - most of them removed to keep this snip short)
 10:34:02.936796    13283.1      RSESS:00001e        0x01f0:  aa608183 2cea36c3 77cb9e9f 55927e83  |.`..,.6.w...U.~.|
 10:34:02.936796    13283.1      RSESS:00001e        0x0200:  d7ac6c3a 43adca6a 9ab0c506 df010089  |..l:C..j........|
 10:34:02.936796    13283.1      RSESS:00001e        0x0210:  b6464ab1 80                          |.FJ..           |
 +10:34:02.936799    13283.1      RSESS:00001e      RetCode (OK)
 +10:34:02.936801    13283.1      RSESS:00001e -------}  cciTcpSend rc=OK FunctionTime=84
 +10:34:02.936803    13283.1      RSESS:00001e ------}! cciTcpSslWriteCallback rc=Unknown(215) FunctionTime=101
 +10:34:02.936912    13283.1      RSESS:00001e      gsk_secure_soc_write: output: gsk_soc_handle=0x0x7f41e80bd9d0, pBytesToWrite=0x0x7f41e80d89b0, nBytesToWrite=476, pnBytesWritten=0x0x7f41ed87f538, nBytesWritten=476
 +10:34:02.936919    13283.1      RSESS:00001e -----}  ccigsk_secure_soc_write rc=OK FunctionTime=1127
 +10:34:02.936922    13283.1      RSESS:00001e ----}  cciSslSecureSend rc=OK FunctionTime=1166
 +10:34:02.936924    13283.1      RSESS:00001e ---}  ccxSend rc=OK FunctionTime=1173
 +10:34:02.936928    13283.1      RSESS:00001e      *pointer(0x7f41e80d89b0)
 +10:34:02.936931    13283.1      RSESS:00001e ---{  ccxFreeMem
 +10:34:02.936933    13283.1      RSESS:00001e ---}  ccxFreeMem rc=OK FunctionTime=2
 +10:34:02.936936    13283.1      RSESS:00001e ---{  ccxReceive
 +10:34:02.936939    13283.1      RSESS:00001e ----{  cciSslSecureReceive
 +10:34:02.936941    13283.1      RSESS:00001e -----{  cciSslGSKitTrace
 +10:34:02.936943    13283.1      RSESS:00001e -----}  cciSslGSKitTrace rc=OK FunctionTime=2
 +10:34:02.936945    13283.1      RSESS:00001e -----{  ccxAllocMem
 +10:34:02.936949    13283.1      RSESS:00001e      Receive buffer allocated
 +10:34:02.936951    13283.1      RSESS:00001e      length(16284) *pointer(0x7f41e80d89b0)
 +10:34:02.936954    13283.1      RSESS:00001e -----}  ccxAllocMem rc=OK FunctionTime=9
 +10:34:02.936961    13283.1      RSESS:00001e -----{  ccigsk_secure_soc_read
 +10:34:02.936965    13283.1      RSESS:00001e      gsk_secure_soc_read: input: gsk_soc_handle=0x0x7f41e80bd9d0, pBufferForData=0x0x7f41e80d89b0, nMaxBytes=16284, pnBytesReceived=0x0x7f41ed87f538
 +10:34:02.937061    13283.1      RSESS:00001e ------{  cciTcpSslReadCallback
 +10:34:02.937071    13283.1      RSESS:00001e      cciTcpSslReadCallback: input: nBytesToRead=5
 +10:34:02.937074    13283.1      RSESS:00001e -------{  cciTcpReceive
 +10:34:02.937077    13283.1      RSESS:00001e      Issuing recv 5 @ 0x7f41e8127fe8 (0x7f41e80d89b0 325176)
 +10:34:02.937080    13283.1      RSESS:00001e --------{  recv
 +10:34:02.937088    13283.1      RSESS:00001e --------}! recv rc=Unknown(B) FunctionTime=8
 +10:34:02.937091    13283.1      RSESS:00001e      Data: 0xffffffff
 +10:34:02.937094    13283.1      RSESS:00001e      return recv -1
 +10:34:02.937097    13283.1      RSESS:00001e      RetCode (OK)
 +10:34:02.937099    13283.1      RSESS:00001e -------}! cciTcpReceive rc=rrcI_SSL_WOULD_BLOCK FunctionTime=25
 +10:34:02.937101    13283.1      RSESS:00001e ------}! cciTcpSslReadCallback rc=Unknown(FFFFFFFF) FunctionTime=40
 +10:34:02.937167    13283.1      RSESS:00001e      gsk_secure_soc_read: output: gsk_soc_handle=0x0x7f41e80bd9d0, pBufferForData=0x0x7f41e80d89b0, nMaxBytes=16284, pnBytesReceived=0x7f41ed87f538, nBytesReceived=0
 +10:34:02.937171    13283.1      RSESS:00001e -----}! ccigsk_secure_soc_read rc=Unknown(226) FunctionTime=210
 +10:34:02.937174    13283.1      RSESS:00001e -----{  cciTcpWaitReceive
 +10:34:02.937176    13283.1      RSESS:00001e ------{  xcsWaitFd
 +10:34:02.937180    13283.1      RSESS:00001e      fdcount=1, fd[0]=31 option=1 timeout=(0sec, 0usec)
 +10:34:02.937188    13283.1      RSESS:00001e -------{  poll
 +10:34:02.937192    13283.1      RSESS:00001e      timeout=20000ms
 +10:34:22.939909    13283.1      RSESS:00001e -------}  poll rc=OK FunctionTime=20002721
 +10:34:22.939954    13283.1      RSESS:00001e -------{  poll
 +10:34:22.939966    13283.1      RSESS:00001e      timeout=20000ms
 +10:34:42.960009    13283.1      RSESS:00001e -------}  poll rc=OK FunctionTime=20020055
 ......does poll for a long time until
 +10:50:43.680961    13283.1      RSESS:00001e -------{  poll
 +10:50:43.680975    13283.1      RSESS:00001e      timeout=20000ms
 +10:50:54.054127    13283.1      RSESS:00001e -------}! poll rc=Unknown(1) FunctionTime=10373166
 +10:50:54.054178    13283.1      RSESS:00001e ------}! xcsWaitFd rc=Unknown(1) FunctionTime=1011117002
 +10:50:54.054186    13283.1      RSESS:00001e -----}  cciTcpWaitReceive rc=OK FunctionTime=1011117012
 +10:50:54.054223    13283.1      RSESS:00001e      Retrying gsk_secure_soc_read...
 +10:50:54.054237    13283.1      RSESS:00001e -----{  ccigsk_secure_soc_read
 +10:50:54.054260    13283.1      RSESS:00001e      gsk_secure_soc_read: input: gsk_soc_handle=0x0x7f41e80bd9d0, pBufferForData=0x0x7f41e80d89b0, nMaxBytes=16284, pnBytesReceived=0x0x7f41ed87f538
 +10:50:54.054579    13283.1      RSESS:00001e ------{  cciTcpSslReadCallback
 +10:50:54.054594    13283.1      RSESS:00001e      cciTcpSslReadCallback: input: nBytesToRead=5
 +10:50:54.054617    13283.1      RSESS:00001e -------{  cciTcpReceive
 +10:50:54.054640    13283.1      RSESS:00001e      Issuing recv 5 @ 0x7f41e8127fe8 (0x7f41e80d89b0 325176)
 +10:50:54.054646    13283.1      RSESS:00001e --------{  recv
 +10:50:54.054663    13283.1      RSESS:00001e --------}! recv rc=Unknown(6E) FunctionTime=17
 +10:50:54.054672    13283.1      RSESS:00001e      Data: 0xffffffff
 +10:50:54.054678    13283.1      RSESS:00001e      return recv -1
 +10:50:54.054685    13283.1      RSESS:00001e      Issuing recv 5 @ 0x7f41e8127fe8 (0x7f41e80d89b0 325176)
 +10:50:54.054690    13283.1      RSESS:00001e --------{  recv
 +10:50:54.054697    13283.1      RSESS:00001e --------}  recv rc=OK FunctionTime=7
 +10:50:54.054703    13283.1      RSESS:00001e      Data: 0x00000000
 +10:50:54.054709    13283.1      RSESS:00001e      return recv 0
 +10:50:54.054724    13283.1      RSESS:00001e --------{  cciTcpGetNameandAddress
 +10:50:54.054735    13283.1      RSESS:00001e      Reverse DNS lookups: TRUE (will resolve)
 +10:50:54.054743    13283.1      RSESS:00001e      Initial HostName: 'A.B.C.D' Port '(XXXX)'
 +10:50:54.054749    13283.1      RSESS:00001e      Initial pConv->PeerName 'A.B.C.D(XXXX)'
 +10:50:54.054755    13283.1      RSESS:00001e      Initial pConv->PeerHostNameCount 0
 +10:50:54.054773    13283.1      RSESS:00001e ---------{  cciTcpResolveHostname
 +10:50:54.054784    13283.1      RSESS:00001e      Hostname: 'A.B.C.D'
 +10:50:54.054790    13283.1      RSESS:00001e ----------{  xcsIsEnvironment
 +10:50:54.054798    13283.1      RSESS:00001e      xcsIsEnvironment[AMQ_NO_IPV6] = FALSE
 +10:50:54.054826    13283.1      RSESS:00001e ----------}  xcsIsEnvironment rc=OK FunctionTime=36
 +10:50:54.054849    13283.1      RSESS:00001e      serv_addr4
 +10:50:54.054856    13283.1      RSESS:00001e        0x0000:  02000000 09a29a25 00000000 00000000  |.... ..%........|
 +10:50:54.054861    13283.1      RSESS:00001e      serv_addr6
 +10:50:54.054865    13283.1      RSESS:00001e        0x0000:  00000000 00000000 00000000 00000000  |................|
 10:50:54.054865    13283.1      RSESS:00001e        0x0010:  00000000 00000000 00000000           |............    |
 +10:50:54.054870    13283.1      RSESS:00001e ---------}  cciTcpResolveHostname rc=OK FunctionTime=97
 +10:50:54.054878    13283.1      RSESS:00001e ---------{  cciTcpResolveAddress
 +10:50:54.054884    13283.1      RSESS:00001e      serv_addr4
 +10:50:54.054889    13283.1      RSESS:00001e        0x0000:  02000000 09a29a25 00000000 00000000  |.... ..%........|
 +10:50:54.054895    13283.1      RSESS:00001e      Reverse DNS lookups are enabled
 +10:50:54.056600    13283.1      RSESS:00001e      AddrName: 'A.B.C.D'
 +10:50:54.056610    13283.1      RSESS:00001e      Hostname: '<real hostname'
 +10:50:54.056613    13283.1      RSESS:00001e ---------}  cciTcpResolveAddress rc=OK FunctionTime=1735
 +10:50:54.056617    13283.1      RSESS:00001e      Resolved name: mvsv (A.B.C.D)(XXXX)
 +10:50:54.056621    13283.1      RSESS:00001e ---------{  xcsGetMemFn
 +10:50:54.056632    13283.1      RSESS:00001e      component:8 function:55 length:8 options:0 cbmindex:-1 *pointer:0x7f41e813fae0
 +10:50:54.056636    13283.1      RSESS:00001e ---------}  xcsGetMemFn rc=OK FunctionTime=15
 +10:50:54.056639    13283.1      RSESS:00001e --------}  cciTcpGetNameandAddress rc=OK FunctionTime=1915
 +10:50:54.056647    13283.1      RSESS:00001e --------{  rrxError
 +10:50:54.056653    13283.1      RSESS:00001e      RetCode = 20009209, rc1 = 0, rc2 = 0, Comment1 = '<xxxx(hostname/ip address)>', Comment2 = 'TCP/IP', Comment3= '<queue name>', File= '/build/slot1/p800_P/src/lib/comms/amqccita.c', Line= '4157'
 
 >> as the connection came up at 10:50 then it is more the condition that lead to the "polling" that really is the issue , which I read as not really a problem with MQ but more tcp/ip / gsk .
 |  |  
		  | Back to top |  |  
		  |  |  
		  | mvic | 
			  
				|  Posted: Fri Feb 22, 2019 10:42 am    Post subject: Re: MQ Trace gave a pointer to what is wrong |   |  |  
		  |  Jedi
 
 
 Joined: 09 Mar 2004Posts: 2080
 
 
 | 
			  
				| 
   
	| hthrane wrote: |  
	| as the connection came up at 10:50 then it is more the condition that lead to the "polling" that really is the issue , which I read as not really a problem with MQ but more tcp/ip / gsk . |  You're about to get an AMQ9209 error message, based on this trace.
 Your connection has timed out, waiting for bytes from the other side.
 Next step would be to get a trace from the other side, at the same time, and see what was taking so long.  The problem certainly isn't on the local side.
 Are your routers and MTUs all set up correctly?
 This seems too weird to be an issue within MQ (or GSKit for that matter).
 |  |  
		  | Back to top |  |  
		  |  |  
		  | hughson | 
			  
				|  Posted: Fri Feb 22, 2019 2:11 pm    Post subject: |   |  |  
		  |  Padawan
 
 
 Joined: 09 May 2013Posts: 1967
 Location: Bay of Plenty, New Zealand
 
 | 
			  
				| I thought we had already determined you need to set MQCD.HeartbeatInterval on the client? Or alternatively use MQCD_CLIENT_CONN_DEFAULT to initialise the MQCD. What language is your client application written in btw? 
 Cheers,
 Morag
 _________________
 Morag Hughson @MoragHughson
 IBM MQ Technical Education Specialist
 Get your IBM MQ training here!
 MQGem Software
 |  |  
		  | Back to top |  |  
		  |  |  
		  |  |  |  
  
	|    |  | Page 1 of 1 |  
 
 
  
  	| 
		
		  | 
 
 | You cannot post new topics in this forum You cannot reply to topics in this forum
 You cannot edit your posts in this forum
 You cannot delete your posts in this forum
 You cannot vote in polls in this forum
 
 |  |  |  |