ASG
IBM
Zystems
Cressida
Icon
Netflexity
 
  MQSeries.net
Search  Search       Tech Exchange      Education      Certifications      Library      Info Center      SupportPacs      LinkedIn  Search  Search                                                                   FAQ  FAQ   Usergroups  Usergroups
 
Register  ::  Log in Log in to check your private messages
 
RSS Feed - WebSphere MQ Support RSS Feed - Message Broker Support

MQSeries.net Forum IndexIBM MQ API SupportMQClient - MQPUT hangs - random occurance (c code)

Post new topicReply to topic
MQClient - MQPUT hangs - random occurance (c code) View previous topic :: View next topic
Author Message
hthrane
PostPosted: Wed Feb 20, 2019 5:58 am Post subject: MQClient - MQPUT hangs - random occurance (c code) Reply with quote

Newbie

Joined: 20 Feb 2019
Posts: 3

MQclient version 8.0.0.9 - installed on xLinux, used by application code [c code] used to have communication to backend MQServer [zOS, MQServer version 8.0]

Client application
- code build as non-SSL version originally, the non-SSL version has run with MQclient version 8.0.0.4 (I think it was)
- now converted to use SSL - so shifted from MQCONN to MQCONNX
- new MQCONNX version running on SAME xLinux box as the prevous MQCONN version did.
- users uses the application on/off during normal workday [so in effect application is really only in use 10 hours per day , during that they there can be breaks everyting between 5 minutes and 3 hours where the application is not used - this has been the way the application has ben used before we converted to MQCONNX and is still the work pattern]
- application runs linux service (aka running non stop)
- application function
start - perform MQCONNX, perform MQOPEN, MQPUT("start connect to MVS"), MQCLOSE, MQOPEN, MQGET("connected to MVS"), MQCLOSE
application does based on user input
MQOPEN
MQPUT (msg)
MQCLOSE
MQOPEN
MQGET(msg)
MQCLOSE
loop logic build in to support
- MQ Return codes
MQGET treats MQ 2033 (no msg as okay msg)
All other NOT_OK msg. gets the application to do MQDISC/MQCMT and then restart connection to Backend (MQCONNX..)
- MQ connection to backend is only used by 2 threads
- Application uses unique SVRCONN [application is to run on multiple servers but each server will have own SVRCONN to prevent conflicts]

Issue :
MQPUT hangs in some cases for 5-15 minutes before the MQPUT call gets an MQ Return code.
This does not happen in 90% of the cases [where the MQ return code on the MQPUT call is returned within 1-60 seconds , no matter if is is OK or NOT_OK return code]
In the case where there is the 5-15 minutes delay/lay/hang on the MQPUT all it always ends with NOT_OK
MQPUT "hang" always happens after some time of inactivity , so far it looks like that if the z/OS backend has identified the connection has timeout[but the linux-application has not] then MQPUT from the linux-application quickly report MQ NOT_OK , but if both the z/OS and the Linux application believes that the connection is there then MQPUT can hang
>> after 5-15 minutes when the application has finally gotten the MQ retun code NOT_OK then the application does MQDISC and re-establish the connection to the z/OS MQ mgr and then all MQPUT, MQGET etc...works without issues for hours ]
Info on SVRCONN on the z/OS backend
- if it times out it reports : +CSQX259E :ECP2 CSQXRESP Connection timed out...
- it has HBINT 300 , KAINT=AUTO , Shared conversations= 10
>> z/OS started to recognizes timeout after we started to use MQCONNX - before when using MQCONN then timeout was once per month [or less], after we have switched to MQCONNX then we see timeout daily , but with random pattern - sometimes after 2 hours, sometimes after 1 hour. MQ on z/OS has not been upgraded during our switch form MQCONN to MQCONNX. SVRCONN defs on z/OS side has only been updated to have SSL cert info [HBINT, KAINT, SharedConverstation as they were in the past]

mqclient.ini on xLinux has
TCP:
KeepAlive = Yes
Connect_Timeout=60
(orignally we did not have those TCP parms in the mqclient.ini, but we noticed that after we put them in then it seemed that mqput "hang" did not occur as often as before)

The application on the linux side does not define any of the MQCNO_RECONN* parms as we do not want automatick mq-reconnect. We want our application code to take the decision on the MQ reconnect.

Questions
1. Should mqclient.ini contain the "DefRecon=DISABLED" ?
> Based on documentation then MQCNO_RECONN* default settings in any MQCNO_Versions 9 and above is that automatic mq reconnect is not active , unless you define your code to use the some of the MQCNO_RECONN* parms.
2.Some documentation indicates that SVRCONN from a client side can be impacted by "MQReconnectTimeOut" parm in mqclient.ini but I have not been able to find full documentation on "MQReconnectTimeOut" . Can you advice where to find info ?
3. As we can see that MQPUT never hangs if z/OS has had timeout the connection then any recommendation to force more often timeout from the MVS is most welcome.

Best Regards - and in advance thanks for any input/advice on this
Helle
Back to top
View user's profile Send private message
hughson
PostPosted: Wed Feb 20, 2019 4:33 pm Post subject: Reply with quote

Grand Master

Joined: 09 May 2013
Posts: 1253
Location: Bay of Plenty, New Zealand

I'm wondering whether the HBINT value in use has changed given that you are now supplying the MQCD whereas previously you were getting a set of values deemed as appropriate for the MQSERVER environment variable.

When the connection is live you can issue DISPLAY CHSTATUS to see the negotiated value of HBINT to see if it is as you expect (for example not zero).

The channel will need some data flowing over it to keep it open from a firewall perspective. Heart beats achieve that but only if negotiated HBINT is not zero. KAINT on the SVRCONN will only be applicable if TCPKEEP(YES) is set on z/OS queue manager.

If you're not specifying the MQCNO_RECONNECT options then you are not using client reconnect.

Cheers,
Morag
_________________
Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software
Back to top
View user's profile Send private message Visit poster's website
gbaddeley
PostPosted: Wed Feb 20, 2019 4:42 pm Post subject: Reply with quote

Jedi

Joined: 25 Mar 2003
Posts: 2017
Location: Melbourne, Australia

Try running MQ trace on the client xLinux. It will show the timing and what is happening inside MQ.
_________________
Glenn
Back to top
View user's profile Send private message
hthrane
PostPosted: Thu Feb 21, 2019 8:32 am Post subject: Reply with quote

Newbie

Joined: 20 Feb 2019
Posts: 3

1)
Our z/OS runs with TCKKEEP(YES).
I still have not figured out how to control HBINT from a client point of view - so if some link to some good documentation on that subject could be shared then I would appreciate it.
The DISPLAY CHSTATUS - I manage to get it done at the same time we had the issue (or just after) such that I could see that the channel instance really reflected. A couple of surprises
HBINT = 0
but also that that the connection stops working then we reach
Transmissions - Sent . . . : 500 Received . . : 500
> HBINT = 0 can explain the unexpected timeout . I will follow up on that.
> that the connections starts the "lag" when transmissions qty reaches 500/500 is interesting in my mind. I will see if I can find any parms that controls max transmissions in active MQCONNX connection or if it somewhat a SSL transmission limitation.

Thanks for the confirm that if I don't use MQCNO_RECONNECT options then client does not use reconnect.

2)
Have requested an MQ trace on the client xLinux, we will run trace for 24 hours to be sure we have hopefully more than one error situation to look at .
I am still surprised that /var/mqm/errors/AMQERR0x.LOG files does not contain other than AMQ9209 when we have this problem .
Back to top
View user's profile Send private message
hughson
PostPosted: Thu Feb 21, 2019 10:27 am Post subject: Reply with quote

Grand Master

Joined: 09 May 2013
Posts: 1253
Location: Bay of Plenty, New Zealand

hthrane wrote:
I still have not figured out how to control HBINT from a client point of view - so if some link to some good documentation on that subject could be shared then I would appreciate it.
The DISPLAY CHSTATUS - I manage to get it done at the same time we had the issue (or just after) such that I could see that the channel instance really reflected. A couple of surprises
HBINT = 0

I got the impression that your use of MQCONNX was because you were filling in channel details on the MQCD structure that is part of this API call. Is that the case? If so, then the heartbeat interval field is just another field on the MQCD that you can fill in. If you supply a snippet of your code we can advise further.

Also, what cipher spec are you using?

Cheers,
Morag
_________________
Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software
Back to top
View user's profile Send private message Visit poster's website
hthrane
PostPosted: Fri Feb 22, 2019 8:24 am Post subject: MQ Trace gave a pointer to what is wrong Reply with quote

Newbie

Joined: 20 Feb 2019
Posts: 3

MQ trace for the time 10.34 where the problem started and until 10.50 when the application "hanged" in the MQPUT action
+10:34:02.936796 13283.1 RSESS:00001e Sending 533 bytes
+10:34:02.936796 13283.1 RSESS:00001e 0x0000: 17030302 106b70c5 45ab1794 14a22931 |.....kp.E.....)1|
10:34:02.936796 13283.1 RSESS:00001e 0x0010: 454a8031 1a0a8d14 aef87a9c 8c8c8dbf |EJ.1. ....z.....|
.... (the lines representing the sending of 533 bytes - most of them removed to keep this snip short)
10:34:02.936796 13283.1 RSESS:00001e 0x01f0: aa608183 2cea36c3 77cb9e9f 55927e83 |.`..,.6.w...U.~.|
10:34:02.936796 13283.1 RSESS:00001e 0x0200: d7ac6c3a 43adca6a 9ab0c506 df010089 |..l:C..j........|
10:34:02.936796 13283.1 RSESS:00001e 0x0210: b6464ab1 80 |.FJ.. |
+10:34:02.936799 13283.1 RSESS:00001e RetCode (OK)
+10:34:02.936801 13283.1 RSESS:00001e -------} cciTcpSend rc=OK FunctionTime=84
+10:34:02.936803 13283.1 RSESS:00001e ------}! cciTcpSslWriteCallback rc=Unknown(215) FunctionTime=101
+10:34:02.936912 13283.1 RSESS:00001e gsk_secure_soc_write: output: gsk_soc_handle=0x0x7f41e80bd9d0, pBytesToWrite=0x0x7f41e80d89b0, nBytesToWrite=476, pnBytesWritten=0x0x7f41ed87f538, nBytesWritten=476
+10:34:02.936919 13283.1 RSESS:00001e -----} ccigsk_secure_soc_write rc=OK FunctionTime=1127
+10:34:02.936922 13283.1 RSESS:00001e ----} cciSslSecureSend rc=OK FunctionTime=1166
+10:34:02.936924 13283.1 RSESS:00001e ---} ccxSend rc=OK FunctionTime=1173
+10:34:02.936928 13283.1 RSESS:00001e *pointer(0x7f41e80d89b0)
+10:34:02.936931 13283.1 RSESS:00001e ---{ ccxFreeMem
+10:34:02.936933 13283.1 RSESS:00001e ---} ccxFreeMem rc=OK FunctionTime=2
+10:34:02.936936 13283.1 RSESS:00001e ---{ ccxReceive
+10:34:02.936939 13283.1 RSESS:00001e ----{ cciSslSecureReceive
+10:34:02.936941 13283.1 RSESS:00001e -----{ cciSslGSKitTrace
+10:34:02.936943 13283.1 RSESS:00001e -----} cciSslGSKitTrace rc=OK FunctionTime=2
+10:34:02.936945 13283.1 RSESS:00001e -----{ ccxAllocMem
+10:34:02.936949 13283.1 RSESS:00001e Receive buffer allocated
+10:34:02.936951 13283.1 RSESS:00001e length(16284) *pointer(0x7f41e80d89b0)
+10:34:02.936954 13283.1 RSESS:00001e -----} ccxAllocMem rc=OK FunctionTime=9
+10:34:02.936961 13283.1 RSESS:00001e -----{ ccigsk_secure_soc_read
+10:34:02.936965 13283.1 RSESS:00001e gsk_secure_soc_read: input: gsk_soc_handle=0x0x7f41e80bd9d0, pBufferForData=0x0x7f41e80d89b0, nMaxBytes=16284, pnBytesReceived=0x0x7f41ed87f538
+10:34:02.937061 13283.1 RSESS:00001e ------{ cciTcpSslReadCallback
+10:34:02.937071 13283.1 RSESS:00001e cciTcpSslReadCallback: input: nBytesToRead=5
+10:34:02.937074 13283.1 RSESS:00001e -------{ cciTcpReceive
+10:34:02.937077 13283.1 RSESS:00001e Issuing recv 5 @ 0x7f41e8127fe8 (0x7f41e80d89b0 325176)
+10:34:02.937080 13283.1 RSESS:00001e --------{ recv
+10:34:02.937088 13283.1 RSESS:00001e --------}! recv rc=Unknown(B) FunctionTime=8
+10:34:02.937091 13283.1 RSESS:00001e Data: 0xffffffff
+10:34:02.937094 13283.1 RSESS:00001e return recv -1
+10:34:02.937097 13283.1 RSESS:00001e RetCode (OK)
+10:34:02.937099 13283.1 RSESS:00001e -------}! cciTcpReceive rc=rrcI_SSL_WOULD_BLOCK FunctionTime=25
+10:34:02.937101 13283.1 RSESS:00001e ------}! cciTcpSslReadCallback rc=Unknown(FFFFFFFF) FunctionTime=40
+10:34:02.937167 13283.1 RSESS:00001e gsk_secure_soc_read: output: gsk_soc_handle=0x0x7f41e80bd9d0, pBufferForData=0x0x7f41e80d89b0, nMaxBytes=16284, pnBytesReceived=0x7f41ed87f538, nBytesReceived=0
+10:34:02.937171 13283.1 RSESS:00001e -----}! ccigsk_secure_soc_read rc=Unknown(226) FunctionTime=210
+10:34:02.937174 13283.1 RSESS:00001e -----{ cciTcpWaitReceive
+10:34:02.937176 13283.1 RSESS:00001e ------{ xcsWaitFd
+10:34:02.937180 13283.1 RSESS:00001e fdcount=1, fd[0]=31 option=1 timeout=(0sec, 0usec)
+10:34:02.937188 13283.1 RSESS:00001e -------{ poll
+10:34:02.937192 13283.1 RSESS:00001e timeout=20000ms
+10:34:22.939909 13283.1 RSESS:00001e -------} poll rc=OK FunctionTime=20002721
+10:34:22.939954 13283.1 RSESS:00001e -------{ poll
+10:34:22.939966 13283.1 RSESS:00001e timeout=20000ms
+10:34:42.960009 13283.1 RSESS:00001e -------} poll rc=OK FunctionTime=20020055
......does poll for a long time until
+10:50:43.680961 13283.1 RSESS:00001e -------{ poll
+10:50:43.680975 13283.1 RSESS:00001e timeout=20000ms
+10:50:54.054127 13283.1 RSESS:00001e -------}! poll rc=Unknown(1) FunctionTime=10373166
+10:50:54.054178 13283.1 RSESS:00001e ------}! xcsWaitFd rc=Unknown(1) FunctionTime=1011117002
+10:50:54.054186 13283.1 RSESS:00001e -----} cciTcpWaitReceive rc=OK FunctionTime=1011117012
+10:50:54.054223 13283.1 RSESS:00001e Retrying gsk_secure_soc_read...
+10:50:54.054237 13283.1 RSESS:00001e -----{ ccigsk_secure_soc_read
+10:50:54.054260 13283.1 RSESS:00001e gsk_secure_soc_read: input: gsk_soc_handle=0x0x7f41e80bd9d0, pBufferForData=0x0x7f41e80d89b0, nMaxBytes=16284, pnBytesReceived=0x0x7f41ed87f538
+10:50:54.054579 13283.1 RSESS:00001e ------{ cciTcpSslReadCallback
+10:50:54.054594 13283.1 RSESS:00001e cciTcpSslReadCallback: input: nBytesToRead=5
+10:50:54.054617 13283.1 RSESS:00001e -------{ cciTcpReceive
+10:50:54.054640 13283.1 RSESS:00001e Issuing recv 5 @ 0x7f41e8127fe8 (0x7f41e80d89b0 325176)
+10:50:54.054646 13283.1 RSESS:00001e --------{ recv
+10:50:54.054663 13283.1 RSESS:00001e --------}! recv rc=Unknown(6E) FunctionTime=17
+10:50:54.054672 13283.1 RSESS:00001e Data: 0xffffffff
+10:50:54.054678 13283.1 RSESS:00001e return recv -1
+10:50:54.054685 13283.1 RSESS:00001e Issuing recv 5 @ 0x7f41e8127fe8 (0x7f41e80d89b0 325176)
+10:50:54.054690 13283.1 RSESS:00001e --------{ recv
+10:50:54.054697 13283.1 RSESS:00001e --------} recv rc=OK FunctionTime=7
+10:50:54.054703 13283.1 RSESS:00001e Data: 0x00000000
+10:50:54.054709 13283.1 RSESS:00001e return recv 0
+10:50:54.054724 13283.1 RSESS:00001e --------{ cciTcpGetNameandAddress
+10:50:54.054735 13283.1 RSESS:00001e Reverse DNS lookups: TRUE (will resolve)
+10:50:54.054743 13283.1 RSESS:00001e Initial HostName: 'A.B.C.D' Port '(XXXX)'
+10:50:54.054749 13283.1 RSESS:00001e Initial pConv->PeerName 'A.B.C.D(XXXX)'
+10:50:54.054755 13283.1 RSESS:00001e Initial pConv->PeerHostNameCount 0
+10:50:54.054773 13283.1 RSESS:00001e ---------{ cciTcpResolveHostname
+10:50:54.054784 13283.1 RSESS:00001e Hostname: 'A.B.C.D'
+10:50:54.054790 13283.1 RSESS:00001e ----------{ xcsIsEnvironment
+10:50:54.054798 13283.1 RSESS:00001e xcsIsEnvironment[AMQ_NO_IPV6] = FALSE
+10:50:54.054826 13283.1 RSESS:00001e ----------} xcsIsEnvironment rc=OK FunctionTime=36
+10:50:54.054849 13283.1 RSESS:00001e serv_addr4
+10:50:54.054856 13283.1 RSESS:00001e 0x0000: 02000000 09a29a25 00000000 00000000 |.... ..%........|
+10:50:54.054861 13283.1 RSESS:00001e serv_addr6
+10:50:54.054865 13283.1 RSESS:00001e 0x0000: 00000000 00000000 00000000 00000000 |................|
10:50:54.054865 13283.1 RSESS:00001e 0x0010: 00000000 00000000 00000000 |............ |
+10:50:54.054870 13283.1 RSESS:00001e ---------} cciTcpResolveHostname rc=OK FunctionTime=97
+10:50:54.054878 13283.1 RSESS:00001e ---------{ cciTcpResolveAddress
+10:50:54.054884 13283.1 RSESS:00001e serv_addr4
+10:50:54.054889 13283.1 RSESS:00001e 0x0000: 02000000 09a29a25 00000000 00000000 |.... ..%........|
+10:50:54.054895 13283.1 RSESS:00001e Reverse DNS lookups are enabled
+10:50:54.056600 13283.1 RSESS:00001e AddrName: 'A.B.C.D'
+10:50:54.056610 13283.1 RSESS:00001e Hostname: '<real hostname'
+10:50:54.056613 13283.1 RSESS:00001e ---------} cciTcpResolveAddress rc=OK FunctionTime=1735
+10:50:54.056617 13283.1 RSESS:00001e Resolved name: mvsv (A.B.C.D)(XXXX)
+10:50:54.056621 13283.1 RSESS:00001e ---------{ xcsGetMemFn
+10:50:54.056632 13283.1 RSESS:00001e component:8 function:55 length:8 options:0 cbmindex:-1 *pointer:0x7f41e813fae0
+10:50:54.056636 13283.1 RSESS:00001e ---------} xcsGetMemFn rc=OK FunctionTime=15
+10:50:54.056639 13283.1 RSESS:00001e --------} cciTcpGetNameandAddress rc=OK FunctionTime=1915
+10:50:54.056647 13283.1 RSESS:00001e --------{ rrxError
+10:50:54.056653 13283.1 RSESS:00001e RetCode = 20009209, rc1 = 0, rc2 = 0, Comment1 = '<xxxx(hostname/ip address)>', Comment2 = 'TCP/IP', Comment3= '<queue name>', File= '/build/slot1/p800_P/src/lib/comms/amqccita.c', Line= '4157'

>> as the connection came up at 10:50 then it is more the condition that lead to the "polling" that really is the issue , which I read as not really a problem with MQ but more tcp/ip / gsk .
Back to top
View user's profile Send private message
mvic
PostPosted: Fri Feb 22, 2019 10:42 am Post subject: Re: MQ Trace gave a pointer to what is wrong Reply with quote

Jedi

Joined: 09 Mar 2004
Posts: 2078

hthrane wrote:
as the connection came up at 10:50 then it is more the condition that lead to the "polling" that really is the issue , which I read as not really a problem with MQ but more tcp/ip / gsk .

You're about to get an AMQ9209 error message, based on this trace.
Your connection has timed out, waiting for bytes from the other side.
Next step would be to get a trace from the other side, at the same time, and see what was taking so long. The problem certainly isn't on the local side.
Are your routers and MTUs all set up correctly?
This seems too weird to be an issue within MQ (or GSKit for that matter).
Back to top
View user's profile Send private message
hughson
PostPosted: Fri Feb 22, 2019 2:11 pm Post subject: Reply with quote

Grand Master

Joined: 09 May 2013
Posts: 1253
Location: Bay of Plenty, New Zealand

I thought we had already determined you need to set MQCD.HeartbeatInterval on the client? Or alternatively use MQCD_CLIENT_CONN_DEFAULT to initialise the MQCD. What language is your client application written in btw?

Cheers,
Morag
_________________
Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software
Back to top
View user's profile Send private message Visit poster's website
Display posts from previous:
Post new topicReply to topic Page 1 of 1

MQSeries.net Forum IndexIBM MQ API SupportMQClient - MQPUT hangs - random occurance (c code)
Jump to:



You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
Protected by Anti-Spam ACP


Theme by Dustin Baccetti
Powered by phpBB 2001, 2002 phpBB Group

Copyright MQSeries.net. All rights reserved.