Author |
Message
|
squidward |
Posted: Thu Aug 05, 2010 2:39 pm Post subject: Sporadic timeouts on request/reply |
|
|
Novice
Joined: 27 Mar 2009 Posts: 10
|
I have the following issue. Any guidance on likely root cause or troubleshooting steps would be appreciated.
I have a request/reply setup where I select the reply message based on correlation id. I get sporadic timeouts on the reply; perhaps 2% of the requests result in timeouts. However, when I inspect the reply queue after the fact I can see that
a) the reply is present
b) the correlation id is present and correct
c) the message timestamp is well within the timeout period, 7 seconds before the timeout was logged
So in theory the timeout should never have happened. Anybody seen anything like this?
My configuration:
mqclient v7.0.1 using JMS, against MQ 6.0.2.6 on wintel. This qmgr exchanges the requests/replies with a remote qmgr on CICS (I don't have that version).
I have an 8-second timeout, and responses normally come in <1 second.
Thanks in advance for any advice. |
|
|
 |
jeevan |
Posted: Thu Aug 05, 2010 3:50 pm Post subject: Re: Sporadic timeouts on request/reply |
|
|
Grand Master
Joined: 12 Nov 2005 Posts: 1432
|
For the sake of troubleshooting, increase the timeout value and see what happens.
Also, you can display curdepth and ipprocs of the queue while the message is being put, and see whether the curdepth shows up before or after the ipprocs disappear. |
|
|
 |
bruce2359 |
Posted: Thu Aug 05, 2010 3:59 pm Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
Exactly what do you mean by timeout?
Does your app issue an MQGET with WAIT? Did the WAIT expire?
Is there a CICS transaction involved? Did it time out?
How do you know something timed out? Was there an error logged in CICS? MQ? _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
|
 |
squidward |
Posted: Thu Aug 05, 2010 5:20 pm Post subject: |
|
|
Novice
Joined: 27 Mar 2009 Posts: 10
|
Yes, I will try increasing the timeout. Not the best solution, since I still don't have a root cause, and there is a user waiting on the transaction who will have to sit there for even longer.
By timeout, I mean I issue a QueueReceiver.receive(8000) call that returns null after 8 seconds, even though according to the MQ logs the message was put 7.8 seconds before. The only thing I can think of is that the remote QMGR is somehow not committing the put to my local qmgr, so I'm not able to retrieve the message even though the put has already occurred.
The CICS transaction is not timing out -- the remote qmgr is CICS:
(MY CLIENT) <-> (WINTEL QMGR) <-> (MAINFRAME QMGR) <-> CICS SYSTEM
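A minimal sketch of how the receive call described above could be instrumented, assuming Java. It records when the wait started and ended, so that a null return can later be lined up against the reply's put time. `TimedReceive`, `Result`, and `putBeforeWaitEnded` are hypothetical names, not part of JMS or the MQ client, and the `LongFunction` stands in for something like `timeout -> receiver.receive(timeout)`. It also assumes the put time and the client clock are comparable; as far as I know the put time is stamped on the system where the message was put, so clock skew between machines can distort the comparison.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.LongFunction;

/** Hypothetical helper: wraps a blocking receive and records the wait window. */
final class TimedReceive {

    /** Outcome of one receive attempt; message is null when the wait expired. */
    static final class Result<T> {
        final T message;
        final Instant waitStarted;
        final Instant waitEnded;

        Result(T message, Instant waitStarted, Instant waitEnded) {
            this.message = message;
            this.waitStarted = waitStarted;
            this.waitEnded = waitEnded;
        }

        boolean timedOut() {
            return message == null;
        }

        Duration waited() {
            return Duration.between(waitStarted, waitEnded);
        }

        /** True if a message stamped with putTime was put before this wait ended. */
        boolean putBeforeWaitEnded(Instant putTime) {
            return putTime.isBefore(waitEnded);
        }
    }

    /** Calls receiveCall with the timeout and timestamps both sides of the call. */
    static <T> Result<T> receive(LongFunction<T> receiveCall, long timeoutMillis) {
        Instant started = Instant.now();
        T message = receiveCall.apply(timeoutMillis);
        Instant ended = Instant.now();
        return new Result<>(message, started, ended);
    }

    public static void main(String[] args) {
        // Stand-in for a receive that found nothing; it returns immediately here.
        Result<String> r = receive(timeout -> null, 8000L);
        System.out.println("timed out: " + r.timedOut()
                + ", waited " + r.waited().toMillis() + " ms");
    }
}
```

In the real client the lambda would be the JMS receive call, and the two instants would go into the log line written when a timeout occurs. A reply whose put time falls inside the logged wait window would point to the message not being visible to the getter yet, rather than to a slow back end.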
Thanks again. |
|
|
 |
bruce2359 |
Posted: Thu Aug 05, 2010 6:05 pm Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
Quote: |
(MY CLIENT) <-> (WINTEL QMGR) <-> (MAINFRAME QMGR) <-> CICS SYSTEM |
Time to get your z/OS and CICS sysprogs to look at SMF performance data.
I'm going to go out on a limb and speculate that CICS and the z/OS qmgr are not the likely culprits. 8 seconds on a mainframe is, well, 8 seconds on a mainframe. CICS is capable of easily doing thousands of transactions per second.
Is there a database involved in the transaction on the client? On the Wintel qmgr? On the z/OS qmgr? The CICS app?
Are all other apps experiencing delays?
Is the JMS code from the CICS transaction/application? Or from an MQ application? In either case, some tuning of the JVM might be in order. |
|
|
 |
fjb_saper |
Posted: Thu Aug 05, 2010 8:08 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
You might also want to open a PMR and ship some trace logs to IBM.
The trouble is catching the trace log at the point in time when the problem happens.
Are you using a lot of temporary dynamic reply queues? There might be a caching problem in the channel where the dynamic pool is cached too long.
The PMR will tell you what the tuning parameter is ...
Have fun  _________________ MQ & Broker admin |
|
|
 |
PeterPotkay |
Posted: Sat Aug 07, 2010 10:28 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
Sporadic timeouts with no other explanation when a Windows QM is involved? Sounds like what I am grappling with now myself. After all applications involved have proved they are processing quickly, after the SAN and server guys have confirmed no I/O problems, after the network traces showed no errors, we think it's related to a new version of anti-virus software that is causing similar problems in other service areas as well. Apparently this thing is causing multiple virus definitions to be downloaded per day, multiple scans to run, NICs to get hung up, files to be scanned that shouldn't be, etc. Don't know if it's a bug or if they misconfigured the thing or what.
I've identified the XMITQ between the Windows QMs backing up for a minute or two every few hours with no errors at all, and it recovers on its own. We are going to have them back off that anti-virus upgrade and see if it helps like it did in other areas.
There can be a lot of reasons for your symptoms. But if you've exhausted all other options, check whether the anti-virus software has recently been updated. _________________ Peter Potkay
Keep Calm and MQ On |
|
|
 |
bruce2359 |
Posted: Sat Aug 07, 2010 10:43 am Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
I've also seen odd and seemingly inexplicable behaviors when sysadmins installed multiple anti-virus products on the same o/s. |
|
|
 |
squidward |
Posted: Tue Aug 10, 2010 7:30 am Post subject: |
|
|
Novice
Joined: 27 Mar 2009 Posts: 10
|
Glad to hear I'm not the only one banging my head against the wall.
Not using dynamic queues or anything, just normal request queue / reply queue.
I don't think it's a Windows server issue, because I have a separate standalone queue manager on the same Windows box, which I never see timeouts with despite vastly higher volumes of data and the same client codebase. So I think it must be an issue on the channel from the mainframe.
I did increase the timeout value and do not see any more timeouts. The problem is not solved, though, since I know the remote system is responding in subsecond times.
Guess nothing left but to have a look into the trace options. |
|
|
 |
gbaddeley |
Posted: Tue Aug 10, 2010 3:49 pm Post subject: |
|
|
 Jedi Knight
Joined: 25 Mar 2003 Posts: 2538 Location: Melbourne, Australia
|
You should also take a close look at sporadic slow response in the back end application. _________________ Glenn |
|
|
 |
mvic |
Posted: Tue Aug 10, 2010 4:37 pm Post subject: |
|
|
 Jedi
Joined: 09 Mar 2004 Posts: 2080
|
squidward wrote: |
Guess nothing left but to have a look into the trace options. |
I always jump to this point if a client reports long end-to-end times. There's no substitute for knowing where the time is being lost in the total transaction.
MQ trace can help here, if you know enough about your message (MsgId, put time etc.) that you can find it in the trace. MQ will have written precise timestamps for you (approx. microsecond resolution) saying what it was doing with each message, and when. You can then correlate with data from other elements of your message's round trip.
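One way to do that correlation, as a rough sketch in Java: collect the timestamp of each checkpoint in the message's round trip (from trace, application logs, or wherever they are available) and compute the gap between consecutive checkpoints to see where the time goes. `HopTimeline`, `gaps`, and `slowestHop` are hypothetical names, and the checkpoint labels and times in `main` are made up purely for illustration; they are not measurements from this thread. The sketch assumes all timestamps have already been converted to a common clock.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: finds where time is lost between ordered checkpoints. */
final class HopTimeline {

    /** Elapsed time between consecutive checkpoints, keyed "from -> to", in order. */
    static Map<String, Duration> gaps(Map<String, Instant> checkpoints) {
        Map<String, Duration> result = new LinkedHashMap<>();
        String prevName = null;
        Instant prevTime = null;
        for (Map.Entry<String, Instant> e : checkpoints.entrySet()) {
            if (prevName != null) {
                result.put(prevName + " -> " + e.getKey(),
                        Duration.between(prevTime, e.getValue()));
            }
            prevName = e.getKey();
            prevTime = e.getValue();
        }
        return result;
    }

    /** Label of the hop with the largest gap, or null with fewer than two checkpoints. */
    static String slowestHop(Map<String, Instant> checkpoints) {
        String worst = null;
        Duration worstGap = null;
        for (Map.Entry<String, Duration> e : gaps(checkpoints).entrySet()) {
            if (worstGap == null || e.getValue().compareTo(worstGap) > 0) {
                worst = e.getKey();
                worstGap = e.getValue();
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        // Illustrative values only; insertion order must match the message's path.
        Instant t0 = Instant.parse("2010-08-05T14:00:00Z");
        Map<String, Instant> cp = new LinkedHashMap<>();
        cp.put("client put request", t0);
        cp.put("back end got request", t0.plusMillis(150));
        cp.put("back end put reply", t0.plusMillis(200));
        cp.put("reply on local queue", t0.plusMillis(7900));
        cp.put("client receive returned", t0.plusMillis(8000));
        gaps(cp).forEach((hop, d) -> System.out.println(hop + ": " + d.toMillis() + " ms"));
        System.out.println("slowest: " + slowestHop(cp));
    }
}
```

A `LinkedHashMap` is used so the checkpoints keep the order in which they were added, which is what makes "consecutive" meaningful here.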
BUT be aware that trace can slow down your system even more, which might make a bad situation worse if you are already in danger of breaching SLAs! |
|
|
 |
bruce2359 |
Posted: Tue Aug 10, 2010 5:46 pm Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
Quote: |
...it must be issue on channel from the mainframe. |
Spoken (typed) like a true newbie. I gather that you are not a mainframe person. |
|
|
 |
gbaddeley |
Posted: Wed Aug 11, 2010 3:46 pm Post subject: |
|
|
 Jedi Knight
Joined: 25 Mar 2003 Posts: 2538 Location: Melbourne, Australia
|
mvic wrote: |
BUT be aware that trace can slow down your system even more, which might make a bad situation worse if you are already in danger of breaching SLAs! |
and use up disk space very, very quickly on production systems! I suggest that you start MQ trace at a time of day when a slow message is most likely and then stop it immediately after. Make sure you have the approval of the application managers. _________________ Glenn |
|
|
 |
|