MQSeries.net :: View topic - Sporadic timeouts on request/reply

Sporadic timeouts on request/reply

squidward

Posted: Thu Aug 05, 2010 2:39 pm Post subject: Sporadic timeouts on request/reply

Novice

Joined: 27 Mar 2009
Posts: 10

I have the following issue. Any guidance on likely root cause or troubleshooting steps would be appreciated.

I have a request/reply setup where I select the reply message based on correlation id. I get sporadic timeouts on the reply; perhaps 2% of the requests result in timeouts. However, when I inspect the reply queue after the fact I can see that
a) the reply is present
b) the correlation id is present and correct
c) the message timestamp is well within the timeout period, 7 seconds before the timeout was logged

So in theory the timeout should never have happened. Anybody seen anything like this?

My configuration:
mqclient v7.0.1 using JMS, against MQ 6.0.2.6 on wintel. This qmgr exchanges the requests/replies with a remote qmgr on CICS (I don't have that version).

I have an 8-second timeout, and responses normally come in under 1 second.

Thanks in advance for any advice.
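
The timing described in this post can be checked with a little arithmetic. What follows is a minimal sketch in Python, assuming the timeout log entry and the message timestamp come from synchronized clocks. The clock values and the function name `reply_margin` are hypothetical; only the 8-second timeout and the 7-second gap come from the post. One caveat: if the timestamp inspected is the MQMD put time, it records when the reply was originally put by the sending application, not when it arrived on the local reply queue.

```python
from datetime import datetime, timedelta


def reply_margin(timeout_logged_at, reply_put_at, timeout):
    """Return (reply_latency, slack) for one timed-out request.

    reply_latency: time from the start of the wait until the reply was put.
    slack: how long the reply had existed when the timeout was logged.
    """
    wait_started_at = timeout_logged_at - timeout
    reply_latency = reply_put_at - wait_started_at
    slack = timeout_logged_at - reply_put_at
    return reply_latency, slack


# Hypothetical clock values; the 8 s timeout and 7 s gap are from the post.
logged = datetime(2010, 8, 5, 14, 0, 8)
put = logged - timedelta(seconds=7)
latency, slack = reply_margin(logged, put, timedelta(seconds=8))
print("reply put after", latency.total_seconds(), "s; slack", slack.total_seconds(), "s")
```

A positive slack means the reply existed somewhere before the wait expired, so the open question is where it spent that time before it became gettable on the reply queue.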

jeevan

Posted: Thu Aug 05, 2010 3:50 pm Post subject: Re: Sporadic timeouts on request/reply

Grand Master

Joined: 12 Nov 2005
Posts: 1432

For the sake of troubleshooting, increase the timeout value and see what happens.

Also, you can display CURDEPTH and IPPROCS of the reply queue while you put the message, and see whether CURDEPTH goes up before or after IPPROCS drops.
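
One way to read such samples, sketched in Python. The sample values and the function name `depth_rose_while_reader_attached` are hypothetical; in practice the numbers would come from displaying the queue status repeatedly while a request is in flight.

```python
def depth_rose_while_reader_attached(samples):
    """samples: list of (curdepth, ipprocs) tuples in time order.

    Return True if the depth increased between two consecutive samples
    while at least one process still had the queue open for input.
    """
    for (prev_depth, _), (depth, ipprocs) in zip(samples, samples[1:]):
        if depth > prev_depth and ipprocs > 0:
            return True
    return False


# Hypothetical samples: the reply shows up while the getter is still attached.
early_reply = [(0, 1), (1, 1), (1, 0)]
# Hypothetical samples: the reply shows up only after the getter has gone.
late_reply = [(0, 1), (0, 0), (1, 0)]

print(depth_rose_while_reader_attached(early_reply))
print(depth_rose_while_reader_attached(late_reply))
```

If the first pattern shows up, the message was on the queue while a getter was still waiting, which points toward a selection or commit question rather than a slow reply.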

bruce2359

Posted: Thu Aug 05, 2010 3:59 pm Post subject:

Poobah

Joined: 05 Jan 2008
Posts: 9475
Location: US: west coast, almost. Otherwise, enroute.

Exactly what do you mean by timeout?

Does your app issue an MQGET with WAIT? Did the WAIT expire?

Is there a CICS transaction involved? Did it time out?

How do you know something timed out? Was there an error logged in CICS? MQ?
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.

squidward

Posted: Thu Aug 05, 2010 5:20 pm Post subject:

Novice

Joined: 27 Mar 2009
Posts: 10

Yes, I will try increasing the timeout. Not the best solution, since I still don't have a root cause, and there is a user waiting on the transaction who will have to sit there for even longer.

By timeout, I mean I issue a QueueReceiver.receive(8000) call that returns null after 8 seconds, even though according to the MQ logs the message was put 7.8 seconds before. The only thing I can think of is that somehow the remote QMGR is not committing the put to my local qmgr, so I'm not able to retrieve the message even though the put has already occurred.
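
The hypothesis in the previous paragraph (a message that has been put but not yet committed cannot be retrieved) can be illustrated with a toy model. This is a sketch in Python, not MQ or JMS code: `ToyQueue` and its methods are hypothetical names, and the class only models the rule that a getter sees a message once the put is committed.

```python
import threading
import time


class ToyQueue:
    """In-memory stand-in for a queue with commit semantics (not an MQ API)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = {}    # corr_id -> (payload, put_time), not yet visible
        self._committed = {}  # corr_id -> (payload, put_time), visible to getters

    def put_uncommitted(self, corr_id, payload):
        # The put time is recorded now, even though the message is not visible yet.
        with self._cond:
            self._pending[corr_id] = (payload, time.monotonic())

    def commit(self, corr_id):
        with self._cond:
            self._committed[corr_id] = self._pending.pop(corr_id)
            self._cond.notify_all()

    def receive(self, corr_id, timeout):
        """Wait up to `timeout` seconds for a committed message with this id."""
        deadline = time.monotonic() + timeout
        with self._cond:
            while corr_id not in self._committed:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return None
                self._cond.wait(remaining)
            return self._committed.pop(corr_id)


queue = ToyQueue()
queue.put_uncommitted("REQ-1", "reply body")
first = queue.receive("REQ-1", timeout=0.05)   # times out: put is not committed
queue.commit("REQ-1")
second = queue.receive("REQ-1", timeout=0.05)  # now the message is visible
print(first, second[0])
```

In this model the message carries a put time that is earlier than the failed receive, which matches the symptom of a reply whose timestamp sits inside the timeout window. Whether that is what actually happens here is exactly what a trace would have to confirm.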

The CICS transaction is not timing out -- the remote qmgr is CICS:

(MY CLIENT) <-> (WINTEL QMGR) <-> (MAINFRAME QMGR) <-> CICS SYSTEM

Thanks again.

bruce2359

Posted: Thu Aug 05, 2010 6:05 pm Post subject:

Poobah

Joined: 05 Jan 2008
Posts: 9475
Location: US: west coast, almost. Otherwise, enroute.

Quote:

(MY CLIENT) <-> (WINTEL QMGR) <-> (MAINFRAME QMGR) <-> CICS SYSTEM

Time to get your z/OS and CICS sysprogs to look at SMF performance data.

I'm going to go out on a limb and speculate that CICS and the z/OS qmgr are not the likely culprits. 8 seconds on a mainframe is, well, 8 seconds on a mainframe. CICS is capable of easily doing thousands of transactions per second.

Is there a database involved in the transaction on the client? On the Wintel qmgr? On the z/OS qmgr? The CICS app?

Are all other apps experiencing delays?

Is the JMS code from the CICS transaction/application? Or from an MQ application? In either case, some tuning of the JVM might be in order.

fjb_saper

Posted: Thu Aug 05, 2010 8:08 pm Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20763
Location: LI,NY

You might also want to open a PMR and ship some trace logs to IBM.
The trouble is catching the trace log at the point in time when the problem happens.

Are you using a lot of temporary dynamic reply queues? There might be a caching problem in the channel where the dynamic pool is cached too long.
The PMR will tell you what the tuning parameter is ...

Have fun

_________________
MQ & Broker admin

PeterPotkay

Posted: Sat Aug 07, 2010 10:28 am Post subject:

Poobah

Joined: 15 May 2001
Posts: 7723

Sporadic timeouts with no other explanation when a Windows QM is involved? Sounds like what I am grappling with now myself. After all applications involved have proved they are processing quickly, after the SAN and server guys have confirmed no I/O problems, and after the network traces showed no errors, we think it's related to a new version of antivirus software that is causing similar problems in other service areas as well. Apparently this thing is causing multiple virus definitions to be downloaded per day, multiple scans to run, NICs to get hung up, files to be scanned that shouldn't be, etc. Don't know if it's a bug or if they misconfigured the thing or what.

I've identified the XMITQ between the Windows QMs backing up for a minute or two every few hours with no errors at all, and it recovers on its own. We are going to have them back off that antivirus upgrade and see if it helps like it did in other areas.

There can be a lot of reasons for your symptoms. But if you've exhausted all other options, check whether the antivirus software has recently been updated.
_________________
Peter Potkay
Keep Calm and MQ On

bruce2359

Posted: Sat Aug 07, 2010 10:43 am Post subject:

Poobah

Joined: 05 Jan 2008
Posts: 9475
Location: US: west coast, almost. Otherwise, enroute.

I've also seen odd and seemingly inexplicable behavior when sysadmins installed multiple antivirus products on the same OS.

squidward

Posted: Tue Aug 10, 2010 7:30 am Post subject:

Novice

Joined: 27 Mar 2009
Posts: 10

Glad to hear I'm not the only one banging my head against the wall.

Not using dynamic queues or anything, just normal request queue / reply queue.

I don't think it's a Windows server issue, the reason being that I have a separate standalone queue manager on the same Windows box, on which I never see timeouts despite vastly higher volumes of data and the same client codebase. So I think it must be an issue on the channel from the mainframe.

I did increase the timeout value and do not see any more timeouts. The problem is not solved, though, since I know the remote system is responding in subsecond times.

Guess nothing left but to have a look into the trace options.

gbaddeley

Posted: Tue Aug 10, 2010 3:49 pm Post subject:

Jedi Knight

Joined: 25 Mar 2003
Posts: 2538
Location: Melbourne, Australia

You should also take a close look at sporadic slow response in the back end application.
_________________
Glenn

mvic

Posted: Tue Aug 10, 2010 4:37 pm Post subject:

Jedi

Joined: 09 Mar 2004
Posts: 2080

squidward wrote:

Guess nothing left but to have a look into the trace options.

I always jump to this point if a client reports long end-to-end times. There's no substitute for knowing where the time is being lost in the total transaction.

MQ trace can help here, if you know enough about your message (MsgId, put time etc.) that you can find it in the trace. MQ will have written precise timestamps for you (approx. microsecond resolution) saying what it was doing with each message, and when. You can then correlate with data from other elements of your message's round trip.
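
Once timestamps for one message have been collected from the different elements of the round trip, finding where the time went is simple subtraction. A minimal sketch in Python, assuming the clocks involved are synchronized; the event labels, the timestamps, and the function names are all hypothetical and would be replaced by whatever the trace and application logs actually show.

```python
from datetime import datetime


def hop_latencies(events):
    """events: list of (label, datetime). Return [(from_label, to_label, seconds)]."""
    ordered = sorted(events, key=lambda event: event[1])
    return [
        (a[0], b[0], (b[1] - a[1]).total_seconds())
        for a, b in zip(ordered, ordered[1:])
    ]


def slowest_hop(events):
    """Return the (from_label, to_label, seconds) hop with the largest delay."""
    return max(hop_latencies(events), key=lambda hop: hop[2])


# Hypothetical timestamps for a single request/reply round trip.
base = "2010-08-10 16:37:"
events = [
    ("client puts request", datetime.fromisoformat(base + "00.000")),
    ("request reaches remote qmgr", datetime.fromisoformat(base + "00.150")),
    ("back end puts reply", datetime.fromisoformat(base + "00.600")),
    ("reply reaches local qmgr", datetime.fromisoformat(base + "08.400")),
    ("client gets reply", datetime.fromisoformat(base + "08.450")),
]

for start, end, seconds in hop_latencies(events):
    print(f"{start} -> {end}: {seconds:.3f} s")
print("slowest:", slowest_hop(events))
```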

BUT be aware that trace can slow down your system even more, which might make a bad situation worse if you are already in danger of breaching SLAs!

bruce2359

Posted: Tue Aug 10, 2010 5:46 pm Post subject:

Poobah

Joined: 05 Jan 2008
Posts: 9475
Location: US: west coast, almost. Otherwise, enroute.

Quote:

...it must be issue on channel from the mainframe.

Spoken (typed) like a true newbie. I gather that you are not a mainframe person.

gbaddeley

Posted: Wed Aug 11, 2010 3:46 pm Post subject:

Jedi Knight

Joined: 25 Mar 2003
Posts: 2538
Location: Melbourne, Australia

mvic wrote:

BUT be aware that trace can slow down your system even more, which might make a bad situation worse if you are already in danger of breaching SLAs!

and use up disk space very quickly on production systems! I suggest that you start MQ trace at a time of day when a slow message is most likely and then stop it immediately after. Make sure you have the approval of the application managers.
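
To pick that time of day, the timestamps of past timeouts from the application log can be bucketed by hour. A minimal sketch in Python; the timestamps and the function name `busiest_hours` are hypothetical, and real input would be parsed from wherever the client logs its timeouts.

```python
from collections import Counter
from datetime import datetime


def busiest_hours(timeout_times, top=3):
    """Return [(hour_of_day, count)] for the hours with the most timeouts."""
    counts = Counter(moment.hour for moment in timeout_times)
    return counts.most_common(top)


# Hypothetical timeout timestamps taken from an application log.
timeouts = [
    datetime(2010, 8, 9, 9, 12),
    datetime(2010, 8, 9, 14, 3),
    datetime(2010, 8, 9, 14, 41),
    datetime(2010, 8, 10, 14, 17),
    datetime(2010, 8, 10, 9, 55),
    datetime(2010, 8, 10, 14, 58),
]

print(busiest_hours(timeouts, top=2))
```

A short trace window around the top hour keeps the disk usage and the slowdown to a minimum.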
_________________
Glenn
