Author |
Message
|
fredmoore |
Posted: Tue Jun 16, 2009 3:05 am Post subject: Rogue infinite wait in receive? (waitForNotification -1) |
|
|
Novice
Joined: 23 Mar 2009 Posts: 24
|
Hi folks,
We have a long-running MQ 7.0.0.2 JMS "consumer" application running in client mode against an MQ 7.0.0.2 qmgr. After a while (anywhere from a few minutes to a few hours), a call to receiver.receive(2000L) never returns.
On the server side, this shows up as the following error:
Code: |
----- amqccita.c : 3357 -------------------------------------------------------
06/16/2009 11:25:27 AM - Process(29973.49) User(mqm) Program(amqrmppa)
AMQ9208: Error on receive from host 192 (192.168.7.186).
EXPLANATION:
An error occurred receiving data from 192 (192.168.7.186) over TCP/IP. This may
be due to a communications failure.
ACTION:
The return code from the TCP/IP (read) call was 104 (X'68'). Record these
values and tell the systems administrator.
|
The JMS trace ends with these interesting lines:
Code: |
14:47:06.953.0J 0005 @60e390 c.i.m.c.c.j2se.workqueue.WorkQueueManagerImplementation - { wakeManagerThread()
14:47:06.953.0K 0005 @fa706d c.i.m.c.c.j2se.workqueue.WorkQueueManagerImplementation$WorkQueueMana -- { wake()
14:47:06.953.0L 0005 @fa706d c.i.m.c.c.j2se.workqueue.WorkQueueManagerImplementation$WorkQueueMana --- d wake() Notifying threadNotifierLock <null>
14:47:06.953.0M 0005 @fa706d c.i.m.c.c.j2se.workqueue.WorkQueueManagerImplementation$WorkQueueMana --- d wake() threadNotifierLock notified <null>
14:47:06.953.0N 0005 @fa706d c.i.m.c.c.j2se.workqueue.WorkQueueManagerImplementation$WorkQueueMana -- } wake()
14:47:06.953.0O 0005 @60e390 c.i.m.c.c.j2se.workqueue.WorkQueueManagerImplementation - } wakeManagerThread()
14:47:06.953.0P 0005 @f3770c c.i.m.c.c.j2se.workqueue.WorkQueueManagerImplementation$ThreadPoolWor } run()<exitIndex 2>
14:47:06.953.0Q 0004 @fa706d c.i.m.c.c.j2se.workqueue.WorkQueueManagerImplementation$WorkQueueMana --- d waitForNotification(long) setting workWaiting=false. <null>
14:47:06.953.0R 0004 @fa706d c.i.m.c.c.j2se.workqueue.WorkQueueManagerImplementation$WorkQueueMana -- } waitForNotification(long) returns [false] Boolean
14:47:06.953.0S 0004 @fa706d c.i.m.c.c.j2se.workqueue.WorkQueueManagerImplementation$WorkQueueMana - } waitForNotification() returns [false] Boolean
14:47:06.953.0T 0004 @fa706d c.i.m.c.c.j2se.workqueue.WorkQueueManagerImplementation$WorkQueueMana - d run() Synchronizing on threadNotifierLock [java.lang.Object@1d47f59]
14:47:06.953.0U 0004 @fa706d c.i.m.c.c.j2se.workqueue.WorkQueueManagerImplementation$WorkQueueMana - d run() Got sync lock [java.lang.Object@1d47f59]
14:47:06.953.0V 0004 @fa706d c.i.m.c.c.j2se.workqueue.WorkQueueManagerImplementation$WorkQueueMana - d run() No high priority work to do <null>
14:47:06.953.0W 0004 @fa706d c.i.m.c.c.j2se.workqueue.WorkQueueManagerImplementation$WorkQueueMana - d run() No work to be done. Waiting for further notification <null>
14:47:06.953.0X 0004 @fa706d c.i.m.c.c.j2se.workqueue.WorkQueueManagerImplementation$WorkQueueMana - { waitForNotification()
14:47:06.953.0Y 0004 @fa706d c.i.m.c.c.j2se.workqueue.WorkQueueManagerImplementation$WorkQueueMana -- { waitForNotification(long) [-1]
14:47:06.953.0Z 0004 @fa706d c.i.m.c.c.j2se.workqueue.WorkQueueManagerImplementation$WorkQueueMana --- d waitForNotification(long) synchronizing on threadNotifierLock <null>
14:47:06.953.10 0004 @fa706d c.i.m.c.c.j2se.workqueue.WorkQueueManagerImplementation$WorkQueueMana --- d waitForNotification(long) workWaiting=false. Waiting for -1 <null>
|
"waitForNotification() Waiting for -1" ...sounds like an infinite wait!
Any clues anyone?
Cheers,
F. |
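For readers unfamiliar with the convention the trace hints at, here is a minimal sketch of how a wait helper can treat a negative timeout as "no deadline". It is plain Java and purely illustrative: `NotificationLatch` and its fields are hypothetical names, and this is an assumption about the general pattern, not IBM's WorkQueueManager code.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration only; not taken from the MQ JMS client. */
final class NotificationLatch {
    private final Object lock = new Object();
    private boolean workWaiting = false;

    /** Signals any waiter that work is available. */
    void wake() {
        synchronized (lock) {
            workWaiting = true;
            lock.notifyAll();
        }
    }

    /**
     * Waits for a wake() call. A negative timeout is treated as "wait with no
     * deadline" (assumed convention). Returns true if work was signalled,
     * false if the timeout expired or the thread was interrupted.
     */
    boolean waitForNotification(long timeoutMillis) {
        synchronized (lock) {
            try {
                if (timeoutMillis < 0) {
                    while (!workWaiting) {
                        lock.wait(); // unbounded: only wake() or an interrupt ends this
                    }
                } else {
                    long deadline = System.nanoTime()
                            + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
                    while (!workWaiting) {
                        long remainingNanos = deadline - System.nanoTime();
                        if (remainingNanos <= 0) {
                            return false; // timed out without a signal
                        }
                        TimeUnit.NANOSECONDS.timedWait(lock, remainingNanos);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
            workWaiting = false; // consume the signal
            return true;
        }
    }
}
```

With this pattern, an unbounded wait is harmless only as long as some other thread is guaranteed to call `wake()`. If that signal is missed or never sent, the waiter blocks indefinitely, which would be consistent with the symptom described above.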
|
WMBDEV1 |
Posted: Tue Jun 16, 2009 3:11 am Post subject: |
|
|
Sentinel
Joined: 05 Mar 2009 Posts: 888 Location: UK
|
Just my quick thoughts, as I'm not on version 7 yet...
Are you able to try this on a previous version of MQ?
Have you raised a PMR for this?
Are the results the same if you try a smaller duration for the receive, e.g. receiver.receive(1000)? |
|
fredmoore |
Posted: Tue Jun 16, 2009 4:01 am Post subject: |
|
|
Novice
Joined: 23 Mar 2009 Posts: 24
|
Quote: |
Are you able to try this on a previous version of MQ?
|
...we are in the process of testing this with the WMQ 6.0 JMS classes; I will post any findings.
Quote: |
Have you raised a PMR for this?
|
...not yet
Quote: |
Are the results the same if you try a smaller duration for the receive, e.g. receiver.receive(1000)?
|
...not tested yet. Do you expect this to make a difference for a specific reason?
Cheers,
F. |
|
WMBDEV1 |
Posted: Tue Jun 16, 2009 4:08 am Post subject: |
|
|
Sentinel
Joined: 05 Mar 2009 Posts: 888 Location: UK
|
fredmoore wrote: |
Quote: |
Are the results the same if you try a smaller duration for the receive, e.g. receiver.receive(1000)?
|
...not tested yet. Do you expect this to make a difference for a specific reason?
|
I have no specific reason; I was just thinking of things I might try in that situation. As I said, just some quick thoughts.
I suspect the results of trying the code against a previous version of MQ will be the most enlightening, though! |
|
fredmoore |
Posted: Thu Jun 18, 2009 7:27 am Post subject: |
|
|
Novice
Joined: 23 Mar 2009 Posts: 24
|
Quick update...
The problem seems to happen only when using WMQ 7 JMS classes, while the same app works fine with WMQ 6 JMS classes.
We are now trying to reproduce it using an ad-hoc minimalist JMS application (rather than the real application) in preparation for a PMR.
I'll keep you guys posted; any thoughts in the meantime are appreciated.
Cheers,
F. |
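When building a minimal repro for a hang like this, it can help to detect the stuck call automatically instead of watching the console. The sketch below is one possible approach in plain Java with no MQ dependency. `ReceiveWatchdog` and `completesWithin` are hypothetical names, and the `Runnable` stands in for whatever wraps `receiver.receive(2000L)` in the real test.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for a repro harness; not part of any MQ API. */
final class ReceiveWatchdog {

    /**
     * Runs blockingCall on a daemon worker thread and reports whether it
     * finished (normally or by throwing) within deadlineMillis.
     */
    static boolean completesWithin(Runnable blockingCall, long deadlineMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "receive-watchdog-worker");
            t.setDaemon(true); // a stuck call must not keep the JVM alive
            return t;
        });
        Future<?> pending = worker.submit(blockingCall);
        try {
            pending.get(deadlineMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            pending.cancel(true); // best effort: interrupts the worker thread
            return false;
        } catch (ExecutionException e) {
            return true; // the call did return, by throwing; it did not hang
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            worker.shutdownNow();
        }
    }
}
```

In a repro, the deadline would be set comfortably above the 2000 ms receive timeout, so a `false` result marks the point at which to capture a thread dump and the JMS trace. Note that interrupting the worker is not guaranteed to unblock a call stuck inside a client library; the watchdog only detects the hang.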
|
fredmoore |
Posted: Mon Aug 03, 2009 5:21 am Post subject: SOLVED (...almost) |
|
|
Novice
Joined: 23 Mar 2009 Posts: 24
|
Hi,
IBM Support gave us a fix for APAR IC61153, which solved the deadlock problem (although we still get an unexpected, but unrelated, MQRC=2009 that is still under investigation).
Even without the fix, IBM Support suggested a workaround that consists of passing this system property to the JVM:
Code: |
-Djmscc.workqueue.poolTimeout=60000 |
Cheers,
F. |
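Since a `-D` option is easy to lose in a wrapper script, a quick check that the JVM actually received it can save a debugging round. The sketch below only reads the system property back and makes no assumption about how the MQ client interprets it. `PoolTimeoutCheck` and `configuredPoolTimeout` are hypothetical names; the property name is the one quoted above.

```java
import java.util.OptionalLong;

/** Hypothetical startup check; it only echoes what the JVM was given. */
final class PoolTimeoutCheck {
    static final String PROPERTY = "jmscc.workqueue.poolTimeout";

    /** Returns the value passed via -D, or empty if it is unset or not numeric. */
    static OptionalLong configuredPoolTimeout() {
        String raw = System.getProperty(PROPERTY);
        if (raw == null) {
            return OptionalLong.empty();
        }
        try {
            return OptionalLong.of(Long.parseLong(raw.trim()));
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        OptionalLong value = configuredPoolTimeout();
        System.out.println(value.isPresent()
                ? PROPERTY + " = " + value.getAsLong()
                : PROPERTY + " is not set (or not a number)");
    }
}
```

Calling `PoolTimeoutCheck.configuredPoolTimeout()` early in the consumer's startup and logging the result makes it obvious whether the workaround is in effect for a given run.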
|
fjb_saper |
Posted: Mon Aug 03, 2009 12:18 pm Post subject: Re: SOLVED (...almost) |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
fredmoore wrote: |
Hi,
IBM Support gave us a fix for APAR IC61153, which solved the deadlock problem (although we still get an unexpected, but unrelated, MQRC=2009 that is still under investigation).
Even without the fix, IBM Support suggested a workaround that consists of passing this system property to the JVM:
Code: |
-Djmscc.workqueue.poolTimeout=60000 |
Cheers,
F. |
You also did not tell us whether you set an ExceptionListener on the connection.
That is usually mandatory when using a MessageListener...
Have fun
_________________
MQ & Broker admin |
|
fredmoore |
Posted: Tue Aug 04, 2009 12:21 am Post subject: Repro program |
|
|
Novice
Joined: 23 Mar 2009 Posts: 24
|
Quote: |
You also did not tell us whether you set an ExceptionListener on the connection.
That is usually mandatory when using a MessageListener...
|
...the repro program we used was way simpler than that; you can have a look at it here.
Cheers,
F. |
|