Author |
Message
|
lyntongrice |
Posted: Tue Oct 25, 2011 4:59 am Post subject: MQGET hangs in LOOP with syncpoint |
|
|
Acolyte
Joined: 26 Sep 2010 Posts: 70
|
Hi there,
I have a C application (on Solaris SPARC) processing thousands of messages under syncpoint, running on WebSphere MQ 6.
The application runs fine about 98% of the time, but from time to time the MQGET seems to "hang"; if I stop and restart the application it is fine again...
I am issuing MQBACK or MQCMIT depending on whether the message is processed successfully.
Some of my code settings are:
Code: |
cno.Options = MQCNO_HANDLE_SHARE_NONE;
MQCONNX(qmgr,
&cno,
&Hcon,
&CompCode,
&CReason); |
...
....
Code: |
memcpy(md.Format, MQFMT_STRING, (size_t)MQ_FORMAT_LENGTH);
gmo.Options = MQGMO_WAIT | MQGMO_SYNCPOINT | MQGMO_FAIL_IF_QUIESCING | MQGMO_CONVERT;
gmo.WaitInterval = 5000; |
...
....
The basic pseudo code is:
while (CompCode != MQCC_FAILED) {
    beginTransaction;
    MQGET
    commit or rollback transaction
}
So it just seems to be stuck in the loop somewhere... I have seen this: http://www-01.ibm.com/support/docview.wss?uid=swg21179245 but I do issue a commit after each get.
Any ideas what would cause this?
Thanks for the help
Lynton |
|
Vitor |
Posted: Tue Oct 25, 2011 5:07 am Post subject: Re: MQGET hangs in LOOP with syncpoint |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
lyntongrice wrote: |
I have a C application (on Solaris SPARC) processing thousands of messages under syncpoint, running on WebSphere MQ 6. |
What version of WMQv6?
About how many of these thousands of messages are in a single unit of work (i.e. how many gets before a commit)?
Does the application check WMQ completion and reason codes after each WMQ call? Does it respond to any error or just expected ones (like 2033)?
Are there any indications in the queue manager log that it's short or out of log space (either allocated log or disc space to keep the logs on)?
What's different in the 2% of cases to the 98%? Is there any correlation between failure & message count?
Why have you posted this in the forum's cluster section? How does clustering fit into this? Or is this just in the wrong place & needs to be moved?  _________________ Honesty is the best policy.
Insanity is the best defence. |
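One way to act on every completion and reason code, not only the expected 2033, is to classify the result of each MQGET before deciding what the loop does next. The sketch below is self-contained: the `SK_` constants are local stand-ins that mirror the values `cmqc.h` gives to `MQCC_OK`, `MQCC_WARNING`, `MQCC_FAILED` and `MQRC_NO_MSG_AVAILABLE`, while `classify_get` and `get_action` are hypothetical names, not part of the MQI.

```c
/* Local stand-ins mirroring values defined in the MQ header cmqc.h, so the
   sketch compiles on its own. A real program would include <cmqc.h> and use
   MQCC_* and MQRC_* directly. */
#define SK_MQCC_OK                0L
#define SK_MQCC_WARNING           1L
#define SK_MQCC_FAILED            2L
#define SK_MQRC_NO_MSG_AVAILABLE  2033L

/* Hypothetical names: what the get loop should do next. */
typedef enum {
    GET_PROCESS,  /* message returned cleanly: process, then commit or back out */
    GET_INSPECT,  /* message returned with a warning: check the reason first */
    GET_RETRY,    /* wait interval expired, queue empty: issue MQGET again */
    GET_STOP      /* unexpected failure: log the reason code and leave the loop */
} get_action;

get_action classify_get(long comp_code, long reason)
{
    if (comp_code == SK_MQCC_OK)
        return GET_PROCESS;
    if (comp_code == SK_MQCC_WARNING)
        return GET_INSPECT;
    if (comp_code == SK_MQCC_FAILED && reason == SK_MQRC_NO_MSG_AVAILABLE)
        return GET_RETRY;
    return GET_STOP;  /* every other failure is reported, never silently looped on */
}
```

With this shape, a get loop could test `classify_get(CompCode, Reason) != GET_STOP` so that an empty queue keeps the loop alive while any other failure ends it with a logged reason code.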
|
zpat |
Posted: Tue Oct 25, 2011 5:18 am Post subject: |
|
|
 Jedi Council
Joined: 19 May 2001 Posts: 5866 Location: UK
|
When you issue MQBACK - what do you expect to happen?
If you think the message is moved to the backout queue named in the queue definition - then I have bad news for you. That doesn't happen by itself.
MQBACK will cause the message to re-appear on the original queue and be processed again. This can cause a loop unless your code checks the message's backout count and moves it to a failure queue.
You can use the backout re-queue name and backout re-queue threshold values from the queue definition, by inquiring on them. However these fields do nothing more than store these values for your code to refer to. |
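The check described above can be sketched as a small decision function. This is an illustration under stated assumptions, not the poster's code: `backout_info`, `msg_action` and `decide_on_message` are hypothetical names, and the struct stands in for values a real program would take from the `BackoutCount` field of the MQMD returned by MQGET and from MQINQ on the input queue (`MQIA_BACKOUT_THRESHOLD`, `MQCA_BACKOUT_REQ_Q_NAME`). Treating a threshold of 0 as "no threshold set" is a convention chosen for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the values the decision needs. */
typedef struct {
    long        backout_count;      /* MQMD BackoutCount from the MQGET */
    long        backout_threshold;  /* queue attribute read with MQINQ */
    const char *backout_queue;      /* requeue name with blank padding trimmed;
                                       "" or NULL when none is defined */
} backout_info;

typedef enum {
    MSG_PROCESS,           /* below the threshold: process normally */
    MSG_MOVE_TO_BACKOUT,   /* put to the backout queue, then commit the get */
    MSG_NO_BACKOUT_QUEUE   /* over the threshold but nowhere to move it */
} msg_action;

msg_action decide_on_message(const backout_info *info)
{
    assert(info != NULL);

    /* Assumption of this sketch: threshold 0 means "no threshold set". */
    if (info->backout_threshold <= 0 ||
        info->backout_count < info->backout_threshold)
        return MSG_PROCESS;

    if (info->backout_queue == NULL || info->backout_queue[0] == '\0')
        return MSG_NO_BACKOUT_QUEUE;

    return MSG_MOVE_TO_BACKOUT;
}
```

The move itself would have to happen inside the same unit of work as the get, so that the put to the failure queue and the removal from the input queue are committed together.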
|
Vitor |
Posted: Tue Oct 25, 2011 5:26 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
zpat wrote: |
When you issue MQBACK - what do you expect to happen?
If you think the message is moved to the backout queue named in the queue definition - then I have bad news for you. That doesn't happen by itself. |
Ooo...good catch!  _________________ Honesty is the best policy.
Insanity is the best defence. |
|
lyntongrice |
Posted: Tue Oct 25, 2011 5:31 am Post subject: |
|
|
Acolyte
Joined: 26 Sep 2010 Posts: 70
|
Hi there,
Sorry about posting to the cluster forum. Not intentional... oops.
I am not 100% sure what version we are on, but I think we are on a very low "support pack" level of WMQ 6.
I do check the completion codes / reason codes....and act accordingly. What I have done now is change from "MQGMO_WAIT" to "MQGMO_NO_WAIT" and no issues so far.....and I have a "little sleep" between loops....
I will monitor it now and see if it hangs again and will get back to you....
Thanks for the help
Lynton |
|
exerk |
Posted: Tue Oct 25, 2011 5:37 am Post subject: |
|
|
 Jedi Council
Joined: 02 Nov 2006 Posts: 6339
|
Moving this to a more appropriate forum... _________________ It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys. |
|
zpat |
Posted: Tue Oct 25, 2011 5:50 am Post subject: |
|
|
 Jedi Council
Joined: 19 May 2001 Posts: 5866 Location: UK
|
MQGMO_WAIT is a good option to use. Much better than waiting in your own code.
Unlikely to be related to your problem.
Did you read my post about what MQBACK actually does?
Have you tried your "new" code with an MQBACK situation? |
|
Vitor |
Posted: Tue Oct 25, 2011 6:02 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
lyntongrice wrote: |
What I have done now is change from "MQGMO_WAIT" to "MQGMO_NO_WAIT" and no issues so far.....and I have a "little sleep" between loops.... |
Be aware that this changes the behaviour slightly. A WMQ wait will end as soon as there's a message available (so if there's another message already on the queue it doesn't wait at all). If you're using sleep() then the wait is absolute and you've introduced a delay. On thousands of messages this wait time could become significant.
I also don't see how this will help you. I think a more profitable line of enquiry is the one indicated by my most worthy associate; that in 2% of cases your application is choking on a message and ending up in a "poison message" scenario. _________________ Honesty is the best policy.
Insanity is the best defence. |
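To put a number on that delay: if the loop sleeps a fixed interval after every get, regardless of whether more messages are waiting, the added idle time is simply the message count times the sleep. The helper below is a hypothetical illustration of that arithmetic only; the 100 ms figure in the note after it is an assumed value, since the thread does not say how long the "little sleep" is.

```c
/* Idle time added by a fixed sleep after every MQGET issued with
   MQGMO_NO_WAIT, compared with MQGMO_WAIT, which returns as soon as a
   message is available. Hypothetical helper; assumes one sleep per message. */
long added_sleep_seconds(long messages, long sleep_ms)
{
    return (messages * sleep_ms) / 1000;
}
```

For 10 000 messages and an assumed sleep of 100 ms, `added_sleep_seconds(10000, 100)` gives 1000 seconds, or a little under 17 minutes of pure waiting.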
|
lyntongrice |
Posted: Tue Oct 25, 2011 9:47 am Post subject: |
|
|
Acolyte
Joined: 26 Sep 2010 Posts: 70
|
Hi there,
I am expecting the MQBACK to roll the message back onto the queue, that is fine. It seems to be very ad hoc though.....for instance I can launch 4 instances of the application pointing to the same queue with say 10 000 messages on it......and maybe after 4000 messages 1 of them will lock up.....I know you will think that the other 3 are too fast for it to get a turn to do the MQGET but even if I run one it seems to hang after processing lots.
I deal with opening and writing physical files on the OS for each message, I wonder if there are sometimes too many handles open or something....
I will keep digging and will log what is happening etc....
Thanks again
Lynton |
|
Vitor |
Posted: Tue Oct 25, 2011 9:58 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
lyntongrice wrote: |
I am expecting the MQBACK to roll the message back onto the queue, that is fine. |
Not if there's a problem with the message body (the business data) that causes the application to abend. Because it will abend the next time it's read off the queue. And the time after that. And the time after that. And it will look like the program's stuck in a loop.
Because it is.
lyntongrice wrote: |
It seems to be very ad hoc though.....for instance I can launch 4 instances of the application pointing to the same queue with say 10 000 messages on it......and maybe after 4000 messages 1 of them will lock up.....I know you will think that the other 3 are too fast for it to get a turn to do the MQGET but even if I run one it seems to hang after processing lots. |
Or when it gets to the poison message?
For the record, if there are 4 applications with 4 MQGET calls outstanding, the queue manager should ensure all 4 get a turn. _________________ Honesty is the best policy.
Insanity is the best defence.
Last edited by Vitor on Tue Oct 25, 2011 10:01 am; edited 1 time in total |
|
lyntongrice |
Posted: Tue Oct 25, 2011 10:01 am Post subject: |
|
|
Acolyte
Joined: 26 Sep 2010 Posts: 70
|
Thanks for that, the only thing is I write to STDOUT and FLUSH stdout with some "info messages" on what is happening......and I do not see any of these "printf" statements when it hangs ;-(
I will keep you posted, need to try simulate it on WMQ 7....
Chat later |
|
Vitor |
Posted: Tue Oct 25, 2011 10:06 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
lyntongrice wrote: |
Thanks for that, the only thing is I write to STDOUT and FLUSH stdout with some "info messages" on what is happening......and I do not see any of these "printf" statements when it hangs ;-( |
Which supports your theory that you've run the OS out of file handles (or some other resource), so that the actual application is hanging.
It would be interesting to see if, when you cancel the application out of its "hanging" state, the number of messages left on the queue + the number of files written = the number of messages originally written. If this is the case, and the "top" message of the queue post-cancel is the "top" message when the application was hanging, then the application should hang when you restart it.
If it doesn't, the application is hanging for a non-WMQ reason. _________________ Honesty is the best policy.
Insanity is the best defence. |
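The reconciliation test described above can be written down as a small function. This is a sketch with hypothetical names (`hang_verdict`, `diagnose_hang`); `SK_MSG_ID_LENGTH` is a local stand-in for `MQ_MSG_ID_LENGTH` (24 bytes), and the two message IDs are assumed to have been captured by browsing the head of the queue while the application was hung and again after cancelling it. The case where the head message differs is not covered in the post, so the sketch reports it as inconclusive.

```c
#include <assert.h>
#include <string.h>

#define SK_MSG_ID_LENGTH 24  /* mirrors MQ_MSG_ID_LENGTH from cmqc.h */

typedef enum {
    COUNTS_DO_NOT_ADD_UP,    /* remaining + files written != original count */
    INCONCLUSIVE,            /* head of queue changed between the two looks */
    SUSPECT_POISON_MESSAGE,  /* same head message and the restart hangs again */
    SUSPECT_NON_MQ_CAUSE     /* same head message but the restart runs fine */
} hang_verdict;

hang_verdict diagnose_hang(long original, long remaining, long files_written,
                           const unsigned char *top_while_hung,
                           const unsigned char *top_after_cancel,
                           int hangs_on_restart)
{
    assert(top_while_hung != NULL && top_after_cancel != NULL);

    if (remaining + files_written != original)
        return COUNTS_DO_NOT_ADD_UP;

    if (memcmp(top_while_hung, top_after_cancel, SK_MSG_ID_LENGTH) != 0)
        return INCONCLUSIVE;

    return hangs_on_restart ? SUSPECT_POISON_MESSAGE : SUSPECT_NON_MQ_CAUSE;
}
```

A restart that runs cleanly, with the counts adding up and the same message at the head of the queue, lands in `SUSPECT_NON_MQ_CAUSE`.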
|
lyntongrice |
Posted: Tue Oct 25, 2011 10:11 am Post subject: |
|
|
Acolyte
Joined: 26 Sep 2010 Posts: 70
|
Hi there,
Well, when the application hangs and I CTRL-C it and then immediately restart it, it processes the messages off WMQ as if nothing has happened... so I will try to dig into the number of file handles the application has open. What irritates me though is that I open and close every file I use, but it may be so FAST (doing about 12 messages per second) that the file handles being closed cannot keep up...
Let me check it out
Lynton |
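One way to check the file-handle theory from inside the application is to count the descriptors the process currently has open and log that figure as messages are processed. The sketch below uses only POSIX calls (`getrlimit` and `fcntl` with `F_GETFD`); `count_open_fds` and `FD_SCAN_CAP` are hypothetical names, and the cap is an arbitrary value chosen so the scan stays cheap when the soft limit is very large or unlimited.

```c
#include <fcntl.h>
#include <sys/resource.h>

/* Upper bound on how many descriptor numbers to probe (assumed value). */
#define FD_SCAN_CAP 65536

/* Returns the number of open descriptors in this process below the scan
   bound, or -1 if the descriptor limit cannot be read. */
int count_open_fds(void)
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
        return -1;

    long max_fd = FD_SCAN_CAP;
    if (lim.rlim_cur != RLIM_INFINITY && lim.rlim_cur < (rlim_t)FD_SCAN_CAP)
        max_fd = (long)lim.rlim_cur;

    int open_count = 0;
    for (long fd = 0; fd < max_fd; fd++) {
        /* F_GETFD fails (EBADF) for a descriptor number that is not open. */
        if (fcntl((int)fd, F_GETFD) != -1)
            open_count++;
    }
    return open_count;
}
```

Logging `count_open_fds()` every few hundred messages would show whether the count stays flat, as it should if every file is closed, or climbs toward the soft limit reported by `ulimit -n` before the hang.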
|
Vitor |
Posted: Tue Oct 25, 2011 10:17 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
The other solution would be to eliminate the application completely and modify whatever's reading these files to read the messages directly. Not only would you stop thrashing the storage opening and closing file handles, but you'd eliminate the I/O costs of opening the file, writing the file, closing the file, then opening, reading and closing the file again by the final target application.
In WMQ terms, 12 messages a second is not that fast. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
lyntongrice |
Posted: Tue Oct 25, 2011 10:52 am Post subject: |
|
|
Acolyte
Joined: 26 Sep 2010 Posts: 70
|
Hi there,
When I say 12 messages per second I mean processing the message as well (user exits etc) and sending into the target system.....but I hear you, it could be quicker
Chat later |
|
|