Unix MQAPI bulk msg processing |
Author |
Message
|
kolthorr |
Posted: Tue May 06, 2003 12:39 am Post subject: Unix MQAPI bulk msg processing |
|
|
Apprentice
Joined: 09 Mar 2002 Posts: 31
|
Hi all,
We've got a little Unix C program running on AIX 5, MQ 5.2. It gets triggered, forks some worker processes, and reads messages off a single input queue.
Each child process creates its own connection to MQ, derived from the single trigger message (so they all open the same input queue for shared input). They do an MQGET w/browse to get the msg size, then a get with 'msg under cursor' set to retrieve the actual msg.
When I dump a large number of messages on the queue (so they take a while to be processed), a seemingly random number of these processes finish with a 'NO_MSG_AVAILABLE' MQRC from the first MQGET call, even though there are still messages on the queue. This happens after a random number of iterations, not even after a consistent period of time. The remaining processes continue processing msgs normally until there are no msgs left on the input queue.
Any thoughts? I'll include a small log sample which will explain better what I mean. The only thing I can think of is maybe some Unix scheduling is getting in the way (this machine is running a backend DB that processes the messages).
Thanks very much,
Andrew
Log file (msgs truncated to protect the innocent). First column is pid. The numbers on the 'child exit' lines are number of msgs processed, and exit code. -7 is equivalent to MQRC 2033.
Code: |
63534 06/05/0318:17:19:982 created child 17778
63534 06/05/0318:17:19:988 created child 55514
63534 06/05/0318:17:19:991 created child 56626
55514 06/05/0318:17:20:065 received from MDp: [U302612 ]
55514 06/05/0318:17:20:185 rcvd from backend: [ 20Cannot ]
55514 06/05/0318:17:20:186 sent to MDp: [N20Cannot open F]
17778 06/05/0318:17:20:062 received from MDp: [U303478 ]
17778 06/05/0318:17:20:219 rcvd from backend: [ 36Lead Nu]
17778 06/05/0318:17:20:220 sent to MDp: [N36Lead Number d]
56626 06/05/0318:17:20:098 received from MDp: [U303223 ]
56626 06/05/0318:17:20:244 rcvd from backend: [ 02Account]
56626 06/05/0318:17:20:245 sent to MDp: [N02Account Numbe]
17778 06/05/0318:17:33:412 child exit 105 -7
56626 06/05/0318:18:18:670 child exit 253 -7
55514 06/05/0318:18:18:686 child exit 252 -7
63534 06/05/0318:18:18:700 main exit |
You can see the parent (triggered) pid of 63534 creating 3 workers. They all begin processing normally. Then pid 17778 gets a seemingly incorrect 2033 after a random period of time, while the other two continue running until no msgs are available (for real).
I've seen both this behaviour and runs where 2 of the 3 die early, leaving one process to finish the processing. Weird. |
|
EddieA |
Posted: Tue May 06, 2003 5:28 am Post subject: |
|
|
 Jedi
Joined: 28 Jun 2001 Posts: 2453 Location: Los Angeles
|
You are LOCKing the record when you Browse it, aren't you? If not, 2 threads could potentially Browse the same record; then, when the 1st one Gets it, the 2nd Get will fail.
Cheers, _________________ Eddie Atherton
IBM Certified Solution Developer - WebSphere Message Broker V6.1
IBM Certified Solution Developer - WebSphere Message Broker V7.0 |
|
Keka |
Posted: Tue May 06, 2003 5:32 am Post subject: |
|
|
Voyager
Joined: 28 Dec 2002 Posts: 96
|
When more than one process is reading from the queue, it is possible for two processes to browse the same message. When this happens, the process that gets the message first processes it normally, and the process that no longer finds the message comes back with error 2033.
Here is how it happens:
Proc1 browses the first message.
While Proc1 is checking the message size, Proc2 browses the same message.
Proc1 gets the message; Proc2 calculates the message length.
When Proc2 comes back to get the message, the message is already gone, so it gets 2033.
The IBM manuals (I believe it is the Application Programming Guide) tell you to handle this situation. There is nothing wrong with it; it is a normal scenario.
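The interleaving above can be replayed in a small toy model. This is only a sketch: `ToyQueue`, `ToyHandle`, `toy_browse_next`, `toy_get_under_cursor` and `replay_race` are hypothetical names for an in-memory stand-in, not MQI calls, and the failed get is reported as a plain `false` rather than a specific reason code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 8

/* Toy stand-in for a shared queue; this is NOT the MQ API. */
typedef struct {
    int    msg_id[QUEUE_CAP];
    bool   present[QUEUE_CAP];  /* false once a message has been removed */
    size_t count;
} ToyQueue;

/* Each worker keeps its own browse cursor, as each queue handle does. */
typedef struct {
    ptrdiff_t cursor;           /* index of the browsed message, -1 = none */
} ToyHandle;

/* Non-destructive browse: move the cursor to the next message still present. */
static bool toy_browse_next(const ToyQueue *q, ToyHandle *h)
{
    for (size_t i = (size_t)(h->cursor + 1); i < q->count; i++) {
        if (q->present[i]) {
            h->cursor = (ptrdiff_t)i;
            return true;
        }
    }
    return false;               /* nothing left to browse */
}

/* Destructive get of whatever message sits under the cursor. */
static bool toy_get_under_cursor(ToyQueue *q, const ToyHandle *h, int *out_id)
{
    assert(h->cursor < (ptrdiff_t)q->count);
    if (h->cursor < 0 || !q->present[h->cursor]) {
        return false;           /* another worker already took it */
    }
    q->present[h->cursor] = false;
    *out_id = q->msg_id[h->cursor];
    return true;
}

/* Replays the four steps; returns true if proc1 succeeded and proc2 failed. */
bool replay_race(void)
{
    ToyQueue  q = { {101, 102}, {true, true}, 2 };
    ToyHandle proc1 = { -1 }, proc2 = { -1 };
    int id = 0;

    toy_browse_next(&q, &proc1);                        /* proc1 browses msg 101 */
    toy_browse_next(&q, &proc2);                        /* proc2 browses msg 101 too */
    bool got1 = toy_get_under_cursor(&q, &proc1, &id);  /* proc1 removes it */
    bool got2 = toy_get_under_cursor(&q, &proc2, &id);  /* proc2 finds it gone */
    return got1 && !got2;
}
```

Nothing stops the second browse from landing on the same message, so the second get can only fail; message 102 is still on the toy queue at that point.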
hope this helps.. _________________ Keka |
|
kolthorr |
Posted: Tue May 06, 2003 4:42 pm Post subject: What about MQRC 2034 though? |
|
|
Apprentice
Joined: 09 Mar 2002 Posts: 31
|
Thanks all for the replies! I was under the impression that in this situation (whereby a message under the cursor is retrieved by some other process), the 2nd MQGET call would return with NO_MSG_UNDER_CURSOR? I am in fact checking for and handling this situation - it does occur regularly.
It seems that the first call (the browse only) doesn't return a 2034 (understandably) - it returns a 2033. Perhaps it is 'looking' at a msg, and it disappears as it is getting info about it. Hmmm! Ok, that sounds possible. I'll add some sanity checking around the 'browse' call.
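As a rough sketch of what such sanity checking might look like, the two reason codes can be handled separately depending on which call returned them. Everything here is an assumption for illustration: the `Action` values, `after_browse`, `after_get` and the retry budget are hypothetical, the numeric codes are copied locally so the fragment stands alone (real code would use the `MQRC_*` constants from `cmqc.h`), and the caller is assumed to pass `RC_NONE` for any browse it treats as successful.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies of the reason-code values discussed in this thread. */
enum {
    RC_NONE                = 0,     /* MQRC_NONE */
    RC_NO_MSG_AVAILABLE    = 2033,  /* MQRC_NO_MSG_AVAILABLE */
    RC_NO_MSG_UNDER_CURSOR = 2034   /* MQRC_NO_MSG_UNDER_CURSOR */
};

/* Hypothetical next steps for a worker's browse/get loop. */
typedef enum {
    ACT_GET,           /* browse worked: get the message under the cursor */
    ACT_PROCESS,       /* get worked: hand the message to the backend */
    ACT_BROWSE_AGAIN,  /* go back and browse for another message */
    ACT_EXIT_EMPTY,    /* treat the queue as drained */
    ACT_EXIT_ERROR     /* anything unexpected */
} Action;

/* After the browse call: a 2033 is retried while the (assumed) retry
   budget lasts, and only then taken to mean the queue is empty. */
Action after_browse(int reason, int *empty_retries_left)
{
    assert(empty_retries_left != NULL);
    if (reason == RC_NONE) {
        return ACT_GET;
    }
    if (reason == RC_NO_MSG_AVAILABLE) {
        if (*empty_retries_left > 0) {
            (*empty_retries_left)--;
            return ACT_BROWSE_AGAIN;
        }
        return ACT_EXIT_EMPTY;
    }
    return ACT_EXIT_ERROR;
}

/* After the get under cursor: a 2034 means another worker took the
   message first, so the worker simply browses again. */
Action after_get(int reason)
{
    if (reason == RC_NONE) {
        return ACT_PROCESS;
    }
    if (reason == RC_NO_MSG_UNDER_CURSOR) {
        return ACT_BROWSE_AGAIN;
    }
    return ACT_EXIT_ERROR;
}
```

With a budget of 1, a worker would tolerate one unexpected 2033 from the browse before exiting; whether a retry is appropriate at all depends on how the browse cursor is being positioned.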
Thanks very much for the tips!
Regards,
Andrew |
|
kolthorr |
Posted: Tue May 06, 2003 4:58 pm Post subject: Locking |
|
|
Apprentice
Joined: 09 Mar 2002 Posts: 31
|
I hadn't been specifying any browse/get locking for fear of impacting performance. Is that a valid assumption? Or is it a non-issue with 5.2?
Thanks,
Andrew |
|
kolthorr |
Posted: Tue May 06, 2003 7:50 pm Post subject: Solved! |
|
|
Apprentice
Joined: 09 Mar 2002 Posts: 31
|
Just to follow up & finish off, I found that adding 'MQGMO_LOCK' to the first MQGET call solved all the problems (as suggested).
Any additional overhead (which may or may not actually exist outside of my mind) was more than made up for by having all the worker processes continue running to completion, rather than a couple of them dying part way through processing.
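Why the lock helps can be sketched with a toy model in which a locked message is invisible to other workers' browses, which is the documented effect of MQGMO_LOCK. The names `ToyMsg`, `toy_browse_lock`, `toy_get_locked` and `replay_with_lock` are hypothetical, and the model is deliberately simplified: it always scans from the head of the queue and is not the MQI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_OWNER (-1)

/* Toy message slot; NOT an MQ structure. */
typedef struct {
    int  id;
    bool present;    /* still on the queue */
    int  locked_by;  /* worker holding the browse lock, or NO_OWNER */
} ToyMsg;

/* Browse with lock: skip removed messages and messages locked by another
   worker, then lock the first free one. Returns its index, or -1. */
static int toy_browse_lock(ToyMsg *q, size_t n, int worker)
{
    for (size_t i = 0; i < n; i++) {
        if (q[i].present && q[i].locked_by == NO_OWNER) {
            q[i].locked_by = worker;
            return (int)i;
        }
    }
    return -1;
}

/* Destructive get of a message this worker has locked. */
static bool toy_get_locked(ToyMsg *q, size_t n, int idx, int worker)
{
    assert(idx >= 0 && (size_t)idx < n);
    if (!q[idx].present || q[idx].locked_by != worker) {
        return false;
    }
    q[idx].present = false;
    q[idx].locked_by = NO_OWNER;
    return true;
}

/* Two workers interleave browse and get; returns the number of failed gets. */
int replay_with_lock(void)
{
    ToyMsg q[] = { {101, true, NO_OWNER}, {102, true, NO_OWNER} };
    size_t n = sizeof q / sizeof q[0];
    int failures = 0;

    int a = toy_browse_lock(q, n, 1);  /* worker 1 locks msg 101 */
    int b = toy_browse_lock(q, n, 2);  /* worker 2 skips it and locks msg 102 */
    if (!toy_get_locked(q, n, a, 1)) failures++;
    if (!toy_get_locked(q, n, b, 2)) failures++;
    return failures;
}
```

Because the second browse cannot see the locked message, each worker ends up with its own message and neither get fails.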
Thanks once more,
Andrew |
|