MQ Clustering once-only delivery failing
abegerho
Posted: Fri Jul 26, 2002 7:00 am    Post subject: MQ Clustering once-only delivery failing

Newbie

Joined: 26 Jul 2002
Posts: 5

Hi,
I am using MQSeries 5.2 on AIX 4.3.3. My queue manager is set up to do linear logging, and I've allocated the maximum allowable space for the log file.
I have a multi-process application where a parent process starts a bunch of child processes (let's call them workers), all of which are blocked on an MQGET call on the same queue.

I am using the C++ interface provided by MQ on top of the MQSeries C
APIs.

I am setting the following options on the queue and for the messages.
In pseudo code:
ImqQueue.setOpenOptions(MQOO_INPUT_SHARED);
ImqGetMessageOptions.setOptions(MQGMO_WAIT | MQGMO_SYNCPOINT);
ImqGetMessageOptions.setWaitInterval(MQWI_UNLIMITED);

After processing the message and even under error condition, I do an
ImqQueueManager::commit();

Now, if I don't use MQ Clustering, everything works fine. But in a clustered configuration with, say, 2 nodes (one the producer of messages, the other the consumer), both defined as full repositories, when I try to retrieve messages, at times (quite randomly) multiple workers end up with the same message.
I feel that my application is written correctly, as it works right if I don't use Clustering, but with Clustering I get the same message delivered multiple times. I believe I've even seen it come back to the same worker once.
Note: I know that the producer-side application is probably not flawed, as I'm generating new message ids there by using the appropriate MQ option, and I print out these message ids on the consumer side. In the duplicate messages that I'm talking about, the message ids are identical.

Does anyone have a guess at how this could be happening? Any help would
be much appreciated.


Regards,
Abhi
mrlinux
Posted: Fri Jul 26, 2002 7:27 am

Grand Master

Joined: 14 Feb 2002
Posts: 1261
Location: Detroit,MI USA

Well, I would look at your application more closely. I have been doing something like that for a long time and have not had any issues with duplicate messages.

I am not saying you are wrong; it could be a problem.
_________________
Jeff

IBM Certified Developer MQSeries
IBM Certified Specialist MQSeries
IBM Certified Solutions Expert MQSeries
bduncan
Posted: Fri Jul 26, 2002 9:38 am

Padawan

Joined: 11 Apr 2001
Posts: 1554
Location: Silicon Valley

I actually ran a cluster for nearly 1 year with AIX and MQ 5.2, and applications coded in C. I had plenty of issues, but none like the one you are describing.
I would recommend doing this. Whenever one of your workers does a successful MQGET, aside from printing out the MsgId, you should also look at the BackoutCount field in the message descriptor (MQMD). If this is nonzero, it means that for some reason the message has been rolled back onto the queue one or more times, which would explain why you are seemingly getting duplicates, when in fact it is the same message again and again. As to why the rollbacks are occurring, that's another story, but even if your application doesn't have an MQBACK coded in it anywhere, the rollback can still occur if your application dies before the MQCMIT.
That's my gut feeling. But another approach you can take is, on the producer queue manager, to stop the cluster sender channel going to the consumer queue manager. Now let some messages pile up on the cluster transmission queue, and at some point stop your producer application. Now you know exactly how many messages are on the transmission queue. You can then enable the transmission queue for GETs, and use some browser program (like MQExplorer or Roger Lacroix's program) to view the messages and confirm that none are duplicates. Now start the channel and let the messages flow, but don't have your consumer applications running on the other end. Let all the messages land on the destination queue, and browse the contents again, confirming no duplicates. Finally, start your consumer application and see if it thinks it encountered any duplicates. If it does, it seems to confirm my first theory, that the messages are being rolled back for some reason...
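For the backout-count check suggested above, a minimal sketch of the logging side might look like the following. ReceivedMessage and describeDelivery are hypothetical names used only for illustration; they are not MQ types. The worker would fill the struct from the message descriptor after each successful get.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the two descriptor fields a worker would log
// after a successful get. This is not an MQ type.
struct ReceivedMessage {
    std::string msgId;
    long backoutCount;  // copy of the BackoutCount reported for this delivery
};

// Build a log line that makes redeliveries stand out from first deliveries.
std::string describeDelivery(const ReceivedMessage& m) {
    assert(m.backoutCount >= 0);  // a negative count would be a caller bug
    if (m.backoutCount == 0) {
        return "MsgId " + m.msgId + ": first delivery";
    }
    return "MsgId " + m.msgId + ": redelivered after " +
           std::to_string(m.backoutCount) + " backout(s)";
}
```

Logging this line next to the worker's process id on every get should make it easier to tell a true duplicate (two messages, both with a count of 0) from one message that keeps coming back.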
_________________
Brandon Duncan
IBM Certified MQSeries Specialist
MQSeries.net forum moderator
abegerho
Posted: Tue Aug 06, 2002 12:58 pm

Newbie

Joined: 26 Jul 2002
Posts: 5

The messages are being rolled back. When I print the backout count for the duplicate messages, it's > 0. When the messages roll back, MQ is also generating corresponding FDC files, which I've been unable to decipher.

The strange thing is that I'm not explicitly calling the rollback. The only condition in which I roll back is not happening, and even then I have extensive logging to a file, and I don't see those messages logged.

MQ seems to be rolling these messages back unbeknownst to me. And it's not that my application is crashing or something, so that MQ would roll back the transaction.
So, one of my thoughts is that maybe MQ is thinking that my application has died, whereas it hasn't. How does MQ know that the application has died so that it can roll back the transaction?

Abhi
bduncan
Posted: Tue Aug 06, 2002 3:24 pm

Padawan

Joined: 11 Apr 2001
Posts: 1554
Location: Silicon Valley

Abhi,
Trust me, I've been doing this long enough to know that MQSeries is not preemptively rolling back your messages. There are only a few conditions in which the queue manager will issue a rollback:
1) The application issues an MQBACK
2) You are using XA, and something other than MQSeries is acting as the resource transaction coordinator, like DB2 for instance, and auto-rollback is turned on for that system, and there is some sort of failure. This causes the other system to automatically roll back, informing other systems participating in the unit of work to roll back, including MQSeries.
3) The application issues an MQCLOSE or MQDISC on the queue manager while the unit of work is still pending.
4) The application loses the queue or queue manager handle. This can happen quite easily in languages like C where your handle is actually a pointer.
5) Similar to 4, but your application actually dies for some reason.
6) You have exceeded your unit of work. This happens when you do so many MQGETs and/or MQPUTs under a single syncpoint (without ever issuing an MQCMIT) that your unit of work exceeds the resources that MQSeries has set aside for it.

Please post the first 100 lines or so of one of these FDC files, preferably the first one that gets generated once the duplicates start occurring. We might be able to decipher it for you.

In the meantime, perhaps you should consider moving your application to a single threaded model, at least for testing purposes, and see if this alleviates your problem.

Also, are you issuing the MQCONN and/or MQOPEN before threading off the child processes? If so, that will cause all kinds of problems.
_________________
Brandon Duncan
IBM Certified MQSeries Specialist
MQSeries.net forum moderator
abegerho
Posted: Tue Aug 06, 2002 5:27 pm

Newbie

Joined: 26 Jul 2002
Posts: 5

Brandon, thanks for your thoughtful post.

Let me clarify a few things.

a) I was wrong about the FDC files being produced. They're being produced for another unrelated reason. Sorry about the misinformation.

b) I have not configured MQSeries to act as an XA Transaction Manager. No switch load stuff done.

c) I only issue an MQBACK in one place in my application, and it is never being called. The next line of code after the MQBACK is a print statement that never gets executed. My business logic is such that I could probably take this out of my application as well, and no harm would be done, as long as MQ rolls back if my application crashes.

d) I don't do too much under syncpoint.
My algorithm is the following.
1) Get the message from MQ under syncpoint
2) Do a whole lot of stuff
3) Issue a commit.
4) Get next message

The only reason I do stuff under syncpoint is so that in case of an application crash the message is available upon restart and does not get lost. My application can deal with all other cases.
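The get/process/commit cycle described above can be illustrated with a toy in-memory model. This is only a sketch of the semantics as discussed in this thread (a message read under syncpoint is hidden until commit, and a backout returns it with its backout count incremented). ToyQueue and Message are hypothetical names, the model covers a single consumer, and none of it is MQ code.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical message record for the toy model.
struct Message {
    std::string msgId;
    long backoutCount;
};

// Toy model of one consumer's unit of work against one queue.
class ToyQueue {
public:
    void put(const std::string& id) { visible_.push_back(Message{id, 0}); }

    // Get under syncpoint: the message is hidden from readers, not deleted.
    bool getUnderSyncpoint(Message& out) {
        if (visible_.empty()) return false;
        out = visible_.front();
        visible_.pop_front();
        inFlight_.push_back(out);
        return true;
    }

    // Commit: everything read in this unit of work is gone for good.
    void commit() { inFlight_.clear(); }

    // Backout: everything read in this unit of work returns to the front
    // of the queue, in its original order, with backoutCount incremented.
    void backout() {
        for (auto it = inFlight_.rbegin(); it != inFlight_.rend(); ++it) {
            Message m = *it;
            ++m.backoutCount;
            visible_.push_front(m);
        }
        inFlight_.clear();
    }

    std::size_t depth() const { return visible_.size(); }

private:
    std::deque<Message> visible_;   // messages available to readers
    std::deque<Message> inFlight_;  // read under syncpoint, not yet resolved
};
```

In this model a message can only reappear with a count above 0 if backout() ran between the get and the commit, which is the same reasoning applied to the real queue manager elsewhere in the thread.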

I am also explicitly logging to a file every time I do an MQCMIT, and the message that I've committed (same message id) pops up again.
This happens only with MQ Clustering.

e) If I configure the number of workers I have to 1, I cannot reproduce the problem.

f) I'm doing all the MQ related processing in the worker processes, so there's no MQCONN/MQOPEN before the fork.

g) I am flummoxed.

Regards,
Abhi
bduncan
Posted: Tue Aug 06, 2002 10:06 pm

Padawan

Joined: 11 Apr 2001
Posts: 1554
Location: Silicon Valley

Abhi,
I agree with you that your MQBACK isn't being called. But that doesn't mean the queue manager is rolling back the message without proper reason.
Clearly the fact that you cannot reproduce this problem when you are only running a single consumer on the queue seems to indicate that something faulty is happening within your application due to the forking of child processes. Another experiment to try is running multiple single-threaded copies of the application against the queue. Are these able to run simultaneously without any trouble? I'm willing to bet they will. While you mentioned your MQ-related calls are being done within each child, is there any possibility some variable, object, pointer, etc., that was initialized in the parent process is being altered by any of the children?
One other thing that I cannot fathom is that you mention you are logging the commits, and that you apparently see a particular message being committed, and then you see that same message on the queue again. Yet that message has a backout count > 0. These two conditions simply cannot occur together. The moment the message is rolled back, whether by you or the queue manager (and we know this is happening because of the backout count), any MQCMIT on that message should fail. While your application may be logging the fact that the MQCMIT is being called, this doesn't mean the MQCMIT succeeded. Are you checking the CompCode and Reason for each MQ call you make? Are all the CompCodes 0? I am willing to bet that some of your MQCMITs are failing (because the message has already been rolled back) and perhaps the return codes on the MQCMIT aren't being checked.
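To make the return-code check concrete, here is a rough sketch of a commit wrapper that logs the outcome rather than the attempt. It is written as a template over any object exposing commit(), completionCode() and reasonCode(). The MQ C++ class ImqQueueManager is assumed to fit that shape, but that should be confirmed against the headers in use. FakeQueueManager is a hypothetical stand-in so the sketch compiles without MQ.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Commit and log what actually happened. QueueManager is any type with
// commit(), completionCode() and reasonCode(); ImqQueueManager is assumed
// (not verified here) to match.
template <class QueueManager>
bool commitAndReport(QueueManager& mgr, std::ostream& log) {
    if (mgr.commit()) {
        log << "commit ok\n";
        return true;
    }
    log << "commit FAILED: CompCode=" << mgr.completionCode()
        << " Reason=" << mgr.reasonCode() << "\n";
    return false;
}

// Hypothetical stand-in used only to exercise the wrapper.
struct FakeQueueManager {
    bool succeed;
    long reason;
    bool commit() { return succeed; }
    long completionCode() const { return succeed ? 0 : 2; }
    long reasonCode() const { return succeed ? 0 : reason; }
};
```

The point of the wrapper is that the "committed" line in the application log is only written when the call reports success, so a failed commit cannot be mistaken for a successful one.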
_________________
Brandon Duncan
IBM Certified MQSeries Specialist
MQSeries.net forum moderator
abegerho
Posted: Thu Aug 08, 2002 4:53 pm

Newbie

Joined: 26 Jul 2002
Posts: 5

Brandon,

I have checked/rechecked, got someone else to check and recheck all calls to MQSeries C++ APIs that I make from my application. I always check the completion code and if non zero I log the reason code.

Now I'm beginning to get suspicious of the C++ library that MQ provides on top of the C API.

A few more observations.

a) I am not doing anything to do with MQ in my parent process, nothing. Yes, AIX probably does load the library (which provides MQ's C++ APIs) when my parent process starts up.

b) I am religiously checking for completion codes.

c) Sometimes I see a message made available to other workers after I've gotten it from the MQ queue but have not committed or backed out or anything. This message also has a backout count of 1.
d) There certainly is some type of race condition, as the problem only happens when I run multiple processes. Yesterday I tried with 2 workers for 30 minutes or so, and I couldn't reproduce it. With 6 I can easily reproduce it.

The application does work fine when multiple single process instances of the application are invoked separately.

I'm working with MQ Support to see if they can help me out as well.
simon.starkie
Posted: Thu Aug 22, 2002 11:30 am    Post subject: Are you manipulating the GMO options for MQGET?

Disciple

Joined: 24 Mar 2002
Posts: 180

One possibility:
Is your program changing the options from MQOO_INPUT_SHARED to MQOO_BROWSE on the open (or using one of the MQGMO_BROWSE_* options on the get)? If so, this could result in the message not being consumed by one worker, with the result that it is available to another worker. In other words, you may not be "destructively reading" the message and so it remains on the queue for subsequent get operations. The result would be the apparent duplicate message, all the way down to the MessageId in the MD indicated previously.
Another possibility:
Remove the Cluster name from the Qlocal instances and point your program at them instead of the Qalias. Leave the Qalias with the Cluster name. The idea here is to allow the sending app to continue workload balancing because it is putting messages to the Qalias which is visible in the Cluster. But the receiving app would only be pointing at a Qlocal that does not participate in the Cluster, thereby removing MQ Clustering completely.
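To picture the first possibility, the difference between a browse and a destructive get can be sketched with a toy queue. BrowsableQueue is a hypothetical name for illustration and not an MQ API. The model only shows why two readers that browse would report the same message id. It says nothing about backout counts, which a browse does not change.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Toy queue of message ids; hypothetical, for illustration only.
class BrowsableQueue {
public:
    void put(const std::string& id) { ids_.push_back(id); }

    // Browse: report the first id but leave it on the queue.
    bool browseFirst(std::string& out) const {
        if (ids_.empty()) return false;
        out = ids_.front();
        return true;
    }

    // Destructive get: report the first id and remove it.
    bool get(std::string& out) {
        if (ids_.empty()) return false;
        out = ids_.front();
        ids_.pop_front();
        return true;
    }

private:
    std::deque<std::string> ids_;
};
```

Two browseFirst() calls in a row return the same id, while two get() calls return consecutive ids, so logging which kind of read each worker performs would quickly rule this theory in or out.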