MQSeries.net :: View topic - MQJMS2007 / mqrc: 2009 -error from .send(), but msg in Q


MQJMS2007 / mqrc: 2009 -error from .send(), but msg in Q


aq

Posted: Wed Jan 18, 2006 7:25 am Post subject: MQJMS2007 / mqrc: 2009 -error from .send(), but msg in Q

Apprentice

Joined: 20 Dec 2001
Posts: 47

Hi

I have a strange problem in production that has now happened three times. The JMS client app tries to put a JMS TextMessage (about 2 MB) on a queue, but QueueSender.send() throws a JMSException, "MQJMS2007: failed to send message to MQ queue", with reason code 2009. Nothing alarming so far, because the JMS client app will re-create the JMS connection and retry the send.

The disturbing part is that sometimes, even though send() has thrown the MQJMS2007 JMSException indicating that the message could not be sent, the message has still been put on the queue. Combined with the retry logic of the JMS client app, this results in the message being put on the queue twice.

Here is a Java code snippet from the JMS client app that puts the message on the queue.

Code:

...
QueueConnection conn = null;
QueueSession sess = null;
QueueSender sender = null;
boolean wasMessageSentSuccesfully = false;

// getQueueConnFactory() returns MQQueueConnectionFactory using MQSimpleConnectionManager.
QueueConnectionFactory factory = getQueueConnFactory();
conn = factory.createQueueConnection();
sess = conn.createQueueSession(false, QueueSession.AUTO_ACKNOWLEDGE);
TextMessage tMsg = sess.createTextMessage(message);

// getQueue() returns Queue object
sender = sess.createSender(getQueue());
sender.send(tMsg);

// In error situations this line is never reached, because send() throws the MQJMS2007 JMSException.
// The wasMessageSentSuccesfully flag exists because a JMSException with reason 2009 sometimes also
// occurs later, while closing the JMS connection objects, and retrying the send is no longer valid at that point.
wasMessageSentSuccesfully = true;
...

The WMQ version on the (AIX) server is 5.3 (fix pack 10), and the JMS/MQ jars on the (Windows) JMS client are also from WMQ fix pack 10.

Any ideas what the problem might be?

Any comments on how likely it is that the MQ/JMS provider would throw a "failed to send" exception and still put the message on the queue? (It still feels a little absurd even to me, but the possible explanations are starting to run out ...)

I appreciate your help,
aq

bower5932

Posted: Wed Jan 18, 2006 8:57 am Post subject:

Jedi Knight

Joined: 27 Aug 2001
Posts: 3023
Location: Dallas, TX, USA

My guess would be that the message makes it to the queue before the connection breaks. I think you want to be explicit about your unit of work and do the commit yourself. I'd also suggest that you look for an error on the server side - either in the amqerrxx.log files or in the form of an *.fdc file.

aq

Posted: Wed Jan 18, 2006 11:49 pm Post subject:

Apprentice

Joined: 20 Dec 2001
Posts: 47

Quote:

My guess would be that the message makes it to the queue before the connection breaks. I think you want to be explicit about your unit of work and do the commit yourself.

Yes, I suspected that a little too, but I also thought (or hoped) that the MQ/JMS provider would "somehow" handle the situation by itself.

Quote:

suggest that you look for an error on the server side - either in the amqerrxx.log files or in the form of an *.fdc file

No FDC files were generated at the times of the errors. The AMQERR log has some entries, but they apparently relate only to the 2009 MQRC_CONNECTION_BROKEN:
------------------------
AMQ9208: Error on receive from host clnt_2286 (11.32.8.110).

EXPLANATION:
An error occurred receiving data from clnt_2286 (11.32.8.110) over TCP/IP. This
may be due to a communications failure.
ACTION:
The return code from the TCP/IP (read) call was 73 (X'49'). Record these values
and tell the systems administrator.
------------------------
The network conditions in the production environment are not always the best, so I assume the 2009 errors are a result of that; that is also the reason for the retry logic in the JMS client app.

I would love to run a JMS trace for the JMS client app, but the error I described in my previous post doesn't seem to occur at any regular interval (so far it has happened just 3 times within a month).

mvic

Posted: Thu Jan 19, 2006 2:39 am Post subject: Re: MQJMS2007 / mqrc: 2009 -error from .send(), but msg in Q

Jedi

Joined: 09 Mar 2004
Posts: 2080

aq wrote:

but when putting msg QueueSender.send() throws JMSException of "MQJMS2007: failed to send message to MQ queue" with reason code 2009, nothing alarming so far because the JMS client app. will try to re-create the JMS connection and retry the sending.

The disturbing problem comes in that apparently it seems that sometimes even when .send() -methods has thrown MQJMS2007 -JMSException indicating that the message could not be sent, the message has still been putted in queue. This combined with retry logic of JMS client app. will result message put in queue twice

2009 from MQPUT() or producer.send(), etc., is a classic can't-fix problem. If the client/server connection has gone down before the queue manager can feed back status to the client, then the MQ API in your client can't tell your client app code what has happened. All it can say is that the connection got broken.

If it really matters to you, you have to work around it yourself with a changed design. Here's my idea for a design. NOTE I have not tried it in practice - it's just a theoretical design.

Create an "audit" queue, on the same queue manager as the app's queue. Let's call it auditq. In the app, use a transacted session and, instead of one producer.send() do two producer.send()s - one to each queue. Then commit the session.

If the connection breaks at any stage then either (a) both messages are placed on their respective queues, or (b) neither message is put at all.

If the connection breaks, the administrator (or the putting app, in its retry logic) can then go to the auditq to find out whether the messages were actually put or not, and decide on that basis whether the app needs to retry.

The getting app(s) can continue to use the same single queue as before, but there would need to be some end-of-day processing (or more frequent than this, if there is a high throughput) to remove the day's redundant messages from the auditq, to avoid it filling up. Essentially, when the audit messages on the auditq are no longer needed, they could be removed.

I hope this helps.
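The design above can be sketched with a toy in-memory model. This is not the MQ or JMS API: `ToyQueueManager`, the `Break` enum, the queue names and the token strings are all made up for illustration, and the "commit" is simply the point where both puts are applied together. The sketch only shows the retry decision: after a broken connection, resend only if the audit marker is missing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory stand-in for a queue manager; NOT the MQ or JMS API. */
final class ToyQueueManager {
    /** Where the simulated network failure strikes. */
    enum Break { NONE, BEFORE_COMMIT, AFTER_COMMIT }

    /** Stand-in for reason code 2009 as seen by the client. */
    static final class ConnectionBroken extends RuntimeException {
        ConnectionBroken() { super("2009: connection broken"); }
    }

    private final Map<String, List<String>> queues = new HashMap<>();

    List<String> queue(String name) {
        return queues.computeIfAbsent(name, k -> new ArrayList<>());
    }

    /** One unit of work: payload to APP.Q and token to AUDIT.Q, both or neither. */
    void sendWithAudit(String token, String payload, Break failure) {
        if (failure == Break.BEFORE_COMMIT) {
            throw new ConnectionBroken();   // nothing committed: neither message exists
        }
        queue("APP.Q").add(payload);        // simulated commit point:
        queue("AUDIT.Q").add(token);        // both messages appear together
        if (failure == Break.AFTER_COMMIT) {
            throw new ConnectionBroken();   // committed, but the reply never arrived
        }
    }
}

final class AuditQueueDemo {
    /** After a broken connection, resend only if no audit marker is found. */
    static void sendOnce(ToyQueueManager qm, String token, String payload,
                         ToyQueueManager.Break failure) {
        try {
            qm.sendWithAudit(token, payload, failure);
        } catch (ToyQueueManager.ConnectionBroken e) {
            if (!qm.queue("AUDIT.Q").contains(token)) {
                qm.sendWithAudit(token, payload, ToyQueueManager.Break.NONE);
            }
        }
    }

    public static void main(String[] args) {
        ToyQueueManager qm = new ToyQueueManager();
        sendOnce(qm, "order-1", "payload-1", ToyQueueManager.Break.BEFORE_COMMIT);
        sendOnce(qm, "order-2", "payload-2", ToyQueueManager.Break.AFTER_COMMIT);
        System.out.println(qm.queue("APP.Q"));   // prints [payload-1, payload-2]
    }
}
```

In both failure cases the client sees the same exception; only the lookup on the audit queue tells it whether a resend is needed. In a real system the retry could of course fail as well, which the toy ignores.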

jefflowrey

Posted: Thu Jan 19, 2006 5:23 am Post subject:

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

That's approaching the problem backwards, mvic. The important information should be the information you have, not the information that you don't have.

If a client application loses its connection to MQ, it should have some retry logic that will cover a reasonable (according to the business case and the environment) amount of time, and then it should halt.

All enterprise level applications should be able to log all of the important data about any event that goes on in their system, to the level of detail that the application can be restarted (either manually or automatically) and continue processing from the same point.

Back to the original problem - I remember some discussion about when a client does an automatic commit on connection drop and when it doesn't - and it could be that AQ has created the specific conditions with his configuration of the JMS Provider to cause an autocommit.

Or it could be a bug in FP10 of WebSphere MQ.
_________________
I am *not* the model of the modern major general.

mvic

Posted: Thu Jan 19, 2006 8:06 am Post subject:

Jedi

Joined: 09 Mar 2004
Posts: 2080

jefflowrey wrote:

Back to the original problem - I remember some discussion about when a client does an automatic commit on connection drop and when it doesn't - and it could be that AQ has created the specific conditions with his configuration of the JMS Provider to cause an autocommit.

The point is, how does the client app know whether its connection was lost just before successful acceptance (and execution) of the put by the queue manager, or just after successful acceptance (and execution) of the put by the queue manager. It doesn't know because all the app has is 2009, with no qualifying code saying success or failure.

Quote:

Or it could be a bug in FP10 of WebSphere MQ.

I have heard and answered similar questions more than once in the past, going back over a few years. That is, if I understand the question correctly. (Which I think I do.)

I think this (ie. the uncertainty that comes with a 2009) is a classic "can't fix" problem inherent in client / server messaging or transactions. However it can be worked around, by using an audit trail, as long as the audit trail is sure to be in-step with the real data.

If duplication must be avoided after a 2009, the app needs to come back and check with the server whether the work was received or not. In this, I agree broadly with your other remarks about the required app logic and app "logging".

The only problem with this is that MQ doesn't have a built-in facility to allow apps to check an audit trail (ie. the app "logging" you refer to). So my design was intended to add an audit trail facility on the server - protected by MQ's transactional rules - to provide the information if and when the app needs it.

(NB. I use the phrase 'app "logging"' to distinguish from MQ logging in /var/mqm/log).

jefflowrey

Posted: Thu Jan 19, 2006 12:43 pm Post subject:

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

Your auditing solution is still wrong, because it is still recording only the information that it knows to be useless for solving this problem.

And there's no way to guarantee that your transacted session won't drop the connection between the first send and the second send - so you've really just moved the problem around.

The real problem is that, faced with a situation where the client application doesn't know if a message has been delivered or not, the client application decides to resend a message that is in doubt.

If, instead, the application were to put only those messages that it knew were in doubt onto a different queue (because the code received an exception), then nobody would be guessing which messages needed to be dealt with manually. And nobody would be looking through a record of the hundreds or thousands of already sent messages trying to find the one that should be there but isn't.
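A minimal sketch of that idea, using plain Java rather than the JMS API. The `transport` callback, the `OutcomeUnknown` exception and the in-memory list are all hypothetical stand-ins; a real application would presumably record in-doubt messages somewhere durable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sends each message once and sets aside, rather than resends, anything in doubt. */
final class InDoubtSender {
    /** Stand-in for a failure whose outcome is unknown (for example reason code 2009). */
    static final class OutcomeUnknown extends RuntimeException {
        OutcomeUnknown(String reason) { super(reason); }
    }

    private final Consumer<String> transport;            // hypothetical: performs the real send
    private final List<String> inDoubt = new ArrayList<>();

    InDoubtSender(Consumer<String> transport) {
        this.transport = transport;
    }

    /** Returns true if the send was confirmed, false if the message was set aside. */
    boolean send(String message) {
        try {
            transport.accept(message);
            return true;
        } catch (OutcomeUnknown e) {
            inDoubt.add(message);    // no automatic resend: leave it for a manual check
            return false;
        }
    }

    /** Snapshot of the messages that need to be reconciled by hand. */
    List<String> inDoubtMessages() {
        return new ArrayList<>(inDoubt);
    }

    public static void main(String[] args) {
        InDoubtSender sender = new InDoubtSender(m -> {
            throw new OutcomeUnknown("2009");
        });
        sender.send("msg-1");
        System.out.println(sender.inDoubtMessages());    // prints [msg-1]
    }
}
```

The trade-off is that duplicates are avoided at the price of manual reconciliation for every message set aside.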
_________________
I am *not* the model of the modern major general.

wschutz

Posted: Thu Jan 19, 2006 12:54 pm Post subject:

Jedi Knight

Joined: 02 Jun 2005
Posts: 3316
Location: IBM (retired)

Quote:

The network conditions in production env. are not always the best

This is a very old debate (I think we had this on the vienna list server 6, 7 or 8 years ago)... anyways ...

This is the classic example of why you should have a local queue manager and not use the MQ client. There was a lot of work put into "once and once only" delivery of messages between qmgrs (ie, msg seq number in the channel FAPs).
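As a rough illustration of why a sequence number helps with once-only delivery, here is a toy receiver that drops resends. It is only a sketch of the general idea and makes no claim about how the channel protocol between queue managers is actually implemented; the class and method names are invented.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy receiver that uses a sequence number to ignore resent messages. */
final class SequencedReceiver {
    private long lastAccepted = 0;
    private final List<String> delivered = new ArrayList<>();

    /** Accepts a message only if it carries the next expected sequence number. */
    boolean receive(long seq, String payload) {
        if (seq <= lastAccepted) {
            return false;                       // already received: a resend, drop it
        }
        if (seq != lastAccepted + 1) {
            throw new IllegalStateException("gap before sequence number " + seq);
        }
        delivered.add(payload);
        lastAccepted = seq;
        return true;
    }

    /** Snapshot of the payloads delivered so far, in order. */
    List<String> delivered() {
        return new ArrayList<>(delivered);
    }

    public static void main(String[] args) {
        SequencedReceiver receiver = new SequencedReceiver();
        receiver.receive(1, "a");
        receiver.receive(1, "a");               // sender retried after a broken connection
        receiver.receive(2, "b");
        System.out.println(receiver.delivered());   // prints [a, b]
    }
}
```

The sender can retry blindly after a failure because the receiver remembers how far it got.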

You might argue that you can still get 2009's in local applications, but I suspect the chance of that is a couple of magnitudes less .....
_________________
-wayne

PeterPotkay

Posted: Thu Jan 19, 2006 1:12 pm Post subject:

Poobah

Joined: 15 May 2001
Posts: 7723

jefflowrey wrote:

I like this idea better, but you could then get a 2009 on that put!

I know, I know, not likely, but if these messages are medical instructions telling the doctor which leg to amputate.......

aq wrote:

The network conditions in production env. are not always the best, so I assume the 2009 errors are result of that, and that is also reason for retry logic implementation in JMS client app.

And there is the key. MQ Client is not a good solution for flaky networks. If your company won't shell out the bucks for a robust network, then they have to shell out $$$ for an MQ server license so you can have QM to QM communication, and let the MCAs deal with the flaky network. They do a good job!
_________________
Peter Potkay
Keep Calm and MQ On

jefflowrey

Posted: Thu Jan 19, 2006 1:26 pm Post subject:

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

PeterPotkay wrote:

I like this idea better, but you could then get a 2009 on that put!

I know, I know, not likely, but if these messages are medical instructions telling the doctor which leg to amputate.......

Right. Which is why the message should actually be put somewhere else instead of a queue (log4J is good for something, right?) - but that was not quite as important to my point.
_________________
I am *not* the model of the modern major general.

aq

Posted: Thu Jan 19, 2006 10:00 pm Post subject:

Apprentice

Joined: 20 Dec 2001
Posts: 47

Quote:

aq: The network conditions in production env. are not always the best

PeterPotkay: This is the classic example of why you should have a local queue manager and not use the MQ client.
MQ Client is not a good solution for flaky networks.

Well, the network is not totally "flaky" and made of tin wire ... let's just say that there are some challenges (short breaks or something like that might occur from time to time, router resets etc.)

Yes, the MQ server would be ideal but ... I failed to mention in my original post that there is not just one or two of those JMS client app installations but well over 1000 (not on servers, but on workstations), so my boss would probably practice some amputations on me if I suggested buying (and managing) MQ servers for each of them.

Okay, back to the problem. So if we find that there is no fix for this on the MQ/JMS provider side (and it's not a bug in this JMS client app), and this is, as mvic wrote, a "can't fix" problem, wouldn't the simplest solution be to start using explicit transactions/commit on the client side (QueueSession.AUTO_ACKNOWLEDGE --> QueueSession.CLIENT_ACKNOWLEDGE), as bower suggested in the first reply? What might the downsides of this be (in-doubt messages?)

mvic

Posted: Fri Jan 20, 2006 6:11 am Post subject:

Jedi

Joined: 09 Mar 2004
Posts: 2080

Preamble: I don't have a lot at stake here - I am not defending my life's work, only 1/2 hour's work

If you don't like the solution that's OK with me. But I may as well defend as long as I have a few spare minutes...

jefflowrey wrote:

Your auditing solution is still wrong, because it is still recording only the information that it knows to be useless for solving this problem.

Response: the auditq must record all messages until a point of consistency is established - eg. end of day - when all audit records can be (a) archived or (b) discarded, as per the business rules.

Until a point of consistency is established, as far as the server is concerned any single message can potentially become indoubt, because network problems can take place at any time during the MQPUT conversation with the client. (Specifically, in this case, network problems can occur at any time after a successful MQPUT to the queue, before or while the MQPUT response is composed and sent from server to client).

Therefore all audit records need to be stored until the point of consistency is declared - eg. at end of day.

Quote:

And there's no way to guarantee that your transacted session won't drop the connection between the first send and the second send - so you've really just moved the problem around.

In this case the UOW will rollback, leaving no message on the app queue, and no message on the auditq.

In general, when the app comes back after the network outage, it checks on the auditq for the presence / absence of a message indicating the UOW's success (auditq message contents and matching rules unspecified - but could use CorrelId with a unique token denoting the piece of work, if CorrelId is not in use for anything else). If it finds the message, no need for the app to re-send. If it doesn't, then resend.

NB. Matching rules for auditq messages would have to be reliable - ie. no false positives, no false negatives. Unreliable matching rules could/would lead to the same problem (message duplication, or - worse? - dropped messages) by a different route!!
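One way to get a token that matches reliably is to derive it from an identifier that is unique per piece of work. The sketch below is an assumption, not part of the proposal above: it packs a `java.util.UUID` into 24 bytes, the length of the MQMD CorrelId field, and renders it as hex for logging or comparison. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

/** Builds a fixed-length correlation token from a per-unit-of-work UUID. */
final class CorrelTokens {
    /** The MQMD CorrelId field is 24 bytes long. */
    static final int CORREL_ID_LENGTH = 24;

    /** 16 bytes of UUID followed by 8 zero bytes of padding. */
    static byte[] newToken(UUID workId) {
        ByteBuffer buf = ByteBuffer.allocate(CORREL_ID_LENGTH);   // zero-filled
        buf.putLong(workId.getMostSignificantBits());
        buf.putLong(workId.getLeastSignificantBits());
        return buf.array();
    }

    /** Lower-case hex rendering, two characters per byte. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        UUID workId = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
        // prints 123e4567e89b12d3a4564266141740000000000000000000
        System.out.println(toHex(newToken(workId)));
    }
}
```

The same UUID always yields the same token, so the sending app can rebuild it after a restart, provided it persisted the UUID before attempting the send.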

Quote:

If, instead, the application were to put only those messages that it knew were in doubt onto a different queue (because the code received an exception), then nobody would be guessing which messages needed to be dealt with manually. And nobody would be looking through a record of the hundreds or thousands of already sent messages trying to find the one that should be there but isn't.

OK, but how would you find out, post-facto, whether the message arrived successfully at the server or not? An audit trail has to exist in order to support such a query. That's what I think I have proposed.

Thanks for discussing

mvic

Posted: Fri Jan 20, 2006 6:22 am Post subject:

Jedi

Joined: 09 Mar 2004
Posts: 2080

PeterPotkay wrote:

jefflowrey wrote:

I like this idea better, but you could then get a 2009 on that put!

I know, I know, not likely

A 2009 from the put of an "exception" message is likely if the first cause is an extended network failure which caused the first put to fail. My experience of network failures is that they can be for an hour or more at a time. YMMV

Even if you do manage to put an "exception" message this still begs the question of how reconciliation will be done. For this an audit trail is needed.

r2504

Posted: Fri Jan 20, 2006 6:40 am Post subject:

Novice

Joined: 05 Mar 2004
Posts: 22

Also check if the CCSID is specified. I also got a 2009 and after specifying the CCSID everything worked fine.

jefflowrey

Posted: Fri Jan 20, 2006 8:16 am Post subject:

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

I'm not dismissing the idea of an audit trail in general. Although I would view it as more of a message archive than an audit trail. In an enterprise, and particularly in an ESB, a message archive service is almost the first thing I would implement.

It just doesn't do anything in particular to solve this problem - how do I know if a message has been sent when I get a 2009 on the PUT?

In your original proposal, a message that may or may not have been sent would either show up in two places, or none. And it still didn't allow the application to know that the message got sent or not. And we agreed that this was not really possible for the application to know.

My point was this. An application attempts to put a message. The application receives an error code that indicates that the message might not have been sent. Assuming that a duplicated message is a problem - which it appears to be for the original poster - then the application should not attempt to send the message again at all. It should set the message aside in a particular place. Then a manual check can be done to determine if the messages that might not have been processed were processed.

An audit trail or message archive could be used as a point of reference for determining if the message was processed. Or message IDs or business identifiers in the message could be compared against down stream transactional records or etc.

But without a record of the message that may or may not have been processed - what should someone look for in the audit trail/message archive? Messages that aren't there? Messages that could be there twice?

And, yes, I agree. Most network failures are not transitory. And so using a queue for recording messages that are in doubt is not the best option. But as I said, it was less relevant to the point I was trying to make.
_________________
I am *not* the model of the modern major general.


Page 1 of 2
