LouML |
Posted: Wed Jan 23, 2008 6:33 am Post subject: AMQ7469 and lost messages? |
|
|
 Partisan
Joined: 10 Nov 2005 Posts: 305 Location: Jersey City, NJ / Bethpage, NY
|
We're running 5.3.12 (I know, I know) on a Solaris box with linear logging. We have sender/receiver channels to an IBM mainframe system. Yesterday, we started to receive AMQ7469 messages at 15:36, and at 15:39 the mainframe system lost connectivity for about 12 minutes. Their queue managers did not come down, but all channels to and from the mainframe were disconnected. By 15:49 the AMQ7469s had stopped, and by 15:51 connectivity was restored and the channels came back up. I'm still waiting to hear from the mainframe people as to what caused the outage.
Although it was not noticed until much later, the mainframe reported over 32,000 messages missing. Our transmit queue does not have default persistence on, and I have not yet heard from the app owners as to whether they set persistence on.
Now I'm left with a bunch of questions -
1 - Since the AMQ7469s started before the outage, could they have been a cause of the outage? Or are they more likely a symptom of the pending outage?
2 - Is it possible that the messages could have been lost because persistence was not set (even though both QMs remained up the entire time)?
3 - If messages were truly lost wouldn't we have received a bunch of out of sequence messages?
4 - Are my logs too small (Pri 5, sec 3, pages 4096)?
My guess is that the apps are not handling the log rollback properly. The messages are being rolled back but the app is not aware and therefore not resending.
Code: |
bash-2.05$ dspmqver
Name: WebSphere MQ
Version: 530.12 CSD12
CMVC level: p530-12-L051208
BuildType: IKAP - (Production)
bash-2.05$
-------------------------------------------------------------------------------
01/22/08 15:48:04
AMQ7469: Transactions rolled back to release log space.
EXPLANATION:
The log space for the queue manager is becoming full. One or more long-running
transactions have been rolled back to release log space so that the queue
manager can continue to process requests.
ACTION:
Try to ensure that the duration of your transactions is not excessive. Consider
increasing the size of the log to allow transactions to last longer before the
log starts to become full.
-------------------------------------------------------------------------------
Log:
LogPrimaryFiles=5
LogSecondaryFiles=3
LogFilePages=4096
LogType=LINEAR
LogBufferPages=512
bash-2.05$ echo "dis chl(CH.US.TO.THEM)" | runmqsc QM.US.01
5724-B41 (C) Copyright IBM Corp. 1994, 2002. ALL RIGHTS RESERVED.
Starting MQSC for queue manager QM.US.01.
1 : dis chl(CH.US.TO.THEM)
AMQ8414: Display Channel details.
CHANNEL(CH.US.TO.THEM) CHLTYPE(SDR)
TRPTYPE(TCP) DESCR( )
XMITQ(QM.US.01) MCANAME( )
MODENAME( ) TPNAME( )
BATCHSZ(50) DISCINT(6000)
SHORTRTY(10) SHORTTMR(60)
LONGRTY(999999999) LONGTMR(1200)
SCYEXIT( ) SEQWRAP(999999999)
MAXMSGL(4194304) CONVERT(YES)
SCYDATA( ) USERID( )
PASSWORD( ) MCATYPE(PROCESS)
CONNAME(mainframe(1414)) HBINT(300)
BATCHINT(0) NPMSPEED(FAST)
SSLCIPH( ) BATCHHB(0)
LOCLADDR( ) KAINT(AUTO)
MCAUSER( ) ALTDATE(2007-10-04)
ALTTIME(08.47.36) SSLPEER()
MSGEXIT( )
SENDEXIT( )
RCVEXIT( )
MSGDATA( )
SENDDATA( )
RCVDATA( )
One MQSC command read.
No commands have a syntax error.
All valid MQSC commands were processed.
bash-2.05$ |
Last edited by LouML on Wed Jan 23, 2008 7:17 am; edited 1 time in total |
|
|
 |
Vitor |
Posted: Wed Jan 23, 2008 7:17 am Post subject: Re: AMQ7469 and lost messages? |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
LouML wrote: |
1 - Since the AMQ7469's started before the outage, could that have been a cause of the outage? Or is it more likely a symptom of the pending outage? |
I wouldn't have thought so, but I've been wrong before. "Disconnected" is not a channel state. Do you mean "retrying", indicating that the MCA on the non-mainframe end (where the queue manager was out of log) was not running?
LouML wrote: |
2 - Is it possible that the messages could have been lost because of persistence not being set (even though both QM's remained up the entire time)? |
Yes. Non-persistent messages can be lost at any time if the queue manager has an issue, especially if it's rolling transactions back.
LouML wrote: |
3 - If messages were truly lost wouldn't we have received a bunch of out of sequence messages? |
No. If the messages were received by the MCA, passed to the queue manager and then rolled back by the queue manager, the MCA sequence numbers would still match. This is why I suspect the channels went into retry, as they negotiated which message was next.
_________________
Honesty is the best policy.
Insanity is the best defence. |
|
|
 |
LouML |
Posted: Wed Jan 23, 2008 7:26 am Post subject: Re: AMQ7469 and lost messages? |
|
|
 Partisan
Joined: 10 Nov 2005 Posts: 305 Location: Jersey City, NJ / Bethpage, NY
|
Vitor wrote: |
I wouldn't have thought so, but I've been wrong before. "Disconnected" is not a channel state. Do you mean "retrying", indicating that the MCA on the non-mainframe end (where the queue manager was out of log) was not running? |
Thanks Vitor
I'd edited my post as you were replying. I added the following as well:
4 - Are my logs too small (Pri 5, sec 3, pages 4096)?
My guess is that the apps are not handling the log rollback properly. The messages are being rolled back but the app is not aware and therefore not resending.
As for the channels, they were binding. |
|
|
 |
jefflowrey |
Posted: Wed Jan 23, 2008 7:31 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
If transactions are getting rolled back, either your logs are too small or the transaction is too big.
It's generally "safest", from an MQ admin point of view, to just make the logs as big as possible.
Then if someone uses them up, you can say "make your transactions smaller. My hands are tied" _________________ I am *not* the model of the modern major general. |
|
|
 |
LouML |
Posted: Wed Jan 23, 2008 7:35 am Post subject: |
|
|
 Partisan
Joined: 10 Nov 2005 Posts: 305 Location: Jersey City, NJ / Bethpage, NY
|
jefflowrey wrote: |
If transactions are getting rolled back, either your logs are too small or the transaction is too big.
It's generally "safest", from an MQ admin point of view, to just make the logs as big as possible.
Then if someone uses them up, you can say "make your transactions smaller. My hands are tied" |
Am I correct in saying that if messages are getting rolled back and then lost then the app is not handling the rollback properly?
Edited to remove transaction question - already answered
Last edited by LouML on Wed Jan 23, 2008 7:37 am; edited 1 time in total |
|
|
 |
Vitor |
Posted: Wed Jan 23, 2008 7:36 am Post subject: Re: AMQ7469 and lost messages? |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
LouML wrote: |
4 - Are my logs too small (Pri 5, sec 3, pages 4096)? |
For a live system carrying thousands of messages and transactions? Hell yes!!
(Sounds like the default queue manager log values to me)
Your log should be large enough to handle maximum-possible-number-of-messages-in-a-UOW times maximum-possible-simultaneous-UOWs times 50%. Plus a bit to allow for any other odds and ends going on.
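As a rough check on the numbers in this thread: LogFilePages is counted in 4 KB pages, so the posted settings give (5 + 3) × 4096 × 4 KB = 128 MiB of active log once every secondary file is allocated. The Python sketch below reproduces that arithmetic and applies one possible reading of the rule of thumb above, treating the 50% as extra headroom on top of the worst-case in-flight data. The per-message log size and the workload figures are hypothetical placeholders, and only persistent messages are counted, since non-persistent messages are generally not written to the log.

```python
LOG_PAGE_BYTES = 4096  # LogFilePages is expressed in 4 KB pages
MIB = 2 ** 20


def configured_log_bytes(primary_files: int, secondary_files: int, file_pages: int) -> int:
    """Active log space available once all primary and secondary files are in use."""
    return (primary_files + secondary_files) * file_pages * LOG_PAGE_BYTES


def estimated_log_bytes(persistent_msgs_per_uow: int, concurrent_uows: int,
                        logged_bytes_per_msg: int, headroom: float = 0.5) -> int:
    """One reading of the rule of thumb: worst-case in-flight UOW data plus headroom.

    logged_bytes_per_msg is a hypothetical figure; measure it for your own workload.
    """
    in_flight = persistent_msgs_per_uow * concurrent_uows * logged_bytes_per_msg
    return int(in_flight * (1 + headroom))


have = configured_log_bytes(5, 3, 4096)  # LogPrimaryFiles, LogSecondaryFiles, LogFilePages posted above
need = estimated_log_bytes(persistent_msgs_per_uow=5_000, concurrent_uows=10,
                           logged_bytes_per_msg=4_096)  # hypothetical workload
print(f"configured: {have / MIB:.0f} MiB, estimated need: {need / MIB:.0f} MiB")
print("too small" if need > have else "probably enough")
```

The point of keeping the two functions separate is that the configured side is plain arithmetic on qm.ini values, while the demand side depends entirely on figures you have to measure for your own applications.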
LouML wrote: |
My guess is that the apps are not handling the log rollback properly. The messages are being rolled back but the app is not aware and therefore not resending. |
How could the apps know? Especially on the sending side? From their point of view they did a put and got on with their lives.
If there's no mechanism by which the sending application can determine the non-arrival of non-repeatable data, you should use persistent messages. This is also the best way of handling non-repeatable data; otherwise you end up in a spiral of who-sent-what-when with acknowledgements. |
|
|
 |
Vitor |
Posted: Wed Jan 23, 2008 7:37 am Post subject: Re: AMQ7469 and lost messages? |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
LouML wrote: |
4 - Are my logs too small (Pri 5, sec 3, pages 4096)?
|
The shorter answer is of course "Yes - you got a message saying they were full!"
|
|
 |
jefflowrey |
Posted: Wed Jan 23, 2008 7:48 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
LouML wrote: |
Am I correct in saying that if messages are getting rolled back and then lost then the app is not handling the rollback properly? |
It depends on which app had a transaction rolled back on it...
If it was the MCA, then it's likely handling the rollback properly, i.e. discarding the messages if non-persistent. |
|
|
 |
LouML |
Posted: Wed Jan 23, 2008 8:00 am Post subject: |
|
|
 Partisan
Joined: 10 Nov 2005 Posts: 305 Location: Jersey City, NJ / Bethpage, NY
|
So, it looks like my 'to do' list is:
1 - Create bigger logs
2 - Either:
A - Set Default Persistence on the xmit queue to Yes
B - Get the developers to set persistence to on when putting messages
C - Both
Thanks all! |
|
|
 |
Vitor |
Posted: Wed Jan 23, 2008 8:10 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
LouML wrote: |
A - Set Default Persistence on the xmit queue to Yes
B - Get the developers to set persistence to on when putting messages
C - Both
|
You should set persistence (on or off) on the queue the application is writing to, and be aware that it's the default value. If the application sets persistence itself, your change will have no effect.
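A minimal Python sketch of the point about defaults, assuming the standard MQI behaviour: MQMD.Persistence defaults to MQPER_PERSISTENCE_AS_Q_DEF, and only in that case does the queue manager fall back to the DEFPSIST of the queue the application opened. The constant values match the MQI's MQPER_* constants; the resolve_persistence function itself is hypothetical and only models the rule.

```python
# MQI persistence values as they appear in MQMD.Persistence
MQPER_NOT_PERSISTENT = 0
MQPER_PERSISTENT = 1
MQPER_PERSISTENCE_AS_Q_DEF = 2  # the MQMD default


def resolve_persistence(md_persistence: int, opened_queue_defpsist: bool) -> bool:
    """Hypothetical model: is the message persistent once the put completes?"""
    if md_persistence == MQPER_PERSISTENCE_AS_Q_DEF:
        # Only here does the admin's DEFPSIST setting on the opened queue matter.
        return opened_queue_defpsist
    if md_persistence in (MQPER_NOT_PERSISTENT, MQPER_PERSISTENT):
        # An explicit choice by the application overrides DEFPSIST.
        return md_persistence == MQPER_PERSISTENT
    raise ValueError(f"unknown MQMD.Persistence value: {md_persistence}")


# App leaves the MQMD default: DEFPSIST(YES) on the opened queue makes it persistent.
print(resolve_persistence(MQPER_PERSISTENCE_AS_Q_DEF, opened_queue_defpsist=True))
# App explicitly puts non-persistent: DEFPSIST(YES) has no effect.
print(resolve_persistence(MQPER_NOT_PERSISTENT, opened_queue_defpsist=True))
```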
There have been some interesting debates in here about whose responsibility the setting of persistence is: the application programmer's (with their greater knowledge of message content & repeatability) or the MQ admin's (whose responsibility it is to refute the cry of "MQ lost my message"). |
|
|
 |
LouML |
Posted: Wed Jan 23, 2008 8:20 am Post subject: |
|
|
 Partisan
Joined: 10 Nov 2005 Posts: 305 Location: Jersey City, NJ / Bethpage, NY
|
Vitor wrote: |
You should set persistence (on or off) on the queue the application is writing to, and be aware that it's the default value. If the application sets persistence itself, your change will have no effect. |
I know that the persistence on the queue is just a default and can be overridden by the app. However, I thought that (in the absence of any app persistence) persistence would need to be set on all queues in the chain (the queue being put to, the xmit queue, the local queue on the receiving end, etc.) to ensure true persistence.
Vitor wrote: |
There have been some interesting debates in here about whose responsibility the setting of persistence is: the application programmer's (with their greater knowledge of message content & repeatability) or the MQ admin's (whose responsibility it is to refute the cry of "MQ lost my message"). |
Of course I'm on the side of the programmers being responsible. |
|
|
 |
Vitor |
Posted: Wed Jan 23, 2008 8:28 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
LouML wrote: |
I thought that (in the absence of any app persistence) persistence would need to be set on all queues in the chain. |
Persistence is a property of the message, not the queue. If a message is persistent at put, it's persistent throughout its journey. |
|
|
 |
LouML |
Posted: Wed Jan 23, 2008 8:39 am Post subject: |
|
|
 Partisan
Joined: 10 Nov 2005 Posts: 305 Location: Jersey City, NJ / Bethpage, NY
|
Vitor wrote: |
LouML wrote: |
I thought that (in the absence of any app persistence) persistence would need to be set on all queues in the chain. |
Persistence is a property of the message, not the queue. If a message is persistent at put, it's persistent throughout its journey. |
Agreed - that's why I asked in the absence of any app persistence - meaning if the app does not set persistence (just off the phone with the programmer and that appears to be the case).
So, if the message is not persistent at put, then it can be lost if the queue (or any queue along the way) does not have default persistence set when the problem hits. |
|
|
 |
Toronto_MQ |
Posted: Wed Jan 23, 2008 8:52 am Post subject: |
|
|
 Master
Joined: 10 Jul 2002 Posts: 263 Location: read my name
|
LouML wrote: |
Vitor wrote: |
LouML wrote: |
I thought that (in the absence of any app persistence) persistence would need to be set on all queues in the chain. |
Persistence is a property of the message, not the queue. If a message is persistent at put, it's persistent throughout its journey.
Agreed - that's why I asked in the absence of any app persistence - meaning if the app does not set persistence (just off the phone with the programmer and that appears to be the case).
So, if the message is not persistent at put, then it can be lost if the queue (or any queue along the way) does not have default persistence set when the problem hits. |
The "or any queue along the way" is the sticking point here, as Vitor has mentioned. It really doesn't matter what the default persistence is set to on any other queue in the chain. Only the initial put counts: persistence is decided there, either by the application or by the default persistence of the queue originally opened, and only that queue.
Steve |
|
|
 |
jefflowrey |
Posted: Wed Jan 23, 2008 8:58 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
Right, the real thing is this:
DEFPSIST is just a suggestion.
The only thing that controls whether a message is persistent or not is the MQMD. If the MQMD says it's a non-persistent message, then it's a non-persistent message, regardless of what queue it is on. |
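The same rule applied to a whole route, as a toy Python model: persistence is fixed once, from the MQMD at the original put, and every later hop carries that MQMD unchanged, so the DEFPSIST of the transmission queue or the destination queue never comes into it. The queue names and the persistence_at_each_hop function are hypothetical; None stands for an application that leaves MQMD.Persistence at its default.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class QueueDef:
    name: str
    defpsist: bool  # DEFPSIST(YES) -> True


def persistence_at_each_hop(route: List[QueueDef],
                            app_persistence: Optional[bool]) -> Dict[str, bool]:
    """Hypothetical model of a message's persistence at each queue definition on its route.

    app_persistence: True/False if the application sets MQMD.Persistence itself,
    None if it leaves the default (persistence as queue definition).
    """
    opened = route[0]  # the queue named on the application's MQOPEN
    persistent = opened.defpsist if app_persistence is None else app_persistence
    # Channels move the MQMD unchanged, so later queues' DEFPSIST is never consulted.
    return {q.name: persistent for q in route}


route = [
    QueueDef("APP.TO.MAINFRAME", defpsist=False),  # remote queue definition the app opens
    QueueDef("QM.US.01", defpsist=True),           # transmission queue
    QueueDef("MAINFRAME.IN", defpsist=True),       # destination queue on the mainframe
]
print(persistence_at_each_hop(route, app_persistence=None))
```

Flipping only the first entry's defpsist to True makes the message persistent at every hop; flipping the transmission queue's alone changes nothing.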