will resetting an indoubt channel risk losing the msg?

flaufer (Acolyte; Joined: 08 Dec 2004; Posts: 59)
Posted: Mon Sep 27, 2010 12:26 am  Post subject: will resetting an indoubt channel risk losing the msg?
Folks,
I've tried to find some information about my question, so far without luck.
This is my problem:
If I have a retrying channel with an indoubt UoW, I typically just RESOLVE the indoubt situation with either BACKOUT or COMMIT (whichever applies, according to the documentation).
Now I'm thinking about just RESETTING the channel instead, and I would like to know what happens to the indoubt UoW. I suppose the RESET CHANNEL command just resets the sequence number of the sender and lets the receiver know (at the next connection attempt) that it has been reset. But what happens to the UoW that is indoubt?
I'm asking because resetting a channel when it's indoubt/retrying, instead of resolving it, seems to be a common habit here, and so far nobody has complained about lost messages.
Felix

Mr Butcher (Padawan; Joined: 23 May 2005; Posts: 1716)
Posted: Mon Sep 27, 2010 1:21 am
If you have a sequence number problem, RESET is your friend.
If you have an unresolved UoW, RESOLVE is what you need.
If you have messages in flight, the channel may show as indoubt. This is a temporary and "normal" state, but it may also be caused by other problems.
So investigate whether there really is an open UoW, or whether some other problem is causing the channel's UoW to stay indoubt. Solve those other problems first, then start the channel; in most cases the indoubt state is then gone.
If it's a "real" indoubt UoW, resetting the channel will not help.
If your channel sequence numbers get out of sync frequently, you should investigate why, rather than just resetting the channel and leaving it at that.
_________________
Regards, Butcher
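The rule of thumb in the post above can be written down as a small decision helper. This is a minimal Python sketch: `ChannelFindings`, its fields and the returned labels are hypothetical names for what an administrator has established by investigating, not MQ attributes or commands.

```python
from dataclasses import dataclass


@dataclass
class ChannelFindings:
    # Hypothetical summary of an investigation; these are not MQ attributes.
    sequence_mismatch: bool    # e.g. a sequence number error in the error log
    real_indoubt_uow: bool     # an open UoW confirmed on the channel
    other_problem: bool        # network, remote queue manager down, etc.


def suggested_action(f: ChannelFindings) -> str:
    if f.other_problem:
        # Fix the underlying cause first and restart the channel; in most
        # cases a transient in-doubt indication then clears by itself.
        return "FIX_CAUSE_AND_RESTART"
    if f.real_indoubt_uow:
        # A genuine in-doubt UoW needs RESOLVE; RESET will not help here.
        return "RESOLVE"
    if f.sequence_mismatch:
        # RESET fixes the numbers, but frequent mismatches need a root cause.
        return "RESET_AND_INVESTIGATE"
    return "INVESTIGATE"


print(suggested_action(ChannelFindings(True, False, False)))  # prints RESET_AND_INVESTIGATE
```

The ordering of the checks mirrors the advice: rule out other problems first, then decide between RESOLVE and RESET.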

flaufer (Acolyte; Joined: 08 Dec 2004; Posts: 59)
Posted: Wed Sep 29, 2010 12:45 am  Post subject: how come?
Mr Butcher wrote:
> If you have a sequence number problem, RESET is your friend.
> If you have an unresolved UoW, RESOLVE is what you need.
> If your channel sequence numbers get out of sync frequently, you should investigate why, rather than just resetting the channel and leaving it at that.
Understood so far.
1. We frequently get channel sequence number errors (thousands of times a week on a gateway machine), and I suspect some WAN issues here (we only see the trouble with a certain group of remote queue managers). However, I don't see any additional errors (such as anything pointing to the network, at least not in the MQ log), as I would expect when a connection is lost or bad data arrives on a channel.
2. In a number of cases (few, but enough to get management involved), the automatic recovery does not seem to work (the channel goes to RETRYING and stays there), and manual intervention is required. Only RESET CHANNEL is performed in this case.
What I don't yet fully understand is why a sequence number mismatch is reported in the error log and sometimes requires manual intervention and sometimes does not. So far, RESOLVE CHANNEL has never been used in these cases; only RESET, which fixes the issue (until next time).
Cheers,
Felix

Mr Butcher (Padawan; Joined: 23 May 2005; Posts: 1716)
Posted: Wed Sep 29, 2010 2:06 am
If (temporary) channel errors do not resolve automatically once both channel ends are up and running and all possible restart recovery has completed, this should be investigated.
If, in your case, the channel is out of sync without a unit of work being indoubt, there must be a reason for this.
Resetting the channel will let you continue to send or receive messages, but messages may be lost.
We sometimes have the situation where the receiving customer end (a Windows system) crashes just after telling our MQ to commit the batch of sent messages. The batch is not hardened on the Windows system, but our sending channel end commits it. After the restart, MQ on the Windows system does not know about this batch, because it was never written to disk (something like that).
However, in that case there is no open UoW and nothing to resolve. From the MQ point of view, RESET is the only way to get the channel running again, but from the application point of view messages are (or may be) missing.
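To see why RESET can hide lost messages in the scenario above, here is a toy Python model. It is a simplification under stated assumptions: `Sender`, `Receiver` and `transfer_batch` are hypothetical, sequence numbers are reduced to plain message counters, and RESET is modelled as setting both counters back to 0. Real MCAs exchange much more state than this.

```python
from dataclasses import dataclass, field


@dataclass
class Sender:
    xmitq: list                # messages still on the transmission queue
    seq: int = 0               # messages the sender believes were delivered


@dataclass
class Receiver:
    hardened: list = field(default_factory=list)   # survives a crash
    seq: int = 0                                   # messages actually hardened


def transfer_batch(s: Sender, r: Receiver, size: int, receiver_crashes: bool) -> None:
    batch = s.xmitq[:size]
    # The receiver has told the sender to commit, so the sender removes
    # the batch from its transmission queue and advances its counter.
    del s.xmitq[:size]
    s.seq += len(batch)
    if receiver_crashes:
        # The receiver's commit was never written to disk: the batch is gone.
        return
    r.hardened.extend(batch)
    r.seq += len(batch)


s = Sender(xmitq=["m1", "m2", "m3", "m4"])
r = Receiver()
transfer_batch(s, r, 2, receiver_crashes=False)
transfer_batch(s, r, 2, receiver_crashes=True)
counters_before_reset = (s.seq, r.seq)   # (4, 2): the ends disagree
# Toy RESET: the counters agree again, so the channel could run ...
s.seq = r.seq = 0
# ... but m3 and m4 are neither on the xmitq nor at the receiver.
print(s.xmitq, r.hardened)   # prints [] ['m1', 'm2']
```

There is no in-doubt batch left to RESOLVE in this model: the sender committed, so only the counters disagree, which is exactly the case the post describes.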

flaufer (Acolyte; Joined: 08 Dec 2004; Posts: 59)
Posted: Wed Sep 29, 2010 5:02 am  Post subject: strange...
Mr Butcher wrote:
> Resetting the channel will let you continue to send or receive messages, but messages may be lost.
OK, that was one of my first questions. So there IS a risk of losing messages if you just reset without checking whether a UoW is indoubt or not.
Mr Butcher wrote:
> If (temporary) channel errors do not resolve automatically once both channel ends are up and running and all possible restart recovery has completed, this should be investigated.
> If, in your case, the channel is out of sync without a unit of work being indoubt, there must be a reason for this.
> ...
> We sometimes have the situation where the receiving customer end (a Windows system) crashes just after telling our MQ to commit the batch of sent messages. The batch is not hardened on the Windows system, but our sending channel end commits it. After the restart, MQ on the Windows system does not know about this batch, because it was never written to disk (something like that).
> However, in that case there is no open UoW and nothing to resolve. From the MQ point of view, RESET is the only way to get the channel running again, but from the application point of view messages are (or may be) missing.
That sounds weird, and to be honest I did not fully follow :)
Anyway, the AMQ9526 error log entries always mention two sequence numbers: the sender is always trying to send sequence number 1, while the receiver is expecting a sequence number beyond 50000 or so. No change in architecture or setup has been made, so it's probably some sort of bug. I remember there was an issue about this fixed in 6.0.2.7 (we run 6.0.2.1); I need to check it and verify that this is exactly what we're facing here.
Felix

Mr Butcher (Padawan; Joined: 23 May 2005; Posts: 1716)
Posted: Wed Sep 29, 2010 5:49 am
Someone expecting or sending sequence number 1 is always suspicious.
Let's say you are sending 5473, but the receiver is expecting 5471: then something is wrong; a message may have been lost, or a batch, or whatever.
But if the receiver is expecting 1, it looks like some kind of administrative task has been performed:
- reset of the receiver to 1
- delete / define of the receiver channel
- ... ?!?
There may be more tasks that cause a channel to be reset to 1; I don't have them all at hand at the moment (or in mind).
So if one channel end's sequence number is regularly "1", try to find out why. There must be a reason (in most cases; otherwise it's a bug).
Sometimes things like RESET CHANNEL are put into scripts (for backup, weekend maintenance, and so on), or objects are deleted / redefined automatically by scripts on a regular basis or on specific actions.
Sequence number 1 is always suspicious.
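The heuristic above (a gap between two "normal" numbers points to lost messages, a 1 at either end points to an administrative action) fits in a few lines. A hedged Python sketch: `classify_mismatch` and its labels are hypothetical, and the logic is only the rule of thumb from this thread, not anything MQ itself reports.

```python
def classify_mismatch(sending: int, expected: int) -> str:
    """Classify a sequence number error from the two numbers involved.

    Hypothetical helper: 'sending' is the sender's number, 'expected' is
    the number the receiver expects; the return values are illustrative.
    """
    if sending == expected:
        return "in step"
    if 1 in (sending, expected):
        # One end starting over at 1 suggests a RESET or a DELETE/DEFINE,
        # possibly run by a script.
        return "one end restarted at 1: look for RESET or DELETE/DEFINE"
    # Two "normal" numbers that differ suggest a lost message or batch.
    return f"gap of {abs(sending - expected)}: a message or batch may be lost"


print(classify_mismatch(5473, 5471))   # prints "gap of 2: a message or batch may be lost"
print(classify_mismatch(1, 50123))
```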

fjb_saper (Grand High Poobah; Joined: 18 Nov 2003; Posts: 20756; Location: LI,NY)
Posted: Wed Sep 29, 2010 12:14 pm
A big sequence number discrepancy (5400 vs 5320) without any admin intervention is also most often a symptom of another problem. Depending on your MQ level (and fix pack), you may get multiple entries in the channel sync table. If that happens, you can get random sequence errors and should open a PMR to resolve it.
_________________
MQ & Broker admin

bruce2359 (Poobah; Joined: 05 Jan 2008; Posts: 9469; Location: US: west coast, almost. Otherwise, enroute.)
Posted: Wed Sep 29, 2010 2:40 pm
RESOLVE and RESET are two entirely different things. They both can affect operation of a channel, but are not otherwise related.
RESETting message sequence numbers does just that. At channel attach, the two MCAs compare their sequence number values. The value specifies the sequence number of the next message to be sent/received. If the values match, the MCAs are happy, and the channel can start. If they don't match, the MCAs are unhappy: the channel presumes that messages (the difference between the two sequence numbers) have been lost, and it does not start.
If RESET is done from the sending end, both sending- and receiving-sequence numbers will be initialized. When the next attempt to start the channel occurs, both MCAs will be happy, the channel will start, and messages will flow. No messages are lost.
Alternatively, if RESET is done at the receiving end, then only the receiving sequence number will be initialized. When the next attempt to start the channel occurs, the MCAs will be unhappy, and the channel will not start. No messages will flow. No messages are lost.
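The start-up check and the two RESET cases described above can be modelled in a few lines of Python. This is a toy sketch, assuming each channel end is reduced to a single number (the next sequence number); `ChannelEnd`, `can_start`, `reset_sending_end` and `reset_receiving_end` are hypothetical names. As a simplification, the sending-end reset updates the receiver immediately, whereas on a real channel the receiving end picks up the new number at the next channel start.

```python
from dataclasses import dataclass


@dataclass
class ChannelEnd:
    next_seq: int   # sequence number of the next message to send/receive


def can_start(sender: ChannelEnd, receiver: ChannelEnd) -> bool:
    # At channel start the MCAs compare their values; any difference is
    # taken to mean messages were lost, and the channel does not start.
    return sender.next_seq == receiver.next_seq


def reset_sending_end(sender: ChannelEnd, receiver: ChannelEnd, seq: int = 1) -> None:
    # RESET at the sending end: both ends end up initialized.
    sender.next_seq = seq
    receiver.next_seq = seq


def reset_receiving_end(receiver: ChannelEnd, seq: int = 1) -> None:
    # RESET at the receiving end: only the receiving number changes.
    receiver.next_seq = seq


sdr, rcvr = ChannelEnd(5473), ChannelEnd(5471)
print(can_start(sdr, rcvr))   # False: mismatch
reset_receiving_end(rcvr)
print(can_start(sdr, rcvr))   # False: 5473 vs 1
reset_sending_end(sdr, rcvr)
print(can_start(sdr, rcvr))   # True
```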
RESOLVE affects unresolved UofWs in in-flight batches. At channel attach, the two channel ends compare last-known status. If UofWs did not complete, the channel remains IN_DOUBT (about the message batch UofW). RESOLVE does not apply to non-persistent messages, since non-persistent messages are not transmitted in UofWs.
RESOLVE(COMMIT) tells the MCAs that they should consider the unresolved UofW committed. The sending MCA will commit messages out of the xmit queue, and the receiving end MCA will commit messages into their destination queues. Messages are now available for consuming apps. No messages are lost.
RESOLVE(BACKOUT) tells the MCAs that you consider the UofW not committed. Any messages put to destination queues will be backed out to the xmit queue, and will be re-sent. When re-sent, messages are available for consuming apps.
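The two RESOLVE options can be sketched the same way, from the sending end's point of view. A toy Python model, assuming the in-doubt batch is held apart from the rest of the transmission queue; `SendingEnd` and `resolve` are hypothetical names, not an MQ API. The model also shows why the choice should match what the receiving end actually did: committing a batch that never arrived drops it, and backing out a batch that did arrive sends it a second time.

```python
from dataclasses import dataclass, field


@dataclass
class SendingEnd:
    xmitq: list                                    # messages not yet sent
    indoubt: list = field(default_factory=list)    # batch awaiting an outcome


def resolve(end: SendingEnd, action: str) -> None:
    if action == "COMMIT":
        # Treat the batch as delivered: it leaves the sending side for good.
        end.indoubt.clear()
    elif action == "BACKOUT":
        # Treat the batch as not delivered: put it back at the head of the
        # transmission queue, in its original order, so it is sent again.
        end.xmitq[:0] = end.indoubt
        end.indoubt.clear()
    else:
        raise ValueError(f"unknown action: {action}")


end = SendingEnd(xmitq=["m3"], indoubt=["m1", "m2"])
resolve(end, "BACKOUT")
print(end.xmitq)   # prints ['m1', 'm2', 'm3']
```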
In normal operation of channels, no messages are lost.
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.

flaufer (Acolyte; Joined: 08 Dec 2004; Posts: 59)
Posted: Thu Sep 30, 2010 6:20 am  Post subject: so in the end....
bruce2359 wrote:
> RESOLVE and RESET are two entirely different things. They both can affect operation of a channel, but are not otherwise related.
> In normal operation of channels, no messages are lost.
OK, thanks for explaining the difference between RESOLVE and RESET. So a RESET will never resolve an indoubt situation, and conversely a RESOLVE will not reset the sequence number.
For us, it seems to be the "sequence number 1 problem". This is what happens: it's not two high numbers that are apart by roughly one batch's worth; it's always one end expecting seqno=1.
I already have a PMR open for this; we'll see the outcome.
Thanks,
Felix

bruce2359 (Poobah; Joined: 05 Jan 2008; Posts: 9469; Location: US: west coast, almost. Otherwise, enroute.)
Posted: Thu Sep 30, 2010 6:28 am
On which channel end(s) do you see sequence number = 1?
If on both ends, someone (or some app) did a RESET from the sending end.
If only on the receiving end, someone likely deleted and redefined the receiver channel, or someone (or some app) did a RESET from the receiving end of the channel.
If only on the sending end, someone likely deleted and redefined the sender channel.
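The checklist above maps directly onto a small lookup. A hedged Python sketch: `likely_cause` is a hypothetical helper, and the returned strings simply restate the post's "likely" causes, so treat them as starting points for an investigation rather than conclusions.

```python
def likely_cause(sender_at_1: bool, receiver_at_1: bool) -> str:
    """Return the likely cause when a channel end shows sequence number 1."""
    if sender_at_1 and receiver_at_1:
        return "RESET issued from the sending end"
    if receiver_at_1:
        return ("receiver channel deleted and redefined, "
                "or RESET issued from the receiving end")
    if sender_at_1:
        return "sender channel deleted and redefined"
    return "neither end at 1: not this pattern"


print(likely_cause(sender_at_1=True, receiver_at_1=False))   # prints "sender channel deleted and redefined"
```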

flaufer (Acolyte; Joined: 08 Dec 2004; Posts: 59)
Posted: Thu Sep 30, 2010 6:36 am  Post subject: no administrative action
bruce2359 wrote:
> On which channel end(s) do you see sequence number = 1?
> If on both ends, someone (or some app) did a RESET from the sending end.
> If only on the receiving end, someone likely deleted and redefined the receiver channel, or someone (or some app) did a RESET from the receiving end of the channel.
> If only on the sending end, someone likely deleted and redefined the sender channel.
I need to check which end is expecting seqno=1. I can, however, rule out that somebody is messing with the configuration, e.g. resetting or redefining channels: it happens far too often for that to be someone with a mouse :)
Felix

bruce2359 (Poobah; Joined: 05 Jan 2008; Posts: 9469; Location: US: west coast, almost. Otherwise, enroute.)
Posted: Thu Sep 30, 2010 6:44 am
I've seen automated scripts do RESETs and DELETE/DEFINEs in an attempt to fix something. Any cron jobs (or equivalent)?
It might be interesting to search the filesystem for these offending commands. That's how I found the cause of the same mysterious behavior at a client site.

bruce2359 (Poobah; Joined: 05 Jan 2008; Posts: 9469; Location: US: west coast, almost. Otherwise, enroute.)
Posted: Thu Sep 30, 2010 12:28 pm
Quote:
> I need to check which end is expecting seqno=1...
Both ends expect the same sequence number.
When this happens again, display the channel status (saved and/or current) at BOTH ends of the channel to identify the sequence numbers at the sender and receiver ends; then post the results from both.
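Once you have the saved status from both ends, comparing the numbers can be scripted. A minimal Python sketch, assuming you have already captured the `DISPLAY CHSTATUS(...) SAVED` text from each queue manager and that it contains the `LSTSEQNO` attribute in MQSC's KEYWORD(value) style. The channel name and status strings below are hand-written illustrations, not real runmqsc output, and `parse_status`/`compare_ends` are hypothetical helpers.

```python
import re

# Matches MQSC-style KEYWORD(value) pairs such as "LSTSEQNO(5471)".
PAIR = re.compile(r"([A-Z]+)\(([^)]*)\)")


def parse_status(text: str) -> dict:
    """Collect KEYWORD(value) pairs from captured channel status text."""
    return {key: value.strip() for key, value in PAIR.findall(text)}


def compare_ends(sender_text: str, receiver_text: str) -> str:
    s_seq = int(parse_status(sender_text)["LSTSEQNO"])
    r_seq = int(parse_status(receiver_text)["LSTSEQNO"])
    if s_seq == r_seq:
        return f"in step at {s_seq}"
    return f"out of step: sender {s_seq}, receiver {r_seq}"


# Hypothetical, abbreviated status text for a channel named QM1.TO.QM2.
sender_saved = "CHANNEL(QM1.TO.QM2) CHLTYPE(SDR) LSTSEQNO(5473)"
receiver_saved = "CHANNEL(QM1.TO.QM2) CHLTYPE(RCVR) LSTSEQNO(5471)"
print(compare_ends(sender_saved, receiver_saved))
# prints "out of step: sender 5473, receiver 5471"
```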

Vitor (Grand High Poobah; Joined: 11 Nov 2005; Posts: 26093; Location: Texas, USA)
Posted: Thu Sep 30, 2010 12:32 pm
bruce2359 wrote:
> When this happens again, display the channel status (saved and/or current) at BOTH ends of the channel to identify the sequence numbers at the sender and receiver ends; then post the results from both.
It might also be worth checking the creation and modification dates of the channels in question, just to see whether they're as far in the past as you think.
_________________
Honesty is the best policy.
Insanity is the best defence.

bruce2359 (Poobah; Joined: 05 Jan 2008; Posts: 9469; Location: US: west coast, almost. Otherwise, enroute.)
Posted: Thu Sep 30, 2010 12:41 pm
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.