sender channel status: retrying |
Author |
Message
|
jcv |
Posted: Wed Jul 18, 2007 1:27 am Post subject: sender channel status: retrying |
|
|
 Chevalier
Joined: 07 May 2007 Posts: 411 Location: Zagreb
|
Hi!
What's the benefit of the "reset channel" command, setting the sequence number to something (1)? The retry attributes of channels, both intervals and counts, are perfectly understandable as parameters for automatic corrective action, but I can't imagine what information I obtain when I'm forced to reset this sequence number in order to start the sender channel. It only adds an obstacle, since it is a further corrective action that must be performed when sender and receiver get out of synchronization. Why doesn't the channel initiator save this event in some event queue and reset the sequence number to 1 by itself? Can I lose messages when I reset the sequence number to an unexpected value on either end of the channel?
I don't find it particularly useful that the xmitq "get" attribute goes to disabled when I issue the "stop channel" command. Usually I have to enable it again immediately to be able to start the channel after resetting the sequence number on both ends. Obviously I am missing something if I perform this procedure regularly whenever I encounter a channel in retrying state, because it makes no sense to me. |
|
Vitor |
Posted: Wed Jul 18, 2007 1:37 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
If you're needing to reset the channel on a regular basis this indicates a problem, probably network related where the MCAs are losing handshake. Resetting the sequence numbers could easily cause loss of messages; why are you not resolving the channels instead?
The channel agent doesn't raise an event and keep going, because by definition it can't automatically make a determination about the message status and if anything's been lost. By design the channel doesn't go "ah - maybe I've lost some messages. Right, I'll raise an event, keep going and assume the sender can work out what's gone missing and resend them." This breaks the assured delivery model, where a message once accepted for delivery is always delivered.
(And before anyone says, I know that's a generalisation and there are exceptions. But each exception case needs specific configuration to allow it; by default messages are always delivered.)
The xmitq is get-disabled to prevent more messages being sent in a potentially unsynchronised state and compounding the problem. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
Nigelg |
Posted: Wed Jul 18, 2007 2:40 am Post subject: |
|
|
Grand Master
Joined: 02 Aug 2004 Posts: 1046
|
Quote: |
when sender and receiver get out of synchronization |
This never happens in the normal course of channel operations. The two causes are that either the sender or the receiver channel is redefined, or that there is a bug such that the sequence number is not properly saved.
The first of these reasons is the likely cause in 99.99% of cases. _________________ MQSeries.net helps those who help themselves.. |
|
jcv |
Posted: Wed Jul 18, 2007 8:23 am Post subject: |
|
|
 Chevalier
Joined: 07 May 2007 Posts: 411 Location: Zagreb
|
Thank you Vitor and Nigel for your responses. As for eliminating the 0.01% case, I browsed through the "Fix list for WebSphere MQ Version 5.3" and didn't find any information on such an issue.
The qmgrs involved, on AIX and Linux, are at this version:
$ dspmqver
Name: WebSphere MQ
Version: 530.7 CSD07
CMVC level: p530-07-L040527
BuildType: IKAP - (Production)
I hope I can conclude that this particular version is bug-free in that respect.
As for the real need to reset the channel instead of resolving it, Vitor is right; that was the point I was missing as a newbie. I had obviously overlooked the chapter "in-doubt channels", which explains things like manual channel resynchronization. I didn't check things like the sequence number and LUWID carefully before resetting the channel. So, O.K., I wasn't using the proper corrective action in (probably) most of the cases, which could have led to loss of data.
In that case, I don't understand why the procedure for resolving properly (described in that chapter) isn't done automatically as part of the "start channel" command. The scenario is like this: I have received a warning about a channel not running. The communication link failure is over. The channel is between two long retry attempts.
Quote: |
In-doubt channel problems are usually resolved automatically. Even when communication is lost, and a channel is placed in doubt with a message batch at the sender whose receipt status is unknown, the situation is resolved when communication is re-established. Sequence number and LUWID records are kept for this purpose. The channel is in doubt until LUWID information has been exchanged, and only one batch of messages can be in doubt for the channel. |
So, if I sit and wait for the next long retry attempt, something (channel initiator, channel agent, ...?) will resolve the channel automatically with the proper action, commit or backout, depending on whether CURLUWID and LSTLUWID on the two sides are equal. But if I try commands like "stop channel" and "start channel" without waiting for the next long retry attempt, the channel will return to "retrying" status, because I didn't resolve it manually as a first step. Am I right? Do I have the whole picture now, or am I still missing something? Then I rephrase my question: what's the benefit of a separate "resolve channel" command, instead of it being part of the "start channel" command? |
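The LUWID comparison described in this post can be sketched in Python. This is only a minimal illustration of the decision rule, assuming the two values have already been read by hand from the saved channel status on each side (for example with DISPLAY CHSTATUS(...) SAVED). The function names, the channel name and the LUWID values are hypothetical, and the sketch only builds the MQSC text; it does not talk to a queue manager.

```python
from typing import Optional


def resolve_action(sender_curluwid: str, receiver_lstluwid: Optional[str]) -> Optional[str]:
    """Pick the RESOLVE action for an in-doubt sender channel.

    sender_curluwid   -- CURLUWID of the in-doubt batch on the sending side
    receiver_lstluwid -- LSTLUWID (last committed batch) on the receiving side,
                         or None if it cannot be obtained
    """
    if not receiver_lstluwid:
        # Nothing to compare against: the administrator has to decide.
        return None
    if sender_curluwid.strip().upper() == receiver_lstluwid.strip().upper():
        # The receiver already committed the batch the sender is unsure about.
        return "COMMIT"
    # The receiver never committed that batch, so the sender must send it again.
    return "BACKOUT"


def resolve_command(channel: str, action: str) -> str:
    """Build the MQSC command text for the chosen action."""
    if action not in ("COMMIT", "BACKOUT"):
        raise ValueError("action must be COMMIT or BACKOUT")
    return f"RESOLVE CHANNEL({channel}) ACTION({action})"


# Hypothetical values, for illustration only.
action = resolve_action("0000000000001A2B", "0000000000001A2B")
if action is not None:
    print(resolve_command("QM1.TO.QM2", action))
```

Returning None when the receiver's value is missing is a deliberate choice in this sketch: in that case there is nothing to compare, and the choice between commit and backout cannot be derived from the data.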
|
jcv |
Posted: Wed Jul 18, 2007 9:50 am Post subject: |
|
|
 Chevalier
Joined: 07 May 2007 Posts: 411 Location: Zagreb
|
I tried to perform the proposed procedure, so I defined a new channel, transferred some messages through it, and then stopped the channel.
Quote: |
Note that the saved status does not apply until at least one batch of messages has been transmitted on the channel. Status is also saved when a channel is stopped (using the STOP CHL command) and when the queue manager is ended. |
But the saved status is still not displayed. I then restarted (stop and start) the channel several more times, and still no saved status is displayed. |
|
Vitor |
Posted: Wed Jul 18, 2007 11:54 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
1) I wouldn't use v5.3.7 for anything, especially if you've Java anywhere in the mix. It's way old.
2) The reason there's a RESOLVE CHANNEL command is the same reason the API allows you to set the message id manually. In 99.9% of cases you never use the facility and it's actually dangerous to do so. For more details, look through the forum; it's a popular topic and one I've ranted about frequently.
Likewise in 99.9% of cases you never use the resolve command. Like the Infocentre says, "In-doubt channel problems are usually resolved automatically". The key word is "usually". Under certain unusual conditions, the channel can't resolve the situation itself. For this situation the resolve channel command has been provided so the MQ administrator, using facilities unavailable to the MCAs, can determine what to do to resolve the channel and instruct the system accordingly. |
|
Nigelg |
Posted: Wed Jul 18, 2007 12:50 pm Post subject: |
|
|
Grand Master
Joined: 02 Aug 2004 Posts: 1046
|
Quote: |
Under certain unusual conditions |
An example of unusual conditions is where the in-doubt channel was pointing to a qmgr or machine which no longer exists. In that case, the resolve is necessary either to back out the UoW so that the msgs can be redirected to another qmgr, or to commit the UoW and throw the msgs away. |
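The two outcomes in this example can be written down as a small Python sketch. It assumes the administrator has already decided what should happen to the in-doubt batch; the `Intent` names, the function name and the channel name are hypothetical, and the code only produces the MQSC text rather than running it.

```python
from enum import Enum


class Intent(Enum):
    """What the administrator wants to happen to the in-doubt batch."""
    REDIRECT = "redirect"  # keep the messages so they can go to another qmgr
    DISCARD = "discard"    # give the messages up


def resolve_for_lost_receiver(channel: str, intent: Intent) -> str:
    """Map the administrator's decision to a RESOLVE CHANNEL command."""
    if intent is Intent.REDIRECT:
        # Backing out leaves the batch on the transmission queue.
        action = "BACKOUT"
    else:
        # Committing removes the batch from the transmission queue.
        action = "COMMIT"
    return f"RESOLVE CHANNEL({channel}) ACTION({action})"


# Hypothetical channel name, for illustration only.
print(resolve_for_lost_receiver("QM1.TO.QM2", Intent.REDIRECT))
```

The point of the sketch is that no data from the lost receiver is used: the decision rests entirely on what the administrator wants done with the messages.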
|
jcv |
Posted: Wed Jul 18, 2007 11:52 pm Post subject: |
|
|
 Chevalier
Joined: 07 May 2007 Posts: 411 Location: Zagreb
|
Thanks.
Is it a typical example of unusual conditions where the resolve is necessary? The procedure of manual channel resynchronization is about comparing the LUWIDs from the saved status on both ends, which obviously does not apply to this case ... |
|
Vitor |
Posted: Thu Jul 19, 2007 12:06 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
jcv wrote: |
Thanks.
Is it a typical example of unusual conditions where the resolve is necessary? The procedure of manual channel resynchronization is about comparing the LUWIDs from the saved status on both ends, which obviously does not apply to this case ... |
Unusual conditions are by their nature unexpected.
And why are LUWIDs not applicable to your case? |
|
jcv |
Posted: Thu Jul 19, 2007 12:18 am Post subject: |
|
|
 Chevalier
Joined: 07 May 2007 Posts: 411 Location: Zagreb
|
So you say Nigel's one is not typical. I didn't say they are not applicable to my case; I said they are not applicable to Nigel's example, in which, in both cases, you perform the action (commit or backout) without a comparison. |
|
Vitor |
Posted: Thu Jul 19, 2007 12:32 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
jcv wrote: |
So you say Nigel's one is not typical. I didn't say they are not applicable to my case; I said they are not applicable to Nigel's example, in which, in both cases, you perform the action (commit or backout) without a comparison. |
Nigel's example is a perfectly valid one, and you'd still (as he says) examine the UoW to decide the fate of the messages - backout or commit.
As you would in your case, which would be better served by fixing the underlying problem, either by investigating potential problems or by upgrading from CSD 7 to improved MCAs. |
|
jcv |
Posted: Thu Jul 19, 2007 12:41 am Post subject: |
|
|
 Chevalier
Joined: 07 May 2007 Posts: 411 Location: Zagreb
|
I guess he did say that. Never to resolve without examining LUWID. |
|
jcv |
Posted: Thu Jul 19, 2007 4:07 am Post subject: |
|
|
 Chevalier
Joined: 07 May 2007 Posts: 411 Location: Zagreb
|
I was confused by the fact that you can't compare anything with something that no longer exists, unless you saved a record of it before you removed it. And if you did that, it is not an unexpected condition at all. If you know you are going to remove a qmgr, it would be more reasonable to stop the channel to that qmgr before the planned action, and not allow the channel to become in-doubt in the first place. The only situation in which I see the benefit is when you still have a functional remote qmgr that is unavailable to the local qmgr because of an (unexpected) network failure: you have an in-doubt channel situation that you could not predict or prevent, you cannot resolve it by fixing the underlying network problem, and you have to continue communication via another queue manager on another machine, temporarily or permanently.
I think I was confused when Nigel said "which no longer exists"; I think he only meant unavailable. Am I right? |
|
Vitor |
Posted: Thu Jul 19, 2007 4:14 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
jcv wrote: |
If you know you are going to remove a qmgr, it would be more reasonable to stop the channel to that qmgr, before that planned action, and not allow the channel to become in-doubt in that case, in the first place. |
The world is filled with unreasonable people. I can quote 2 real world examples where queue managers were removed without reference to the MQ administrator by people who, in a reasonable world, would have at least mentioned it ahead of time. And by removed I don't mean "unavailable". I mean "server rebuilt from ghost image" in one case and "hardware recycled" in another. I can also quote 6 near misses along the lines of "going to switch X to a new box in an hour and shut the old one down. That's not going to cause you a problem is it?". Given that X was bound to a queue manager not cliented it would have led to some serious outage & in-doubt!
Indeed on my current site, 2 queue managers have been deleted and replaced by 2 new (and differently named) ones. There's no record of this being done, no-one knows (or admits to knowing) who did it and that's caused all sorts of problems. Not least because whoever did it changed the mqm password and I can no longer administer the box. We have a group of suspects and the guilty will be found...
Also if you've found a way of predicting disasters so that channels can be stopped and DR kit brought online as a planned action you're going to make a fortune. |
|
jcv |
Posted: Thu Jul 19, 2007 4:37 am Post subject: |
|
|
 Chevalier
Joined: 07 May 2007 Posts: 411 Location: Zagreb
|
I'm still confused. How do you obtain, in that case, the LSTLUWID from the receiving side, if the qmgr on the receiving side has been dropped? You need it in order to compare it with the CURLUWID on the sending (in-doubt) side. |
|
Back to top |
|
 |
|
|
 |
Goto page 1, 2, 3 Next |
Page 1 of 3 |
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
|
|
|