Author |
Message
|
jnoubi |
Posted: Fri Nov 02, 2007 3:37 am Post subject: Error handling in MB/MQ |
|
|
Novice
Joined: 26 Sep 2006 Posts: 24
|
I am working on an error handling strategy for our environment which uses MQ 6.0 for transport and MB 6.0 for routing and transformation.
One of the strategies is to roll back messages into the source system in case the link with the target system is down.
So basically, if the link between MB and the target system is down (for any reason, such as a channel being down or network issues), my strategy is to roll back messages to the source system's queue. No DLQ is specified for the message broker's queue manager, and the retry count is set to 5, for example. This works well because incoming messages stay in sequence and the source system is forced to stop sending once its queue is full.
However, the problem that I am facing is how to restart the process of MB picking up the messages from the incoming queue once the link to the target system is restored?
So basically, the recovery strategy is not clear. If I manually remove the first message from the incoming queue (for example, using rfhutil), the flow restarts. However, because one of my requirements is to lose neither messages nor their order, I can't manually remove messages.
Any suggestions to restart the flow without any loss of messages or order? |
|
 |
jefflowrey |
Posted: Fri Nov 02, 2007 4:11 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
You should restart the flow in the opposite manner to how you stopped it.
But most everything about your design is not best practices. _________________ I am *not* the model of the modern major general. |
|
 |
JosephGramig |
Posted: Fri Nov 02, 2007 4:15 am Post subject: |
|
|
 Grand Master
Joined: 09 Feb 2006 Posts: 1244 Location: Gold Coast of Florida, USA
|
Good question, but you put it in the wrong topic. Someone will move this to the MQSI/WMB Forum.
Hmmm, I assume you want the flow to automatically stop and start when connectivity is restored.
Wouldn't it be better to size the XMIT queue to hold all of the messages until the connectivity is restored and let the channel handle this issue? _________________ Joseph
Administrator - IBM WebSphere MQ (WMQ) V6.0, IBM WebSphere Message Broker (WMB) V6.1 & V6.0
Solution Designer - WMQ V6.0
Solution Developer - WMB V6.1 & V6.0, WMQ V5.3 |
|
 |
jnoubi |
Posted: Fri Nov 02, 2007 6:27 am Post subject: |
|
|
Novice
Joined: 26 Sep 2006 Posts: 24
|
Thanks for the replies so far.
Restarting the flow did not help. The queue is still taking in messages but not releasing any to MB. The only way I was able to get the queue to start functioning properly was by removing the first message (which had been retried a few times).
Regarding increasing the XMIT queue size: I assume you are referring to the XMIT queue on the message broker tier. That still does not solve the problem of restarting the reception of messages from the source application queue; it just buys more time until the queue starts overflowing. |
|
 |
bower5932 |
Posted: Fri Nov 02, 2007 6:28 am Post subject: |
|
|
 Jedi Knight
Joined: 27 Aug 2001 Posts: 3023 Location: Dallas, TX, USA
|
jnoubi wrote: |
Restarting the flow did not help. The queue is still taking in messages but not releasing any to MB. The only way I was able to get the queue to start functioning properly was by removing the first message (which had been retried a few times). |
What is the backout count on the first message? Is it possible that the flow was actually receiving it and kept rolling it back? |
|
 |
JosephGramig |
Posted: Fri Nov 02, 2007 7:20 am Post subject: |
|
|
 Grand Master
Joined: 09 Feb 2006 Posts: 1244 Location: Gold Coast of Florida, USA
|
Well, the channel is designed to get the messages from point A to point B and knows how to recover and restart in the event of a communications failure.
So, if you size the XMITQ to hold as many messages as will be generated during the outage, then the flow will not need to stop and therefore not need to be restarted. _________________ Joseph
Administrator - IBM WebSphere MQ (WMQ) V6.0, IBM WebSphere Message Broker (WMB) V6.1 & V6.0
Solution Designer - WMQ V6.0
Solution Developer - WMB V6.1 & V6.0, WMQ V5.3 |
|
 |
jnoubi |
Posted: Fri Nov 02, 2007 7:46 am Post subject: RE: |
|
|
Novice
Joined: 26 Sep 2006 Posts: 24
|
To bower5932: The backout count was 147. So apparently it retried a few times and then stopped?
To JosephGramig: I think you misunderstood the case I am trying to resolve, so let me explain it in a different way:
The source App A has put a bunch of messages on a remote queue that points to the message broker queue manager. The message broker has a local queue LQA that receives messages from App A. The message broker puts these messages on remote queue RQB to send them to Application B. But when the link with Application B is down, the message broker cannot put any more messages and starts rolling messages back to LQA, which in turn accumulates messages until it reaches its maximum allowable depth. Application A will not be able to send any more messages because LQA is full. The first message on the LQA queue will have a certain backout count. Application A will abend. So far so good.
However, the problem is how to restart the processing of messages once the link with Application B is restored?
It does not restart automatically. |
|
 |
bower5932 |
Posted: Fri Nov 02, 2007 7:57 am Post subject: Re: RE: |
|
|
 Jedi Knight
Joined: 27 Aug 2001 Posts: 3023 Location: Dallas, TX, USA
|
jnoubi wrote: |
To bower5932: The backout count was 147. So apparently it retried a few times and then stopped? |
I'm not sure that it stopped. It might still be reading it and rolling it back. You need to check again to see if the count is increasing. If it is, your problem is what is wrong with the message, not why the broker is not picking up messages. Have you looked at any of the error logs? Have you turned on tracing? |
|
 |
SAFraser |
Posted: Fri Nov 02, 2007 11:03 am Post subject: |
|
|
 Shaman
Joined: 22 Oct 2003 Posts: 742 Location: Austin, Texas, USA
|
Please let me be sure I understand.
Broker puts message to RQB, destined for a queue manager that services application B.
The link to the application B queue manager goes down, so the sender channel goes down. The transmit queue that services the channel to application B gets full.
The broker cannot put any more messages to the RQB because the transmit queue is full. So it rolls messages back to LQA.
Is this correct? If yes, then I agree with all others.... Resize the transmit queue that services the channel to application B. The broker's service will not be interrupted.
One of the strong points of MQ is its ability to decouple the sender and the receiver applications. A way to do this is to size your transmit queues to allow for channel downtime. Some of my transmit queues are really quite large, for this very reason.
Am I understanding your situation correctly?
Thanks,
Shirley |
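The sizing advice in this post can be turned into a rough back-of-the-envelope calculation. The sketch below is only an illustration: the function names, the message rate, the outage length and the headroom factor are all hypothetical, and a real estimate would also have to account for disk space and the queue's maximum depth setting.

```python
import math


def required_xmitq_depth(msgs_per_second: float,
                         outage_seconds: float,
                         headroom: float = 1.5) -> int:
    """Estimate how many messages a transmit queue must hold during an outage.

    headroom is a safety multiplier (assumed, not an MQ parameter).
    """
    if msgs_per_second < 0 or outage_seconds < 0 or headroom < 1:
        raise ValueError("rate and outage must be >= 0, headroom must be >= 1")
    return math.ceil(msgs_per_second * outage_seconds * headroom)


def required_storage_bytes(depth: int, avg_msg_bytes: int) -> int:
    """Rough storage needed for that many messages (payload only)."""
    return depth * avg_msg_bytes


# Hypothetical figures: 20 msgs/s, a 4-hour outage, 2 KB average message.
depth = required_xmitq_depth(20, 4 * 3600)
storage = required_storage_bytes(depth, 2048)
print(depth, storage)
```

If the figure that comes out is larger than the queue can realistically hold, that is a sign the outage window being planned for is too long for queue sizing alone.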
|
 |
jnoubi |
Posted: Fri Nov 02, 2007 12:13 pm Post subject: RE: |
|
|
Novice
Joined: 26 Sep 2006 Posts: 24
|
SAFraser: Part of the problem is handling the downtime, which can be done by increasing the queue depth as you described. That is fine.
But what I am trying to document for our support team is the recovery strategy: what steps are needed to get back to normal operation? In other words, once the connection to App B is up, what do I need to do to get the flow in MB to resume picking up messages from LQA? It does not do this automatically when the connection to B is restored. The queue is still blocked even though the problem with App B is resolved. I found that the only way to resolve the problem is to manually remove the first message from LQA (the one that has a high backout count). |
|
 |
Michael Dag |
Posted: Fri Nov 02, 2007 12:37 pm Post subject: Re: RE: |
|
|
 Jedi Knight
Joined: 13 Jun 2002 Posts: 2607 Location: The Netherlands (Amsterdam)
|
jnoubi wrote: |
SAFraser: Part of the problem is handling the downtime, which can be done by increasing the queue depth as you described. That is fine.
But what I am trying to document for our support team is the recovery strategy: what steps are needed to get back to normal operation? In other words, once the connection to App B is up, what do I need to do to get the flow in MB to resume picking up messages from LQA? It does not do this automatically when the connection to B is restored. The queue is still blocked even though the problem with App B is resolved. I found that the only way to resolve the problem is to manually remove the first message from LQA (the one that has a high backout count). |
If your queue depth is deep enough... you'll never get to this point and no recovery of 'broker' flows is needed... _________________ Michael
MQSystems Facebook page |
|
 |
JosephGramig |
Posted: Fri Nov 02, 2007 12:46 pm Post subject: |
|
|
 Grand Master
Joined: 09 Feb 2006 Posts: 1244 Location: Gold Coast of Florida, USA
|
So what we are saying is, don't get into the XMITQ full condition and you don't have to restart the flow because it won't stop.
The broker will not process the message because it has exceeded its retry count and you have no DLQ for it to put the message to. The flow is clearly in transactional mode.
Of course, I highly recommend you search this site about DLQs and read the information. _________________ Joseph
Administrator - IBM WebSphere MQ (WMQ) V6.0, IBM WebSphere Message Broker (WMB) V6.1 & V6.0
Solution Designer - WMQ V6.0
Solution Developer - WMB V6.1 & V6.0, WMQ V5.3 |
|
 |
jnoubi |
Posted: Fri Nov 02, 2007 1:18 pm Post subject: RE: |
|
|
Novice
Joined: 26 Sep 2006 Posts: 24
|
Thanks for the replies.
Regarding the DLQ, I purposely removed the DLQ because I want to ensure the sequence of my messages. If one of the messages ends up in the DLQ and the message broker processes the next one, then I have lost the sequence of messages.
I guess the best solution is to try to avoid this situation by having very deep queues. |
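The ordering concern described above can be illustrated with a toy model. This is a plain Python simulation, not MQ or broker code: `run_consumer`, `link_up` and the tick-based retry loop are hypothetical names and simplifications, and the model says nothing about why the real flow in this thread stayed blocked after the link came back.

```python
from collections import deque


def run_consumer(messages, link_up, backout_threshold, use_dlq, max_ticks):
    """Toy transactional consumer reading a FIFO queue.

    On each tick the head message is tried once. If the link is down the
    attempt is rolled back and the message's backout count goes up. With
    use_dlq=True a message that reaches the threshold is moved aside;
    otherwise it stays at the head and blocks everything behind it.
    """
    queue = deque(messages)
    backout = {}              # message -> number of rollbacks so far
    delivered, dlq = [], []
    for tick in range(max_ticks):
        if not queue:
            break
        head = queue[0]
        if link_up(tick):
            delivered.append(queue.popleft())
            continue
        backout[head] = backout.get(head, 0) + 1
        if use_dlq and backout[head] >= backout_threshold:
            dlq.append(queue.popleft())
    return delivered, dlq, backout


def link_up(tick):
    # Assumed outage: the link is down for the first six ticks.
    return tick >= 6


with_dlq = run_consumer([1, 2, 3, 4], link_up, 3, True, 20)
without_dlq = run_consumer([1, 2, 3, 4], link_up, 3, False, 20)
print(with_dlq)
print(without_dlq)
```

In this model, with a DLQ messages 1 and 2 are moved aside during the outage and only 3 and 4 reach the target, so the sequence is broken. Without a DLQ, message 1 collects six backouts at the head of the queue and all four messages are delivered in order once the link returns.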
|
 |
bower5932 |
Posted: Fri Nov 02, 2007 2:13 pm Post subject: Re: RE: |
|
|
 Jedi Knight
Joined: 27 Aug 2001 Posts: 3023 Location: Dallas, TX, USA
|
jnoubi wrote: |
I guess the best solution is to try to avoid this situation by having very deep queues. |
The best solution would be to avoid having to have your messages delivered in sequence. Is it possible to send your messages in groups and then only get them when they are all there? |
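The grouping idea can be sketched as follows. MQ has its own message grouping support (a group identifier and a sequence number in the message descriptor), but this sketch does not use it: `GroupCollector` is a hypothetical in-memory stand-in that only shows the principle of holding messages back until the whole group has arrived, then releasing them in sequence order.

```python
from collections import defaultdict


class GroupCollector:
    """Hold messages per group and release a group only when it is complete."""

    def __init__(self):
        self._parts = defaultdict(dict)   # group_id -> {seq: payload}
        self._sizes = {}                  # group_id -> size, known once the last message is seen

    def add(self, group_id, seq, payload, is_last=False):
        """Store one message (seq starts at 1).

        Returns the payloads in sequence order once every message of the
        group is present, otherwise None.
        """
        parts = self._parts[group_id]
        parts[seq] = payload
        if is_last:
            self._sizes[group_id] = seq
        size = self._sizes.get(group_id)
        if size is not None and all(i in parts for i in range(1, size + 1)):
            del self._parts[group_id]
            del self._sizes[group_id]
            return [parts[i] for i in range(1, size + 1)]
        return None


collector = GroupCollector()
first = collector.add("g1", 2, "b")                 # arrives out of order
second = collector.add("g1", 3, "c", is_last=True)  # last message, group still incomplete
third = collector.add("g1", 1, "a")                 # group is now complete
print(first, second, third)
```

Because the consumer only sees a group once all of its members are there, the arrival order of individual messages no longer matters, which is the point of the suggestion.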
|
 |
fjb_saper |
Posted: Fri Nov 02, 2007 3:21 pm Post subject: Re: RE: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
jnoubi wrote: |
Thanks for the replies.
Regarding the DLQ, I purposely removed the DLQ because I want to ensure the sequence of my messages. If one of the messages ends up in the DLQ and the message broker processes the next one, then I have lost the sequence of messages.
I guess the best solution is to try to avoid this situation by having very deep queues. |
You have designed for message affinity... Everybody will tell you this is a bad thing and kills scalability...
Back to the drawing board !  _________________ MQ & Broker admin |
|
 |
|