Urgent: Design Issue of Message Failover Via MQ Clustering
skytorch
Posted: Tue Jun 11, 2002 5:33 am    Post subject: Urgent: Design Issue of Message Failover Via MQ Clustering

Apprentice

Joined: 10 Jun 2002
Posts: 47
Location: New York City

Hi,

In the following layout

QM1 -> reader1 read from a queue
|
| failover
|
QM2 -> reader2 read from a queue after failover happens

In case of QM1 failure, I want to fail over the messages in the QM1 queue to the QM2 queue and start reader2 to keep processing messages.

1. What are the options for designing this?

2. I was thinking of having a queue defined in QM1 and sharing it with QM2. But reader2 cannot read from the shared queue. How can I resolve this?

3. Why is a shared queue designed not to be readable from other QMs?

Thanks in advance.

Sky
AhHa
Posted: Tue Jun 11, 2002 6:02 am

Newbie

Joined: 10 Jun 2002
Posts: 3
Location: Asia Pacific

1) Use HACMP (AIX) or Microsoft Clustering or HP-UX equiv.
2) One can only read from a local instance of a queue. With failover, QM1 will be brought up on the other node and your processes should be able to continue as before. Not sure what your definition of failover is here.
3) As with remote queues, the same applies to shared queues in the cluster: one can only bind to a local instance of a queue to retrieve messages.
skytorch
Posted: Wed Jun 12, 2002 11:23 am

Apprentice

Joined: 10 Jun 2002
Posts: 47
Location: New York City

AhHa,

What I tried to address is a message replication issue.

I need to achieve this design goal: if QM1 fails, QM2 will be used by the applications to read and write messages. All the messages in Q1 of QM1 will be replicated in Q2 of QM2, so *no* messages will be lost after failover from QM1 to QM2.

One option I'm thinking of is: QM1 and QM2 run in parallel. All the messages in Q1 of QM1 are replicated in real time to Q2 of QM2. All operations, such as deleting a message from Q1 of QM1, will be reflected in Q2 of QM2 in order to keep Q1 and Q2 in sync.

Is this something that can be done at the MQ level, or do I have to write an application to achieve it? Or is there another alternative?

Thanks.

Sky
bduncan
Posted: Wed Jun 12, 2002 12:07 pm

Padawan

Joined: 11 Apr 2001
Posts: 1554
Location: Silicon Valley

The option you are proposing is going to be hugely complex. First off, if you replicate each message to both queue managers, how do you determine which queue manager actually processes the message? You have 3 choices:
1) QM1 processes all messages, while QM2 simply removes them from the queue. When QM1 goes down, QM2 starts processing. This is terribly inefficient because QM2 is doing nothing while QM1 is running. Also, you would need to coordinate the processing of a given message on QM1 with the deletion of the same message on QM2 to ensure that no messages are thrown away on QM2 before being processed on QM1.

2) Both QM1 and QM2 process every message, which means all the work is being done twice. This is no different than option 1, except that you don't have to do any coordination between QM1 and QM2. However, you are still inefficient because you are processing messages no faster than you would if you just had one machine.

3) QM1 processes all messages with an odd correlation id, and QM2 processes all messages with an even correlation id. This method of course requires that you control the correlation id such that the putting application alternates the last digit between even and odd. To illustrate, the sending application would put a message with correlid = 1 on QM1. It would put the same message with the same correlation id on QM2. The processing application on QM1 would see that the message has an odd correlation id, so it would process it. The same application on QM2 would see an odd correlation id and simply remove the message from the queue. When the next message comes along, QM2 will process it while QM1 ignores it, and so on. Problems with this approach: again, if QM1 goes down, QM2 somehow needs to be informed about it before removing any messages with odd correlation ids. At the moment QM1 goes down, QM2 needs to start processing messages not only with an even correlid, but with an odd correlid as well. Coordination to ensure that nothing gets lost basically means QM1 and QM2 will work in lock-step fashion, which means that this approach is no faster than option 1 or 2.
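
To make the parity scheme in option 3 concrete, here is a rough sketch in plain Python. It is only an illustration, not MQSeries API code: the function name `decide`, the `peer_alive` flag, and the use of plain integers as correlation ids are all assumptions made for the example.

```python
def decide(qm_parity, correl_id, peer_alive):
    """Return 'process' or 'discard' for one replicated message.

    qm_parity  -- 0 if this queue manager owns even ids, 1 if it owns odd ids
    correl_id  -- integer correlation id set by the putting application (assumed)
    peer_alive -- hypothetical flag: does this side believe its peer is up?
    """
    owns_message = (correl_id % 2) == qm_parity
    if owns_message:
        return "process"
    # Not our message: discarding is only safe if the peer is really processing it.
    return "discard" if peer_alive else "process"


# QM1 owns odd ids, QM2 owns even ids.
QM1, QM2 = 1, 0

# Normal running: each replicated message is processed on exactly one side.
normal = [(decide(QM1, cid, True), decide(QM2, cid, True)) for cid in (1, 2)]

# After QM1 fails (and QM2 knows about it), QM2 handles odd ids as well.
after_failure = [decide(QM2, cid, False) for cid in (1, 2)]
```

The hard part is the `peer_alive` flag. In the window between QM1 failing and QM2 finding out, QM2 would still discard odd messages that nobody processes, which is why the two sides end up needing to coordinate in lock-step.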

So what does this all mean? Well, it basically means you can't do any of this failover stuff unless you write some very creative code to take advantage of the COD functionality of MQSeries, or you take advantage of hardware failover products.
Imagine that QM1 and QM2 have a shared disk. Now QM1 and QM2 will both process messages, but when QM1 goes down, a new instance of QM1 is started on the same machine already running QM2, because all the information necessary to start QM1 is available on the shared disk. So while the machine that hosted QM1 is down, QM1 will be up and running again on the same machine as QM2. Your throughput will drop during these times (which will hopefully be rare!) and the rest of the time QM1 and QM2 will run in parallel.
_________________
Brandon Duncan
IBM Certified MQSeries Specialist
MQSeries.net forum moderator
skytorch
Posted: Thu Jun 13, 2002 9:40 am

Apprentice

Joined: 10 Jun 2002
Posts: 47
Location: New York City

Brandon,

Thanks for the info. It's a very informative analysis.

My concern about using hardware replication is that the standby server will be a "cold" standby rather than a "hot" one. That is, if a QM on one server box fails, it may take some time to start another QM on another box, pick up the persistent messages from the shared log, and get ready for incoming messages.

Is this going to be an issue? What performance/availability impact will it cause?

Thanks.

Sky
bduncan
Posted: Wed Jul 03, 2002 10:35 am

Padawan

Joined: 11 Apr 2001
Posts: 1554
Location: Silicon Valley

Well, if the standby is cold, that means there will be no queue manager responding at the IP address during the time it takes to boot up the cold standby.
All this means is that any messages destined for that queue manager will stack up on the transmission queues of any queue managers attempting to send messages to it. As long as the max depth of these transmission queues isn't exceeded (it can be set as high as 640,000), everything will automatically begin to flow once the channel can start (i.e. once the cold standby comes online). If any transmission queue fills up, however, subsequent messages will go to the dead letter queue, in which case you'll need some application to re-process them (a dead-letter handler).
Since MQSeries is asynchronous in nature, availability isn't an issue, assuming that none of your transmission queues fill up and none of your sending applications are expecting a reply message within a short time frame.
Performance-wise, assuming your processing application can come online and keep up with the deluge of messages (the messages on the various transmission queues will all flow immediately once the standby comes online), performance shouldn't be an issue either.
How long do you imagine it would take to bring the cold standby online? I've worked in instances where we architected the system to deal with a queue manager being offline for hours or even days, so this shouldn't be an issue if we are only talking a few minutes...
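
The queuing behaviour described above can be modelled with a small toy simulation. This is plain Python rather than MQ code, and the class `SendingQueueManager` and its attribute names are invented for the illustration. Messages pile up on a bounded transmission queue while the channel is down, overflow goes to a dead-letter list, and everything on the transmission queue drains once the channel starts.

```python
from collections import deque


class SendingQueueManager:
    """Toy model of a sending queue manager (hypothetical, for illustration)."""

    def __init__(self, max_depth):
        self.max_depth = max_depth      # stands in for the xmit queue's max depth
        self.xmitq = deque()            # messages waiting for the channel
        self.dead_letter = []           # overflow while the xmit queue is full
        self.delivered = []             # messages that reached the remote side
        self.channel_running = False

    def put(self, msg):
        if self.channel_running:
            # Simplification: with the channel up, delivery is immediate.
            self.delivered.append(msg)
        elif len(self.xmitq) < self.max_depth:
            self.xmitq.append(msg)
        else:
            self.dead_letter.append(msg)

    def start_channel(self):
        """The standby has come online: drain the backlog in FIFO order."""
        self.channel_running = True
        while self.xmitq:
            self.delivered.append(self.xmitq.popleft())


qm = SendingQueueManager(max_depth=3)
for i in range(5):
    qm.put(f"msg{i}")       # remote queue manager is still down
qm.start_channel()          # cold standby is now up
```

With a max depth of 3 and five messages put during the outage, the first three are delivered once the channel starts and the last two are left on the dead-letter list for a handler to re-process. Sizing the real queue depth against the expected outage length is the design decision this model is meant to highlight.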