tminifie
Posted: Tue Feb 06, 2007 7:08 am    Post subject: MQ 6.0 cluster workload balancing
Apprentice
Joined: 18 Jan 2007    Posts: 26
I have a few questions about cluster workload balancing with MQ 6.0. I will list our application specifics first and then pose the questions:
- MQ 5.3, soon to migrate to MQ 6.0
- AIX 5.2
- IBM XL C/C++ compiler; the application sends messages via the MQI
Our MQ cluster consists of a gateway Qmgr that distributes messages to 3 other Qmgrs. Every Qmgr except the gateway hosts a local queue (CI). All 4 Qmgrs reside on different servers. The application puts messages to the gateway Qmgr, which distributes them round-robin to the 3 Qmgrs hosting a CI queue. In some cases a message will spawn a new process on 1 of the 3 servers that host a CI queue, and the server the new process starts on would end up processing all of that process's messages. To balance the workload from the new process, a cluster workload user exit was created and is used by all 3 of the Qmgrs that have a local CI queue.
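For reference, the clustered-queue definition behind a setup like this would look roughly like the following MQSC, run on each of the three hosting Qmgrs. The cluster name APPCLUS is an assumption; DEFBIND(NOTFIXED) is what lets each message be workload-balanced individually rather than bound to one instance at MQOPEN time:

```
* On each of the three qmgrs hosting CI (cluster name APPCLUS is assumed)
DEFINE QLOCAL(CI) CLUSTER(APPCLUS) DEFBIND(NOTFIXED) REPLACE
```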
Questions:
1. If we set the new CLWLUSEQ queue and queue manager attribute to "ANY", can we eliminate our cluster workload user exit and still get the desired round-robin behavior?
2. Today we get a very even distribution of messages across all 3 CI queues. The problem is that occasionally one server lags behind the others in processing messages. I would like to send fewer messages to the lagging server and more messages to the faster servers. I have read about all of the new workload balancing attributes shipped with MQ 6.0, and I don't believe any of them can fix this problem, because it is not always the same server that lags behind. Essentially, I would like to check the queue depth of all 3 CI queue instances and route each message to the least full queue. How can I accomplish this through MQ attributes/exits without changing our application?
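For question 1, the CLWLUSEQ settings in question would look something like the MQSC below. Whether this fully replaces the exit depends on what else the exit does, so treat it as an assumption to verify:

```
* At qmgr level: let puts to a clustered queue consider remote instances
* as well as the local one (MQ 6.0+)
ALTER QMGR CLWLUSEQ(ANY)
* Or per queue, on each of the three qmgrs hosting CI
ALTER QLOCAL(CI) CLWLUSEQ(ANY)
```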
Below is a snapshot of the CI queue depth from all 3 servers during a batch job run. You can see that server #3 lagged behind on this particular evening. Our application trace mechanism confirmed that all 3 servers processed the same number of messages over this 18-minute window.
Server #1           Server #2           Server #3
Time      Qdepth    Time      Qdepth    Time      Qdepth
2:05:00       43    2:05:00       41    2:05:00       39
2:06:00       43    2:06:00       16    2:06:00      127
2:07:00       43    2:07:00       42    2:07:00      347
2:08:00       76    2:08:00       20    2:08:00      589
2:09:00       69    2:09:00       30    2:09:00      827
2:10:00       40    2:10:00        1    2:10:00     1059
2:11:00       56    2:11:00       46    2:11:00     1439
2:12:00       71    2:12:00       16    2:12:00     1864
2:13:00       41    2:13:01       15    2:13:00     2718
2:14:00       63    2:14:00       58    2:14:00     3934
2:15:00        0    2:15:00        0    2:15:01     3051
2:16:00        0    2:16:00        1    2:16:00     1784
2:17:00        0    2:17:00        6    2:17:00     1121
2:18:01        0    2:18:00        0    2:18:00      880
2:19:00        2    2:19:00        1    2:19:00      676
2:20:00        1    2:20:00        1    2:20:00      437
2:21:00        1    2:21:00        0    2:21:01      222
2:22:00        0    2:22:00        0    2:22:00       31
2:23:00        0    2:23:00        0    2:23:00        0
Thanks in advance.
Todd
HMed
Posted: Tue Oct 09, 2007 4:17 am
Novice
Joined: 17 Sep 2004    Posts: 17    Location: Camp Hill, PA, USA
tminifie wrote:
    I would like to distribute fewer messages to the server that's lagging behind and send more messages to the speedy servers.
Does anybody know if there is a way to do this other than a user exit? I've been through the manuals and have gone over the workload balancing algorithm, but I haven't found anything that addresses balancing based on queue depth.
Thanks.
jefflowrey
Posted: Tue Oct 09, 2007 4:27 am
Grand Poobah
Joined: 16 Oct 2002    Posts: 19981
You have to use an exit. There's no other way to stream receiver qdepth information back across the channel to the sending side.
_________________
I am *not* the model of the modern major general.
HMed
Posted: Tue Oct 09, 2007 4:42 am
Novice
Joined: 17 Sep 2004    Posts: 17    Location: Camp Hill, PA, USA
Thanks for the info. I found some other posts from Peter, and it seems that an exit which checks queue depth before every message is put would be quite expensive.
Nigelg
Posted: Tue Oct 09, 2007 12:50 pm
Grand Master
Joined: 02 Aug 2004    Posts: 1046
It is not clear from the above whether it is always the same server that processes the msgs more slowly. If it is, then the CLWLWGHT channel attribute looks like it fits the bill:
Quote:
    CLWLWGHT channel attribute
    To apply a weight to a channel for workload management purposes, use the CLWLWGHT attribute, so that the proportion of messages sent down the channel can be controlled. The value must be in the range 1 through 99, where 1 is the lowest rank and 99 is the highest. This parameter is valid only for channels with a channel type (CHLTYPE) of CLUSSDR or CLUSRCVR.
    Use this attribute to ensure that machines with more processing power are sent more messages. The higher the channel weight, the more messages are sent over that channel.
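As a sketch of how that attribute would be applied here, skewing traffic away from a slower server might look like the following MQSC. The channel names and weights are assumptions, and CLWLWGHT defaults to 50 on every channel:

```
* On the cluster-receiver of the slower server (channel name assumed)
ALTER CHANNEL(TO.QM3) CHLTYPE(CLUSRCVR) CLWLWGHT(20)
* Leave the faster servers at the default of 50, or raise them
ALTER CHANNEL(TO.QM1) CHLTYPE(CLUSRCVR) CLWLWGHT(60)
ALTER CHANNEL(TO.QM2) CHLTYPE(CLUSRCVR) CLWLWGHT(60)
```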
However, don't you think it would be better to find out why the server is slower and fix that, rather than tweaking the cluster attributes?
_________________
MQSeries.net helps those who help themselves.
HMed
Posted: Wed Oct 10, 2007 2:48 am
Novice
Joined: 17 Sep 2004    Posts: 17    Location: Camp Hill, PA, USA
That is certainly a good point about finding out why the server is slower, but in my case I actually want to create a failover arrangement for my WAS MDBs using MQ.
We have 2 MDBs running on two different WAS servers. They each listen on a different remote queue within the same cluster. If there is ever a problem with one of the MDBs or one of the WAS servers, MQ doesn't know about it and keeps putting messages on that particular REQ queue. I simply wanted to have MQ workload balance based on queue depth, i.e. if one of the queues has messages in it and the other does not, it stands to reason that there is a problem with the WAS server or the MDB on that server.
With WAS 6.0 we are now able to use Service Integration Buses, which may help me solve the failover problem by coupling MQ more loosely with the listener. I'm going to look into that instead of MQ workload balancing.
jefflowrey
Posted: Wed Oct 10, 2007 3:13 am
Grand Poobah
Joined: 16 Oct 2002    Posts: 19981
You could always trigger the queue on depth, and run a script to put-disable it or unshare it from the cluster. That would cause the cluster to stop sending messages to it.
No exits, no SIBus, nothing too fancy.
_________________
I am *not* the model of the modern major general.
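A minimal MQSC sketch of that idea, with the trigger depth, initiation queue, and process name all assumed for illustration:

```
* Fire a trigger message when CI backs up past 500 messages
ALTER QLOCAL(CI) TRIGGER TRIGTYPE(DEPTH) TRIGDPTH(500) +
      INITQ(SYSTEM.DEFAULT.INITIATION.QUEUE) PROCESS(CI.DEPTH.PROC)
* The triggered script would then run something like:
ALTER QLOCAL(CI) PUT(DISABLED)
* ...or remove the instance from the cluster instead:
ALTER QLOCAL(CI) CLUSTER(' ')
```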
PeterPotkay
Posted: Wed Oct 10, 2007 4:37 am
Poobah
Joined: 15 May 2001    Posts: 7722
jefflowrey wrote:
    You could always trigger the queue on depth, and run a script to put-disable it or unshare it from the cluster....
The queue not being open for input is one of the prerequisites for the trigger to fire for trigger types of First and Depth. If the queue starts backing up, it's quite possible that it is in fact still open by the app; the app is just not pulling the messages fast enough.
Instead, use your MQ monitoring tool to watch the queue depth. If the queue crosses your high-water mark, you can have it fire off a script that does what Jeff suggested. It can also email you to let you know what happened.
HOWEVER, you run the risk of both sides backing up at the same time for whatever reason. Then your solution will put-disable both queues and the cluster won't be able to deliver any of these messages; they will start piling up in the system cluster transmit queue.
_________________
Peter Potkay
Keep Calm and MQ On
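That guard against both sides backing up could be built into the monitor's script. Sketched in MQSC terms (queue name and thresholds assumed), the script would check status before acting and run the inverse command once the depth recovers:

```
* Check the depth and whether anything has the queue open for input
DISPLAY QSTATUS(CI) CURDEPTH IPPROCS
* Only if this instance is backed up, and the other instance is not:
ALTER QLOCAL(CI) PUT(DISABLED)
* Inverse script, once CURDEPTH falls back below the low-water mark:
ALTER QLOCAL(CI) PUT(ENABLED)
```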
jefflowrey
Posted: Wed Oct 10, 2007 4:39 am
Grand Poobah
Joined: 16 Oct 2002    Posts: 19981
PeterPotkay wrote:
    The queue not being open for input is one of the prerequisites for the trigger to fire for triggering types of First and Depth.
D'OH.
_________________
I am *not* the model of the modern major general.
jcv
Posted: Mon Oct 15, 2007 4:25 am
Chevalier
Joined: 07 May 2007    Posts: 411    Location: Zagreb
HMed wrote:
    With WAS 6.0 we are now able to use Service Integration Buses, which may be able to help me solve the failover problem, coupling MQ more loosely with the listener. I'm going to look into that instead of MQ workload balancing.
Is this "coupling MQ more loosely with the listener" something like seamlessly creating additional MDB instances that listen on remote instances of the same queue (in addition to the one that listens on the local instance)? In other words, if you have 2 qmgrs hosting the same queues for the same applications, deployed on 2 WAS servers, would 2 MDBs on each WAS be needed for failover?
jcv
Posted: Mon Oct 15, 2007 4:37 am
Chevalier
Joined: 07 May 2007    Posts: 411    Location: Zagreb
Or is it that you can configure one MDB which can listen to "all" instances of the same queue in the cluster? Thank you.
fjb_saper
Posted: Mon Oct 15, 2007 8:02 am
Grand High Poobah
Joined: 18 Nov 2003    Posts: 20756    Location: LI, NY
jcv wrote:
    Or is it that you can configure one MDB which can listen to "all" instances of the same queue in the cluster? Thank you.
You can configure multiple instances of the same MDB. Each instance can only listen to a single queue.
Enjoy
_________________
MQ & Broker admin
jcv
Posted: Tue Oct 16, 2007 3:24 am
Chevalier
Joined: 07 May 2007    Posts: 411    Location: Zagreb
Yes, thank you for your response. Although I know this is a well-known topic, the qmgr cluster failover issue, let me summarize it to see whether I understand the problem.
You might find it convenient to choose "queue not open for input", "queue depth high", or some combination of conditions for your monitoring tool to run your "put disable + cluster unshare" script on that queue. In that case, the inverse condition, "queue open for input" and/or "queue depth low", should run the inverse "put enable + cluster share" script. That is the "MQ admin" variation of the approach, and it may leave unprocessed messages that were put after the queue stopped being serviced by the consuming application and before the first script executed.
There is also a "WAS admin" approach: configure on each WAS, for every such queue hosted on two qmgrs, one MDB connecting in bindings mode to service the local queue instance and another connecting in client mode to service its counterpart on the remote qmgr. That is 4 MDBs per queue altogether for a 2-qmgr (WAS) cluster. Is this approach appropriate?
Considering the number of queues that might need to be handled this way, neither approach is a smooth and easy way to handle the situation.
That's why people ask whether there is a simple, generic way to set a queue-depth-based cluster workload balancing algorithm for all queues. Or is there a simple, generic way to set up the doubled MDB configuration described above for all MDBs? Or is there a third way, simple and generic, that would be more appropriate or usual for handling this?
Any comments or links to documents describing the issue in more depth would be highly appreciated.
jcv
Posted: Tue Oct 16, 2007 3:45 am
Chevalier
Joined: 07 May 2007    Posts: 411    Location: Zagreb
By the way, I'm not a native English speaker. Which is right: "queue open for input" or "queue opened for input"? I want to use the passive form of "open".
jefflowrey
Posted: Tue Oct 16, 2007 3:49 am
Grand Poobah
Joined: 16 Oct 2002    Posts: 19981
Just because I'm a native speaker of English doesn't mean I'm any good at it... also, most of the rest of the English-speaking world would like to claim that I speak "American" and not "English", apparently with an accent somewhere between Midwestern and Southern.
_________________
I am *not* the model of the modern major general.