Workload balancing not working as expected
RatherBeGolfing |
Posted: Wed Sep 15, 2004 6:08 am    Post subject: Workload balancing not working as expected
Centurion
Joined: 12 Nov 2002    Posts: 118    Location: Syracuse, NY, USA
I have been playing with this for a couple of days and can't crack the problem. I have 3 QMgrs. QM1 is z/os (5.3.1). QM2 is Win2000 (5.3+csd6) and QM3 is Win2000 (5.3+csd6).
I have a cluster Queue (QueueA) that is defined on both QM2 and QM3. Bind option set to Not Fixed.
I have a COBOL program that is putting, say, 10 messages to QueueA. I specify MQOO-BIND-NOT-FIXED as an open option. The program issues an MQConn once, then for each message to be put it issues an MQOpen, MQPut and MQClose. Following the last message to be put, I issue an MQDisc. I would have expected 5 messages to appear on QM2's instance of QueueA and 5 to appear on QM3's instance of QueueA.
What actually happens is that all 10 messages go to QM2. If I inhibit the put attribute of QueueA on QM2, all 10 messages go to QueueA on QM3.
I've searched threads in this site and I've read the appropriate sections of the manuals on MQOpen and Cluster workload management and can't see what I'm doing wrong.
Just for "fun" I even pulled the MQConn and MQDisc into my Open-Put-Close loop - didn't work. I haven't tried MQPut1 yet.....
Any advice is wildly appreciated!
_________________
Cheers,
Larry
MQ Certifiable

Last edited by RatherBeGolfing on Wed Sep 15, 2004 10:30 am; edited 1 time in total
jefflowrey |
Posted: Wed Sep 15, 2004 6:44 am
Grand Poobah
Joined: 16 Oct 2002    Posts: 19981
What queue manager is your COBOL program connected to?
_________________
I am *not* the model of the modern major general.
RatherBeGolfing |
Posted: Wed Sep 15, 2004 6:49 am
Centurion
Joined: 12 Nov 2002    Posts: 118    Location: Syracuse, NY, USA
The COBOL program is connecting to the z/OS queue manager (QM1). Also, it has been bound with the CSQBSTUB access module.
_________________
Cheers,
Larry
MQ Certifiable
jefflowrey |
Posted: Wed Sep 15, 2004 6:52 am
Grand Poobah
Joined: 16 Oct 2002    Posts: 19981
You should see all messages going to the local queue on QM1, then. Local queues are always preferred - unless that's different on z/OS.
Unless you've installed an exit.
_________________
I am *not* the model of the modern major general.
RatherBeGolfing |
Posted: Wed Sep 15, 2004 6:57 am
Centurion
Joined: 12 Nov 2002    Posts: 118    Location: Syracuse, NY, USA
QueueA does not exist on QM1, only on the Windows 2000 queue managers, QM2 and QM3. And we don't have any exits installed.
_________________
Cheers,
Larry
MQ Certifiable
jefflowrey |
Posted: Wed Sep 15, 2004 7:45 am
Grand Poobah
Joined: 16 Oct 2002    Posts: 19981
RatherBeGolfing wrote:
QueueA does not exist on QM1, only on the Windows 2000 queue managers, QM2 and QM3. And we don't have any exits installed.

Then this:

RatherBeGolfing wrote:
I have a cluster Queue (QueueA) that is defined on both QM1 and QM2. Bind option set to Not Fixed.

was a typo.
Are you specifying any open options other than MQOO_OUTPUT and MQOO_BIND_NOT_FIXED?
Are you definitely setting the queue manager name to blanks before each put?
_________________
I am *not* the model of the modern major general.
RatherBeGolfing |
Posted: Wed Sep 15, 2004 8:53 am
Centurion
Joined: 12 Nov 2002    Posts: 118    Location: Syracuse, NY, USA
Jeff, yes, that was a typo - it should have said I have a cluster queue, QueueA, defined on QM2 and QM3 - sorry.
I have no other open options besides MQOO-OUTPUT and MQOO-BIND-NOT-FIXED. And I force spaces into MQOD-OBJECTQMGRNAME before each MQOpen statement.
Thanks for hangin' in there with me!
_________________
Cheers,
Larry
MQ Certifiable
Nigelg |
Posted: Wed Sep 15, 2004 11:19 pm
Grand Master
Joined: 02 Aug 2004    Posts: 1046
All other things being equal - i.e. the cluster queue not PUT inhibited, the qmgr not suspended, channels RUNNING, and a load of other stuff - the destination qmgr is determined by the message sequence number on the CLUSSDR channel; the message is sent to the qmgr with the lowest number. If a lot of messages have already been sent to QM3 in the scenario above, then workload balancing will put messages to QM2 until the message sequence numbers of the channels to QM2 and QM3 are the same, and then alternate the messages between them.
Try stopping and restarting the channels to QM2 and QM3 from QM1 to reset the message sequence numbers, and then put messages as before.
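The behaviour Nigelg describes can be sketched as a toy model (plain Python, not the actual MQ implementation; the channel names and starting counts are made up for illustration):

```python
# Toy model of "send to the channel with the lowest message sequence
# number, then bump that channel's counter". This is NOT real MQ code,
# only an illustration of the claim being made above.

def choose_channel(seq_numbers):
    """Return the channel whose sequence number is currently lowest."""
    return min(seq_numbers, key=seq_numbers.get)

def put_messages(seq_numbers, count):
    """Simulate `count` puts; return the destination channel of each."""
    destinations = []
    for _ in range(count):
        ch = choose_channel(seq_numbers)
        seq_numbers[ch] += 1        # the send advances the sequence number
        destinations.append(ch)
    return destinations

# Suppose the channel to QM3 has already carried 4 messages, QM2 none.
seqs = {"TO.QM2": 0, "TO.QM3": 4}
print(put_messages(seqs, 10))
```

Running this shows the catch-up effect Nigelg predicts: the first puts all flow down TO.QM2 until its sequence number matches TO.QM3's, after which the two channels alternate.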
PeterPotkay |
Posted: Thu Sep 16, 2004 9:53 am
Poobah
Joined: 15 May 2001    Posts: 7722
I have never heard of the sequence number ever coming into play for the round-robin algorithm, and my testing has shown that is not the case either. It can't be true, since sequence numbers are at the channel level and not at the queue level. What if one app dumped a million messages to QM2 / QM3, and then App2 came along and dumped 1000 messages to QM3 / QM4 (a different queue)? They wouldn't all go to QM4 just because the sequence number for QM3 was higher.
Try the test again, but this time ensure the channels to both QM2 and QM3 are running. If the channel to QM2 starts up before the one to QM3 does, MQ is fast enough to move all 10 messages to QM2 before QM3 ever gets a chance. And/or use more test messages in your test.
Below is the algorithm used (from the cluster manual):
Code:
Algorithm: The steps in choosing a destination for a message:
If a queue name is specified, eliminate queues that are not PUT enabled.
Eliminate remote instances of queues that do not share a cluster with the local queue manager.
Eliminate remote CLUSRCVR channels that are not in the same cluster as the queue.
If a queue manager name is specified, eliminate queue manager aliases that are not PUT enabled.
Eliminate remote CLUSRCVR channels that are not in the same cluster as the local queue manager.
If the result above contains the local instance of the queue, choose it.
If the message is a cluster PCF message, eliminate any queue manager you have already sent a publication or subscription to.
If only remote instances of a queue remain, choose Resumed queue managers over Suspended ones.
If more than one remote instance of a queue remains, include all MQCHS_INACTIVE and MQCHS_RUNNING channels.
If no remote instance of a queue remains, include all MQCHS_BINDING, MQCHS_INITIALIZING, MQCHS_STARTING, and MQCHS_STOPPING channels.
If no remote instance of a queue remains, include all MQCHS_RETRYING channels.
If no remote instance of a queue remains, include all MQCHS_REQUESTING, MQCHS_PAUSED and MQCHS_STOPPED channels.
If more than one remote instance of a queue remains and the message is a cluster PCF message, choose locally defined CLUSSDR channels.
If more than one remote instance of a queue remains to any queue manager, choose channels with the highest NETPRTY to each queue manager.
If more than one remote instance of a queue remains, choose the least recently used channel.
_________________
Peter Potkay
Keep Calm and MQ On
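The quoted steps have an "eliminate candidates, then choose" shape that can be condensed into a toy filter-and-pick model (plain Python; the instance records and names are invented, and most of the real eliminations - cluster membership, channel states, NETPRTY, and so on - are deliberately omitted):

```python
# Toy sketch of the "eliminate, then choose" shape of the algorithm
# quoted above. Only three steps are modelled: PUT-enabled filtering,
# preferring a surviving local instance, and falling back to the least
# recently used channel. Not real MQ code.

def choose_destination(instances, now=0):
    """instances: list of dicts, one per instance of the cluster queue."""
    # Step: eliminate queues that are not PUT enabled.
    candidates = [q for q in instances if q["put_enabled"]]
    if not candidates:
        return None
    # Step: if the local instance of the queue survives, choose it.
    for q in candidates:
        if q["local"]:
            return q["qmgr"]
    # Step: choose the least recently used channel.
    chosen = min(candidates, key=lambda q: q["last_used"])
    chosen["last_used"] = now
    return chosen["qmgr"]

instances = [
    {"qmgr": "QM2", "local": False, "put_enabled": True, "last_used": 5},
    {"qmgr": "QM3", "local": False, "put_enabled": True, "last_used": 2},
]
print(choose_destination(instances, now=10))  # QM3's channel is least recently used
```

Note how the "prefer the local instance" step short-circuits everything after it, which is why Larry's messages would all stay on QM1 if QueueA were defined there.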
Nigelg |
Posted: Fri Sep 17, 2004 12:10 am
Grand Master
Joined: 02 Aug 2004    Posts: 1046
Quote:
I have never heard of the sequence number ever coming into play for the round-robin algorithm, and my testing has shown that is not the case either. It can't be true, since sequence numbers are at the channel level and not at the queue level. What if one app dumped a million messages to QM2 / QM3, and then App2 came along and dumped 1000 messages to QM3 / QM4 (a different queue)? They wouldn't all go to QM4 just because the sequence number for QM3 was higher.

The default round-robin algorithm IS at the channel level, not the queue level. In your example, that is just what does happen.
Read your own quote...

Quote:
If more than one remote instance of a queue remains, choose the least recently used channel.

It chooses the least recently used channel by comparing the sequence numbers.
PeterPotkay |
Posted: Fri Sep 17, 2004 4:56 am
Poobah
Joined: 15 May 2001    Posts: 7722
So, the big question is, does

Quote:
If more than one remote instance of a queue remains, choose the least recently used channel.

mean that any message for any queue counts, or is the algorithm smart enough to consider only the last channel used for a message that is part of this particular round robin?
For example, AppA puts 1,000,000 messages as fast as it can to QueueA, which is hosted on QM2 and QM3. They round robin equally for the sake of discussion. Then AppB starts putting messages to QueueB, which is hosted on QM3 and QM4, but it puts those messages once a second. Are you saying that all of AppB's messages will go to QM4, because AppA is using the channel to QM3 so heavily? That doesn't sound right to me.
If this were the case, one big app would be disrupting the round-robining for every other app in the cluster, and I just don't think IBM would design it that way (but I could be wrong).
When you read

Quote:
If more than one remote instance of a queue remains, choose the least recently used channel.

you are assuming

Quote:
by comparing the sequence numbers

I assumed it meant comparing which channel AppB last sent a message down, not which channel any app, including the cluster repositories, last sent a message down.
Do you have access to the algorithm code?
_________________
Peter Potkay
Keep Calm and MQ On
hguapluas |
Posted: Mon Sep 20, 2004 7:08 am
Centurion
Joined: 05 Aug 2004    Posts: 105    Location: San Diego
Per the Clustering book:
"...the algorithm determines which destinations are suitable. Suitability is based on the state of the channel (including any priority you might have assigned to the channel), and also the availability of the queue manager and queue. The algorithm uses a round-robin approach to finalize its choice between the suitable queue managers."
Make sure you defined both QueueA's exactly the same on both Windows 2000 boxes. Any differences, no matter how slight, will alter the round robin and produce varied results.
The round-robin algorithm works; I have tested it on cluster queues and seen it in action. Also make sure you are not specifying a specific queue manager when sending the message. Just send it to the queue "QueueA" and the cluster will do the rest.
PeterPotkay |
Posted: Mon Sep 20, 2004 11:04 am
Poobah
Joined: 15 May 2001    Posts: 7722
hguapluas wrote:
Make sure you defined both QueueA's exactly the same on both 2000 boxes. Any differences, no matter how slight, will alter the round-robin and produce varied results.

Given 2 queues in a cluster with the same name, the ONLY queue attribute that the algorithm considers (to the best of my knowledge) is whether the queue is PUT inhibited or not. I have never seen any documentation that says the algorithm considers any other queue attribute when it decides where to send messages. If the queues have any other differences (description, max depth, persistence, etc.), it should not matter.
_________________
Peter Potkay
Keep Calm and MQ On
hguapluas |
Posted: Tue Sep 21, 2004 6:38 am
Centurion
Joined: 05 Aug 2004    Posts: 105    Location: San Diego
You're right. That applies more to using the MQINQ call than to the workload balancing algorithm.
mdurman |
Posted: Wed Sep 22, 2004 4:54 pm
Newbie
Joined: 28 Jun 2001    Posts: 3    Location: Whittier, California
Many people believe that cluster workload balancing is done on a per-queue basis, but that's not true. It is done on a per-channel basis. We've just been through this recently ourselves.
The final check workload balancing makes is which channel, of the channels that can be chosen, has the oldest last-used date/time, and it chooses that one.
So (and this was our scenario), let's assume you have 3 queue managers, QM1, QM2 and QM3. You have two clustered queues called QA and QB, defined on QM2 and QM3. You have an app that connects to QM1, opens the clustered queues using BIND_NOT_FIXED, and puts a large (let's say 100 KB) app message to QA followed by a small (let's say 500 byte) log message to QB. It does this 1,000 times.
Assuming there is no other activity on the queue managers, all the large messages will end up on one queue manager and all the small messages will end up on the other.
What happens is this. The app puts the app message to QA. Workload balancing selects a queue manager (let's say QM2). The message is sent there. The app then puts the log message to QB. Workload balancing checks the channel status and selects QM3, because that has the oldest channel date/time, and the message is sent there.
The app then puts the next app message to QA. Workload balancing checks the channel status and selects QM2, because that now has the oldest date/time, and sends the message there. The app then puts the log message to QB, and QM3 is again selected on date/time.
That ping-pong between channels continues until all 1,000 messages have been put.
The end result appears to be that workload balancing is not functioning, or that you have the queues opened in BIND_ON_OPEN mode, but it really is functioning as documented.
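mdurman's scenario is easy to reproduce with a least-recently-used channel picker (a toy model in plain Python, not MQ itself; the queue and qmgr names follow the post, and the channel "timestamps" are just a counter):

```python
# Toy simulation of mdurman's scenario: two queues, QA and QB, both
# hosted on QM2 and QM3, with workload balancing that always picks the
# least recently used channel. Not real MQ code; it only illustrates
# why all of QA's messages land on one qmgr and all of QB's on the other.

def run(puts, hosts):
    """puts: sequence of queue names; hosts: qmgr list per queue name."""
    last_used = {"QM2": 0, "QM3": 0}   # per-channel last-used "timestamps"
    clock = 0
    placed = {"QA": set(), "QB": set()}
    for queue in puts:
        clock += 1
        # Choose the candidate channel with the oldest last-used time.
        qmgr = min(hosts[queue], key=lambda qm: last_used[qm])
        last_used[qmgr] = clock
        placed[queue].add(qmgr)
    return placed

hosts = {"QA": ["QM2", "QM3"], "QB": ["QM2", "QM3"]}
# Alternate a QA put and a QB put, 1,000 times each.
print(run(["QA", "QB"] * 1000, hosts))
```

Every QA put finds the channel the previous QB put did not touch, and vice versa, so the two streams lock onto opposite queue managers exactly as the post describes.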