(Resolved) Load balance anomaly
bbburson
Posted: Tue Nov 23, 2004 1:46 pm Post subject: (Resolved) Load balance anomaly
Partisan
Joined: 06 Jan 2004 Posts: 378 Location: Nowhere near a queue manager
I'm new to MQ clustering, so please bear with me. I have searched this forum for an explanation of the behavior I'm seeing, but have not found this exact situation discussed.
The setup:
Four queue managers, two full repositories and two partials:
FR1 (MQver 530.7, Solaris 9)
FR2 (MQver 530.8, Solaris 9)
PR1 (MQver 530.7, HP 11.0)
PR2 (MQver 530.7, Solaris 9)
Cluster queue CL1 defined locally on FR1 and PR1
Cluster queue CL2 defined locally on FR2 and PR2
For each queue manager in turn, I connect a client and put a message to each cluster queue; I do that twice per queue (because the sample program amqsputc seems to be compiled with the BIND_ON_OPEN option).
All the messages go where I expect them to with one glaring exception: every time I connect to FR1 and do puts to CL2 ALL the messages end up on FR2 (instead of half-and-half on FR2 and PR2). For some reason full repository FR1 favors the other full repository and never shoots any messages to the partial repository box.
The reverse situation, where I connect to FR2 and put to CL1, works as I would expect; half the messages end up on FR1 and half on PR1.
The cluster channels are all up and running. FR1 knows about both instances of CL2 (using the runmqsc DIS QCLUSTER(CL2) command). If I disable puts to CL2 on FR2, the messages then flow to PR2.
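For anyone reproducing this, these runmqsc commands on FR1 show the channel status, the cluster queue managers, and FR1's view of CL2 (a rough sketch only; the DIS CLUSQMGR line is just an extra sanity check on repository type and suspension):
Code:
DIS CHSTATUS(*)
DIS CLUSQMGR(*) QMTYPE SUSPEND
DIS QCLUSTER(CL2) CLUSQMGR DEFBIND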
Any ideas why this is not behaving as expected?
Thanks,
PeterPotkay
Posted: Tue Nov 23, 2004 1:55 pm
Poobah
Joined: 15 May 2001 Posts: 7722
Quote:
For each queue manager in turn, I connect a client and put a message to each cluster queue; I do that twice per queue (because the sample program amqsputc seems to be compiled with the BIND_ON_OPEN option).
Actually, I bet it is using the queue's default bind option, which by default is bind on open, so change your queues' DEFBIND attribute to NOTFIXED.
As for why the messages all go to FR2, and not FR2 and PR2... well, for some reason the algorithm on FR1 thinks the queue on FR2 is preferable. Maybe PR2 is suspended from the cluster. Note that suspension would still allow messages to go to PR2 if there was no other choice, but they would not go to PR2 if there was another choice. Try switching all 4 queues to bind not fixed, and try your test again. Also, try connecting to PR1 and putting to CL2. In that case, is the mix 50/50?
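In runmqsc that boils down to something like the following sketch (run each ALTER on the queue managers that host that queue locally; the DIS line is a quick way to see whether any queue manager is suspended):
Code:
* on FR1 and PR1
ALTER QLOCAL(CL1) DEFBIND(NOTFIXED)
* on FR2 and PR2
ALTER QLOCAL(CL2) DEFBIND(NOTFIXED)
* on any queue manager, to check for a suspended cluster member
DIS CLUSQMGR(*) QMTYPE SUSPEND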
_________________
Peter Potkay
Keep Calm and MQ On
Nigelg
Posted: Wed Nov 24, 2004 12:34 am
Grand Master
Joined: 02 Aug 2004 Posts: 1046
It may be that a lot of messages have already flowed from FR1 to PR2, so the sequence number on that channel is much higher than the sequence number on the channel to FR2, and the algorithm chooses FR2 until the sequence numbers match.
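If anyone wants to check that theory, the current sequence numbers show up in runmqsc with something like this (the channel names are placeholders for whatever the cluster-sender channels to FR2 and PR2 are actually called):
Code:
DIS CHSTATUS(TO.FR2) CURSEQNO MSGS
DIS CHSTATUS(TO.PR2) CURSEQNO MSGS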
PeterPotkay
Posted: Wed Nov 24, 2004 5:47 am
Poobah
Joined: 15 May 2001 Posts: 7722
Nigelg wrote:
It may be that a lot of messages have already flowed from FR1 to PR2, so the sequence number on that channel is much higher than the sequence number on the channel to FR2, and the algorithm chooses FR2 until the sequence numbers match.
I respectfully disagree. Nigel, you never addressed my test results in this thread.
http://www.mqseries.net/phpBB2/viewtopic.php?t=17607&start=15
If clustering chose a path strictly based on the value of sequence numbers, I could send 1,000,000 messages from FR1 to FR2 only, and then it could take days, months or years before another application sent enough messages from FR1 to PR2 to get that channel's sequence number over 1,000,000, during which time no messages would round robin to FR2. We both know it can't work like that.
My testing has shown that while one channel's sequence number is *rising* faster than another's, that is counted against it as a chosen path. Details are in the thread mentioned above.
Again, I hate disagreeing with someone looking at the actual source code, but my testing shows something else, and logic also dictates that clustering should not work this way.
_________________
Peter Potkay
Keep Calm and MQ On
bbburson
Posted: Wed Nov 24, 2004 7:05 am Post subject: further testing reveals...
Partisan
Joined: 06 Jan 2004 Posts: 378 Location: Nowhere near a queue manager
Thanks for the replies! The DEFBIND setting on the queues certainly makes a difference.
Today I changed all four queues to DEFBIND(NOTFIXED) and the messages balanced between FR2 and PR2 as I had expected them to all along. That happens whether I do
Code:
Connect
put
put
put
put
Disconnect
-or-
Code:
Connect
put
Disconnect
four times in a row.
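Either way, it is easy to see where the messages land by checking the local depth of CL2 on FR2 and PR2 between runs, and clearing the queue so the next run starts from zero (just the obvious runmqsc commands):
Code:
DIS QLOCAL(CL2) CURDEPTH
CLEAR QLOCAL(CL2)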
The mystery part is still why my original tests (using Connect/Put/Disconnect four times in a row) failed to load balance only when my client connected to FR1, doing puts to CL2. I used all the combinations of connection points (FR1, FR2, PR1, PR2) to each of the clustered queues (CL1, CL2), and only that one combo failed to load balance (of course connections to the qmgr where the queue is local put all the messages locally, as expected).
Oh, well, I'll make a note to always set my clustered queues to be DEFBIND(NOTFIXED) and maybe I won't have to worry about this particular thing again.
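For new queues that just means putting the attribute in the definition up front, something along these lines (the cluster name here is only a placeholder):
Code:
DEFINE QLOCAL(CL1) CLUSTER(MYCLUSTER) DEFBIND(NOTFIXED)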
BTW, Peter, I've been following some of the discussions here and I agree with you that channel sequence numbers do not come into play in this scenario.