Mehrdad
Posted: Sun Feb 03, 2008 12:45 pm    Post subject: Challenge Question - 02/2008
Master
Joined: 27 Feb 2004    Posts: 219    Location: Europe
Here comes our February 2008 Challenge Question:
"There is a cluster called SALES, with 9 queue managers in it, all running MQ 6.0.1. Queue managers A, B & C service the inbound requests; queue managers 1-6 process the requests and transmit replies. Applications connect by client to the A-C managers and create a temporary dynamic queue to receive their particular reply.
The applications are grouped into geographic groups, and use 3 different client tables to make a connection by default to their nearest queue manager, but any application instance can use any queue manager A-C. Typically each of these queue managers will have 75-100 clients connected at any given time.
Queue managers 1-6 are on Solaris machines. The box hosting queue manager 6 is significantly more powerful than the others, and the CLWLWGHT parameter on this box has been set to 75 to reflect this; the CLWLWGHT on queue managers 1-5 is set to 25. All the machines 1-6 host the same set of queues. The FRs in the cluster are queue manager A and queue manager 6.
You (as MQ admin) receive a call from the Solaris admins complaining that boxes 1-5 are heavily loaded and running slowly, and though box 6 is processing messages it is not considered to be pulling its weight in terms of message volume and has spare capacity.
What action(s) do you take?"
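For illustration: a weighting like the one described would normally be set on each hosting queue manager's cluster-receiver channel. A minimal runmqsc sketch, with channel names assumed:
Code:
* On the box hosting QM6 (channel name assumed for illustration):
ALTER CHANNEL(TO.QM6) CHLTYPE(CLUSRCVR) CLWLWGHT(75)
* On each of the boxes hosting QM1-QM5, e.g. QM1:
ALTER CHANNEL(TO.QM1) CHLTYPE(CLUSRCVR) CLWLWGHT(25)
With these weights the QM6 channel should be chosen three times as often as any single QM1-QM5 channel, i.e. 75/(75 + 5×25) = 37.5% of the traffic.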
Answers are encouraged to be posted here, but those who would like to remain discreet can send their answer to challengefeb2008@cressida.info.
Vitor
Posted: Mon Feb 04, 2008 2:33 am    Post subject: Re: Challenge Question - 02/2008
Grand High Poobah
Joined: 11 Nov 2005    Posts: 26093    Location: Texas, USA
Mehrdad wrote:
You (as MQ admin) receive a call from the Solaris admins complaining that boxes 1-5 are heavily loaded and running slowly, and though box 6 is processing messages it is not considered to be pulling its weight in terms of message volume and has spare capacity.
What action(s) do you take?"
I'll throw this one in before someone else does:
- tell the Unix admins not to be such cry-babies
- remind them that extra resource is cheap
- point out that if they can't get the budget to buy it that's not your problem
- make yourself a coffee
Optionally you could point out that if they replaced machines 1-6 with a proper computer (i.e. a mainframe) they'd have bags of power. If that hasn't got rid of them, start telling stories of the time the 80 column card punch broke and you had to program with a pair of scissors.....
_________________
Honesty is the best policy.
Insanity is the best defence.
AkankshA
Posted: Mon Feb 04, 2008 3:17 am    Post subject: My 1 cent...
Grand Master
Joined: 12 Jan 2006    Posts: 1494    Location: Singapore
Did I hit the nail on the head, or am I shooting at the wrong wall altogether?
QM6 is an FR with CLWLWGHT 75, and QM1-5 are PRs with CLWLWGHT 25.
With those weights, for every 3 messages sent over the QM6 channel, only 1 is sent over each of the other channels.
QM6, being a full repository, will push its repository information over its cluster-sender channel to the cluster-receiver channel of QMA, the other full repository.
Hence the load on QMA and QM6 would already be high (being full repositories) with regular update messages, though these do not eat up much of the Unix box capacity.
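A quick way to confirm the weights actually in force, with channel names assumed for illustration:
Code:
* On QM6 itself, the local cluster-receiver definition:
DISPLAY CHANNEL(TO.QM6) CLWLWGHT
* From any member, what the repository has cached for each cluster qmgr:
DISPLAY CLUSQMGR(*) CHANNEL CLWLWGHT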
Increase the CLWLWGHT ratio ...!!
_________________
Cheers
fjb_saper
Posted: Sun Feb 10, 2008 6:06 pm    Post subject:
Grand High Poobah
Joined: 18 Nov 2003    Posts: 20756    Location: LI, NY
Looks to me like the weighting was set at the channel level.
This has a disadvantage: if you are aiming at messages hitting a specific queue or process, you may not get the desired result, because the channel weighting takes into consideration all messages transiting the channel.
This is why I would suggest setting the CLWLPRTY attribute of the destination queue in the cluster instead: set it to 7 on the qmgr you want to favour and to 3 on all the others.
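A minimal runmqsc sketch of that suggestion, with the queue name SALES.REQUEST assumed for illustration (CLWLPRTY runs from 0 to 9, and the highest available priority wins):
Code:
* On QM6, the instance to favour (queue name assumed):
ALTER QLOCAL(SALES.REQUEST) CLWLPRTY(7)
* On each of QM1-QM5:
ALTER QLOCAL(SALES.REQUEST) CLWLPRTY(3)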
I have found this to be particularly effective if you get messages delivered in batches. At some point the (main) target queue becomes unavailable to the MCA putting to it, and the other destinations in the cluster then act as a failover. Polling ensures that the main destination receives messages again as soon as it is available... Thus the mechanism acts more as a load balancer than as a failover... (failover mode for me was using values of 9 and 0).
Looking at how the problem was presented, the first thing I would check is the health of the cluster channel towards QM6 from QMA, QMB and QMC. If it shows as running, I would stop it and restart it, which resets the statistics so that the balance gets restored...
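In runmqsc on QMA (and again on QMB and QMC), with the channel name assumed for illustration, that check and bounce might look like:
Code:
* Is the cluster-sender channel to QM6 actually moving messages?
DISPLAY CHSTATUS(TO.QM6) STATUS MSGS
* If it shows RUNNING but the spread still looks wrong, bounce it:
STOP CHANNEL(TO.QM6)
START CHANNEL(TO.QM6)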
Making a different qmgr an FR and QM6 a PR could also improve the message distribution.
Ultimately I would put the balancing at queue level for the important heavy-duty processing messages.
_________________
MQ & Broker admin
Gomez Addams
Posted: Mon Feb 11, 2008 2:26 pm    Post subject:
Newbie
Joined: 11 Feb 2008    Posts: 1    Location: Minneapolis
Assuming that the workload should be distributed over the six servers as evenly as possible, given the MQ parameters available and without writing code, I think that further refinement of CLWLWGHT is in order. It is impossible to tell what the refinement should be without knowing more about the size of the processing disparity and what would be considered a properly balanced load in this circumstance, given an unbalanced processing environment (i.e., one processor is significantly faster than the other five).
Assuming that work should be sent to the fastest processor first, until it ceases to be fastest, and only then distributed among the other servers, use CLWLPRTY to give the queues on QM6 a higher priority than on the other QMs. Of course, the system running QM6 will be useless for anything else.
Assuming that anything goes and any amount of resources may be expended to produce the perfect balance and coordination of work, a cluster workload exit can be written to favor QM6, using any number of indicators not normally available to MQ to determine whether QM6 is getting busy and traffic should be directed to other QMs.
_________________
Run this program on your system. Experiment with leaving out parts of the program, to see what error messages you get. -- Kernighan and Ritchie
jefflowrey
Posted: Mon Feb 11, 2008 2:43 pm    Post subject:
Grand Poobah
Joined: 16 Oct 2002    Posts: 19981
I think these challenges are written to be as complete in themselves as possible - i.e. that there's no "missing" information needed to figure out what's going on.
There is usually missing information about the problem itself, of course.
This one is particularly interesting, because if I'm right in my own guess, then one has to understand how clusters work to know why the behavior is being shown.
Of course, my own guess for last month's challenge was entirely wrong - I was thinking something else entirely.
_________________
I am *not* the model of the modern major general.
PeterPotkay
Posted: Mon Feb 11, 2008 3:33 pm    Post subject:
Poobah
Joined: 15 May 2001    Posts: 7722
Assuming that:
* There are no network problems going to QM6 that would be affecting channel availability
* All the queues on QM1 to QM6 are defined exactly the same AND all enabled for MQPUTs (a quick check of these first two is sketched below)
* You don't have any apps specifying QM1-QM5 but not QM6 on the front end MQOPENs / MQPUT1s
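For illustration, with queue and channel names assumed:
Code:
* On each of QM1-QM6: is the queue there, put-enabled and in the cluster?
DISPLAY QLOCAL(SALES.REQUEST) PUT CLUSTER
* On QMA, QMB and QMC: is the channel to QM6 up?
DISPLAY CHSTATUS(TO.QM6) STATUS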
I'm inclined to think it's related to the fact that QM6 is an FR, and so the channels to it are going to have more traffic due to the cluster admin messages.
http://publib.boulder.ibm.com/infocenter/wmqv6/v6r0/topic/com.ibm.mq.csqzah.doc/qc10940_.htm
Quote:
Note that the distribution of user messages is not always exact, because administration and maintenance of the cluster causes messages to flow across channels. This can result in an apparent uneven distribution of user messages which can take some time to stabilize. Because of this, no reliance should be made on the exact distribution of messages during workload balancing.
Working as designed!
I know, I know, there must be something else. Cluster admin messages couldn't disrupt the balance that much, and a weight of 75 on the channel should mean a lot more app messages going to it. If the above assumptions are true, I can't imagine what the problem is. Send the Problem Ticket to Network Support!
_________________
Peter Potkay
Keep Calm and MQ On
starship
Posted: Tue Feb 19, 2008 2:25 am    Post subject:
Apprentice
Joined: 07 Dec 2005    Posts: 33    Location: INDIA
Hello All,
I think that queue managers B and C are not able to connect to queue manager 6, so all the requests from B and C are going to QM1-5, whereas the requests from A are going to all of QM1-6.
The reason QMB and QMC cannot connect to QM6 might be network issues between QMB/QMC and QM6.
Please advise.
Regards
Vitor
Posted: Fri Feb 29, 2008 12:58 am    Post subject:
Grand High Poobah
Joined: 11 Nov 2005    Posts: 26093    Location: Texas, USA
Thanks to all for participating in this month's competition!
IMHO it's generated a lot of interesting and useful discussion on cluster workload balancing and thrown up some good points.
The root cause of the scenario (for the record) was a new version of the client code which connects to A-C being rolled out across the organisation for non-MQ related reasons. This code had been changed to specify BIND_ON_OPEN, and hence for the duration of its activity used a specific queue on a specific queue manager. This (obviously) eliminates workload distribution and skews the load.
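For the record, the administrative guard against that is the queue's DEFBIND attribute; a hedged sketch with the queue name assumed (note that an explicit MQOO_BIND_ON_OPEN in the application still overrides the queue default):
Code:
* Have the cluster queue default to per-message workload balancing:
ALTER QLOCAL(SALES.REQUEST) DEFBIND(NOTFIXED)
* Clients then need to open with MQOO_BIND_AS_Q_DEF (or MQOO_BIND_NOT_FIXED)
* rather than MQOO_BIND_ON_OPEN for the balancing to apply to each message.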
But the question was: what action(s) do you take? Now many of the actions suggested around network connections and changing CLWLWGHT could have helped, if only by not having the expected effect and pushing the investigation elsewhere. But there can be only one winner (apparently, unless there's another special prize), and thus PeterPotkay wins by a nose as the only solution which explicitly mentioned checking the application code.
Well done to you sir, please PM Mehrdad with details of where you want your glorious, desirable and quite frankly stunning prize sent.
Thank you all again for your contributions.
_________________
Honesty is the best policy.
Insanity is the best defence.
jefflowrey
Posted: Fri Feb 29, 2008 5:45 am    Post subject:
Grand Poobah
Joined: 16 Oct 2002    Posts: 19981
jefflowrey wrote:
my own guess was entirely wrong

_________________
I am *not* the model of the modern major general.
Vitor
Posted: Fri Feb 29, 2008 5:51 am    Post subject:
Grand High Poobah
Joined: 11 Nov 2005    Posts: 26093    Location: Texas, USA
jefflowrey wrote:
jefflowrey wrote:
my own guess was entirely wrong

To paraphrase a well known TV show:
"Oh my God, I've killed jefflowrey!"
_________________
Honesty is the best policy.
Insanity is the best defence.