Mehrdad
Posted: Sun Feb 03, 2008 12:45 pm    Post subject: Challenge Question - 02/2008
Master
Joined: 27 Feb 2004    Posts: 219    Location: Europe
Here comes our February 2008 Challenge Question:
"There is a cluster called SALES, with 9 queue managers in it, all running MQ 6.0.1. Queue managers A, B & C service the inbound requests; queue managers 1-6 process the requests and transmit replies. Applications connect by client to the A-C managers and create a temporary dynamic queue to receive their particular reply.
The applications are grouped into geographic groups, and use 3 different client tables to make a connection by default to their nearest queue manager, but any application instance can use any queue manager A-C. Typically each of these queue managers will have 75-100 clients connected at any given time.
Queue managers 1-6 are on Solaris machines. The box hosting queue manager 6 is significantly more powerful than the others, and the CLWLWGHT parameter on this box has been set to 75 to reflect this; the CLWLWGHT on queue managers 1-5 is set to 25. All the machines 1-6 host the same set of queues. The FRs in the cluster are queue manager A and queue manager 6.
You (as MQ admin) receive a call from the Solaris admins complaining that boxes 1-5 are heavily loaded and running slowly, and though box 6 is processing messages it is not considered to be pulling its weight in terms of message volume and has spare capacity.
What action(s) do you take?"
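For illustration: a weighting like the one described would normally be set on each hosting queue manager's cluster-receiver channel. A minimal runmqsc sketch, with channel names assumed:
Code:
* On the box hosting QM6 (channel name assumed for illustration):
ALTER CHANNEL(TO.QM6) CHLTYPE(CLUSRCVR) CLWLWGHT(75)
* On each of the boxes hosting QM1-QM5, e.g. QM1:
ALTER CHANNEL(TO.QM1) CHLTYPE(CLUSRCVR) CLWLWGHT(25)
With these weights the QM6 channel should be chosen three times as often as any single QM1-QM5 channel, i.e. 75/(75 + 5×25) = 37.5% of the traffic.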
Answers are encouraged to be posted here, but those who would like to remain discreet can send their answer to challengefeb2008@cressida.info.
Vitor
Posted: Mon Feb 04, 2008 2:33 am    Post subject: Re: Challenge Question - 02/2008
Grand High Poobah
Joined: 11 Nov 2005    Posts: 26093    Location: Texas, USA
Mehrdad wrote:
You (as MQ admin) receive a call from the Solaris admins complaining that boxes 1-5 are heavily loaded and running slowly, and though box 6 is processing messages it is not considered to be pulling its weight in terms of message volume and has spare capacity.
What action(s) do you take?"
I'll throw this one in before someone else does:
- tell the Unix admins not to be such cry-babies
- remind them that extra resource is cheap
- point out that if they can't get the budget to buy it that's not your problem
- make yourself a coffee
Optionally you could point out that if they replaced machines 1-6 with a proper computer (i.e. a mainframe) they'd have bags of power. If that hasn't got rid of them, start telling stories of the time the 80 column card punch broke and you had to program with a pair of scissors.....
_________________
Honesty is the best policy.
Insanity is the best defence.
AkankshA
Posted: Mon Feb 04, 2008 3:17 am    Post subject: My 1 cent...
Grand Master
Joined: 12 Jan 2006    Posts: 1494    Location: Singapore
Did I hit the nail on the head, or am I shooting at the wrong wall altogether?
QM6 is an FR with CLWLWGHT 75, and QM1-5 are PRs with CLWLWGHT 25.
With those weights, for every 3 messages sent over the QM6 channel, only 1 is sent over each of the other channels.
QM6, being a full repository, will push its repository information over its cluster-sender channel to the cluster-receiver channel of QMA, the other full repository.
Hence the load on QMA and QM6 would already be high (being full repositories) with regular update messages, though these do not eat up much of the Unix box capacity.
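A quick way to confirm the weights actually in force, with channel names assumed for illustration:
Code:
* On QM6 itself, the local cluster-receiver definition:
DISPLAY CHANNEL(TO.QM6) CLWLWGHT
* From any member, what the repository has cached for each cluster qmgr:
DISPLAY CLUSQMGR(*) CHANNEL CLWLWGHT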
Increase the CLWLWGHT ratio ...!!
_________________
Cheers
fjb_saper
Posted: Sun Feb 10, 2008 6:06 pm    Post subject:
Grand High Poobah
Joined: 18 Nov 2003    Posts: 20756    Location: LI, NY
Looks to me like the weighting was set at the channel level.
This has a disadvantage: if you are aiming at messages hitting a specific queue or process, you may not get the desired result, because the channel weighting takes into consideration all messages transiting the channel.
This is why I would suggest setting the CLWLPRTY attribute of the destination queue in the cluster instead: set it to 7 on the qmgr you want to favour and to 3 on all the others.
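A minimal runmqsc sketch of that suggestion, with the queue name SALES.REQUEST assumed for illustration (CLWLPRTY runs from 0 to 9, and the highest available priority wins):
Code:
* On QM6, the instance to favour (queue name assumed):
ALTER QLOCAL(SALES.REQUEST) CLWLPRTY(7)
* On each of QM1-QM5:
ALTER QLOCAL(SALES.REQUEST) CLWLPRTY(3)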
I have found this to be particularly effective if you get messages delivered in batches. At some point the (main) target queue becomes unavailable to the MCA putting to it, and the other destinations in the cluster then act as a failover. Polling ensures that the main destination receives messages again as soon as it is available... Thus the mechanism acts more as a load balancer than as a failover... (failover mode for me was using values of 9 and 0).
Looking at how the problem was presented, the first thing I would check is the health of the cluster channel towards QM6 from QMA, QMB and QMC. If it shows as running, I would stop it and restart it, which resets the statistics so that the balance gets restored...
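In runmqsc on QMA (and again on QMB and QMC), with the channel name assumed for illustration, that check and bounce might look like:
Code:
* Is the cluster-sender channel to QM6 actually moving messages?
DISPLAY CHSTATUS(TO.QM6) STATUS MSGS
* If it shows RUNNING but the spread still looks wrong, bounce it:
STOP CHANNEL(TO.QM6)
START CHANNEL(TO.QM6)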
Making a different qmgr an FR and QM6 a PR could also improve the message distribution.
Ultimately I would put the balancing at queue level for the important heavy-duty processing messages.
_________________
MQ & Broker admin
Gomez Addams
Posted: Mon Feb 11, 2008 2:26 pm    Post subject:
Newbie
Joined: 11 Feb 2008    Posts: 1    Location: Minneapolis
Assuming that the workload should be distributed over the six servers as evenly as possible, given the MQ parameters available and without writing code, I think that further refinement of CLWLWGHT is in order. It is impossible to tell what the refinement should be without knowing more about the size of the processing disparity and what would be considered a properly balanced load in this circumstance, given an unbalanced processing environment (i.e., one processor is significantly faster than the other five).
Assuming that work should be sent to the fastest processor first, until it ceases to be fastest, and only then distributed among the other servers, use CLWLPRTY to give the queues on QM6 a higher priority than on the other QMs. Of course, the system running QM6 will be useless for anything else.
Assuming that anything goes and any amount of resources may be expended to produce the perfect balance and coordination of work, a cluster workload exit can be written to favor QM6, using any number of indicators not normally available to MQ to determine whether QM6 is getting busy and traffic should be directed to other QMs.
_________________
Run this program on your system. Experiment with leaving out parts of the program, to see what error messages you get. -- Kernighan and Ritchie
jefflowrey
Posted: Mon Feb 11, 2008 2:43 pm    Post subject:
Grand Poobah
Joined: 16 Oct 2002    Posts: 19981
I think these challenges are written to be as complete in themselves as possible - i.e. that there's no "missing" information needed to figure out what's going on.
There is usually missing information about the problem itself, of course.
This one is particularly interesting, because if I'm right in my own guess, then one has to understand how clusters work to know why the behavior is being shown.
Of course, my own guess for last month's challenge was entirely wrong - I was thinking something else entirely.
_________________
I am *not* the model of the modern major general.
PeterPotkay
Posted: Mon Feb 11, 2008 3:33 pm    Post subject:
Poobah
Joined: 15 May 2001    Posts: 7722
Assuming that:
* There are no network problems going to QM6 that would be affecting channel availability
* All the queues on QM1 to QM6 are defined exactly the same AND all enabled for MQPUTs (a quick check of these first two is sketched below)
* You don't have any apps specifying QM1-QM5 but not QM6 on the front end MQOPENs / MQPUT1s
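For illustration, with queue and channel names assumed:
Code:
* On each of QM1-QM6: is the queue there, put-enabled and in the cluster?
DISPLAY QLOCAL(SALES.REQUEST) PUT CLUSTER
* On QMA, QMB and QMC: is the channel to QM6 up?
DISPLAY CHSTATUS(TO.QM6) STATUS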
I'm inclined to think it's related to the fact that QM6 is an FR, and so the channels to it are going to have more traffic due to the cluster admin messages.
http://publib.boulder.ibm.com/infocenter/wmqv6/v6r0/topic/com.ibm.mq.csqzah.doc/qc10940_.htm
Quote:
Note that the distribution of user messages is not always exact, because administration and maintenance of the cluster causes messages to flow across channels. This can result in an apparent uneven distribution of user messages which can take some time to stabilize. Because of this, no reliance should be made on the exact distribution of messages during workload balancing.
Working as designed!
I know, I know, there must be something else. Cluster admin messages couldn't disrupt the balance that much, and a weight of 75 on the channel should mean a lot more app messages going to it. If the above assumptions are true, I can't imagine what the problem is. Send the Problem Ticket to Network Support!
_________________
Peter Potkay
Keep Calm and MQ On
starship
Posted: Tue Feb 19, 2008 2:25 am    Post subject:
Apprentice
Joined: 07 Dec 2005    Posts: 33    Location: INDIA
Hello All,
I think that queue managers B and C are not able to connect to queue manager 6, so all the requests from B and C are going to QM1-5, whereas the requests from A are going to all of QM1-6.
The reason QMB and QMC cannot connect to QM6 might be network issues between QMB/QMC and QM6.
Please advise.
Regards
Vitor
Posted: Fri Feb 29, 2008 12:58 am    Post subject:
Grand High Poobah
Joined: 11 Nov 2005    Posts: 26093    Location: Texas, USA
Thanks to all for participating in this month's competition!
IMHO it's generated a lot of interesting and useful discussion on cluster workload balancing and thrown up some good points.
The root cause of the scenario (for the record) was a new version of the client code which connects to A-C being rolled out across the organisation for non-MQ related reasons. This code had been changed to specify BIND_ON_OPEN, and hence for the duration of its activity used a specific queue on a specific queue manager. This (obviously) eliminates workload distribution and skews the load.
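For the record, the administrative guard against that is the queue's DEFBIND attribute; a hedged sketch with the queue name assumed (note that an explicit MQOO_BIND_ON_OPEN in the application still overrides the queue default):
Code:
* Have the cluster queue default to per-message workload balancing:
ALTER QLOCAL(SALES.REQUEST) DEFBIND(NOTFIXED)
* Clients then need to open with MQOO_BIND_AS_Q_DEF (or MQOO_BIND_NOT_FIXED)
* rather than MQOO_BIND_ON_OPEN for the balancing to apply to each message.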
But the question was: what action(s) do you take? Now many of the actions suggested around network connections and changing CLWLWGHT could have helped, if only by not having the expected effect and pushing the investigation elsewhere. But there can be only one winner (apparently, unless there's another special prize), and thus PeterPotkay wins by a nose as the only solution which explicitly mentioned checking the application code.
Well done to you sir, please PM Mehrdad with details of where you want your glorious, desirable and quite frankly stunning prize sent.
Thank you all again for your contributions.
_________________
Honesty is the best policy.
Insanity is the best defence.
jefflowrey
Posted: Fri Feb 29, 2008 5:45 am    Post subject:
Grand Poobah
Joined: 16 Oct 2002    Posts: 19981
jefflowrey wrote:
my own guess was entirely wrong

_________________
I am *not* the model of the modern major general.
Vitor
Posted: Fri Feb 29, 2008 5:51 am    Post subject:
Grand High Poobah
Joined: 11 Nov 2005    Posts: 26093    Location: Texas, USA
jefflowrey wrote:
jefflowrey wrote:
my own guess was entirely wrong

To paraphrase a well known TV show:
"Oh my God, I've killed jefflowrey!"
_________________
Honesty is the best policy.
Insanity is the best defence.