MQSeries.net :: View topic - Need an advice on this topology

Need an advice on this topology


yalmasri

Posted: Tue Nov 19, 2013 2:29 am Post subject: Need an advice on this topology

Centurion

Joined: 18 Jun 2008
Posts: 110

A client approached us about migrating their 6.1 broker to 7 (some consultant told them 8 is not stable enough!). They provided us with 4 LPARs, each running 64-bit AIX 7.1 on P7 machines.

In the old setup they created over 100 EGs, one for each mini application they have, for a total of 700 flows. The EGs were distributed at random across 5 Brokers on 5 machines.

After extended debates, we decided to go like this:

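And this is why:

1. Creating no more than one Broker for each machine is the most efficient
2. We stuck to one EG because the client did not have any application separation requirements
3. Replicating all flows in all 4 Brokers will enhance maintainability as now there's no need to keep a list of flows' locations
4. Managing workload on servers is no longer needed because this setup guarantees similar resource consumption across machines and time (well, almost)
5. Achieving very high availability to all flows with cheap cost (what's the cost of deploying another flow?!)
6. Optimizing DB connections by creating only one pool serving all flows inside the sole EG per Broker
7. Having only one EG will lower memory and CPU consumption associated with the EG itself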

Any solid facts for not going down this road?

zpat

Posted: Tue Nov 19, 2013 2:36 am Post subject:

Jedi Council

Joined: 19 May 2001
Posts: 5867
Location: UK

Do not even consider putting 700 flows in a single EG. You will overload the JVM and if it fails then you lose all 700 flows.

We've also seen situations when one errant flow caused the whole EG to become unresponsive to commands, forcing us to terminate the EG process. Group related flows together and aim for about 10-20 EGs.

One EG is one process, so for multi-processor exploitation - several EGs are desirable anyway.

We use WLM on AIX to assign different workload classes based on the EG which allows us to control relative CPU usage.

You may not have any stated requirements for isolation - but implementing a stable, reliable and manageable system is always a requirement for any serious professional.

Multiple brokers are perhaps less necessary, but it can be useful to limit the scope of changes. Trying to get everyone to agree a change when all the eggs are in one basket can be a nightmare.

This advice is based on using WMB for 13 years - but is not an "official" point of view, and I don't work for IBM.
_________________
Well, I don't think there is any question about it. It can only be attributable to human error. This sort of thing has cropped up before, and it has always been due to human error.

yalmasri

Posted: Tue Nov 19, 2013 3:56 am Post subject:

Centurion

Joined: 18 Jun 2008
Posts: 110

Thanks zpat.

zpat wrote:

Do not even consider putting 700 flows in a single EG. You will overload the JVM and if it fails then you lose all 700 flows.

Why did you assume that I want to starve the poor JVM of resources and then regret watching it die? You can always control JVM resources at startup and make sure it's not running low on any of them. Moreover, how do you think Application Servers work? Each one is a single JVM, and it hosts all your services.

zpat wrote:

We've also seen situations when one errant flow caused the whole EG to become unresponsive to commands, forcing us to terminate the EG process. Group related flows together and aim for about 10-20 EGs.

Not sure what these situations are, but in multi-core, multi-CPU environments, one flow (assuming one instance) will hog at most one core in the worst case, leaving the other threads running. Processes (=EGs) are nothing but entities your OS uses to control the use of system resources. Threads, however, are the actual processor-time consumers. I think modern OSs are smart enough to prevent rogue threads from sabotaging your process. What were the OS and Broker versions on which these situations occurred?

On the other hand, couldn't that errant flow equally have locked up your Broker itself? Or maybe the whole machine?! Why are EGs always the prey?

zpat wrote:

One EG is one process, so for multi-processor exploitation - several EGs are desirable anyway.

Are you saying that multi-threaded multi-process applications can exploit system resources better than multi-threaded single-process applications? Let me know because I have cases in Broker Performance Reports that tell you otherwise

McueMart

Posted: Tue Nov 19, 2013 4:32 am Post subject:

Chevalier

Joined: 29 Nov 2011
Posts: 490
Location: UK...somewhere

As another person with a good few years of WMB experience, I would echo zpat's sentiments that putting 700 flows in one EG is a terrible idea.

There are just too many reasons to list, but one off the top of my head is that if you modify a configurable service, many of them require the EG to be reloaded to pick up the change. Do you know how long re-starting 700 flows will take? Unless you have some kind of monster machine, I would predict a very long time.

zpat

Posted: Tue Nov 19, 2013 4:52 am Post subject:

Jedi Council

Joined: 19 May 2001
Posts: 5867
Location: UK

One mistake people often make is treating WMB as a sort of WAS server.

I've seen developers using Java when WMB functions are there to be used.

But if you don't want the benefit of our experience, why ask for it?

rbicheno

Posted: Tue Nov 19, 2013 5:33 am Post subject:

Apprentice

Joined: 07 Jul 2009
Posts: 43

Personally I would be very uncomfortable putting 700 flows in 1 EG.
The other guys are spot on. I used to work for IBM, in fact I used to write the perf reports, and I never tested anywhere near that number for a single EG on v7 (of course, since I left they may have!).
Admin and deployment performance for an EG of that size, and also isolation of flows, would be a worry for me. I have also seen cases where one badly written flow could take down the EG.
Are they doing this to reduce license cost by using the edition that is limited to 1 EG per broker?
Also, I would question why they don't go to v8 or even v9. They are going to have to do a huge amount of testing anyway, so why repeat that for another migration in a couple of years?
I do like the clustering, though, and sharing load over a number of identical brokers; I would just split the brokers into more EGs. How are you going to ensure all brokers are kept in sync, i.e. deployments?

Vitor

Posted: Tue Nov 19, 2013 5:41 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

yalmasri wrote:

Moreover, how do you think Application Servers work? Each one is a single JVM, and it hosts all your services.

Neither WMB nor an EG is WAS. The EG does not handle the JVM the same way that WAS does.

yalmasri wrote:

You can always control JVM resources at startup and make sure it's not running low on any of them

And if you see it running low, what are you going to do? Change the resource profile in the WAS-style admin console that broker doesn't have? That kind of change requires a restart, and 700 flows starting in a single group is going to take a while.

yalmasri wrote:

What were the OS and Broker versions on which these situations occurred?

Linux, Solaris & AIX on v6.1 & v7. None of these OSs react the way you claim under all circumstances.

yalmasri wrote:

Let me know because I have cases in Broker Performance Reports that tell you otherwise

Post a link.

yalmasri wrote:

1. Creating no more than one Broker for each machine is the most efficient
2. We stuck to one EG because the client did not have any application separation requirements
3. Replicating all flows in all 4 Brokers will enhance maintainability as now there's no need to keep a list of flows' locations
4. Managing workload on servers is no longer needed because this setup guarantees similar resource consumption across machines and time (well, almost)
5. Achieving very high availability to all flows with cheap cost (what's the cost of deploying another flow?!)
6. Optimizing DB connections by creating only one pool serving all flows inside the sole EG per Broker
7. Having only one EG will lower memory and CPU consumption associated with the EG itself
Any solid facts for not going down this road?

1. Define "efficient". You can't mean it takes longer to build 2 brokers than it does one. It's exactly the same point as how many queue managers do you have on a single server?

2. There are more reasons than application separation (like administrative control of the broker) to have multiple EGs.

3. The kind of automated build & deploy tool you need to be using if you have 700 flows to keep versioned and controlled will easily manage flow location.

4. Yes, "almost". You're gambling that the workload distribution at the transport level is even, that all flows take about the same time & resource and the OS is smart enough to manage resources via additional instances without you dropping a hint in the form of an additional EG (process) with another copy of the flow in it.

5. This has nothing to do with HA & this point is spurious. At best, you're relying on the transport distribution again.

6. This isn't WAS. WMB doesn't handle DB connections like that.

7. Cite your sources, with especial reference to how 700 running flows in one EG use significantly less CPU and memory than the same flows in 10 EGs at 70 to an EG. Be clear on how this significantly lower resource cost more than makes up for the additional administrative and control overhead.

yalmasri wrote:

Any solid facts for not going down this road?

Put 700 flows in a single EG. Stop the broker & issue an mqsisetdbparms for a non-DB service (or any other change that requires the broker to be stopped). Issue an mqsistart and note the time. Note the time again when the 700th flow shows as running. Compare the difference in these times to the customer's tolerance for outage.
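The measurement described above amounts to polling until every flow reports as running and recording the elapsed time. A minimal sketch follows, assuming a hypothetical `count_running_flows` callable that you would write yourself (for example by parsing whatever your broker's list command prints). Nothing here calls a real WMB command, and the stand-in probe at the bottom is purely for demonstration.

```python
import time
from typing import Callable


def time_until_all_running(
    count_running_flows: Callable[[], int],  # hypothetical probe supplied by the caller
    expected_flows: int,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
) -> float:
    """Return the seconds elapsed until the probe reports `expected_flows` running.

    Raises TimeoutError if that count is not reached within `timeout_seconds`.
    """
    start = time.monotonic()
    while True:
        if count_running_flows() >= expected_flows:
            return time.monotonic() - start
        if time.monotonic() - start > timeout_seconds:
            raise TimeoutError("not all flows started within the timeout")
        time.sleep(poll_seconds)


# Stand-in probe: pretends that 100 more flows have started on every poll.
_state = {"running": 0}


def fake_probe() -> int:
    _state["running"] = min(700, _state["running"] + 100)
    return _state["running"]


elapsed = time_until_all_running(fake_probe, expected_flows=700, poll_seconds=0.01)
print(f"all 700 flows reported running after {elapsed:.2f}s")
```

Start the timer at the moment the start command is issued, then compare the returned figure with the outage the customer will tolerate.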
_________________
Honesty is the best policy.
Insanity is the best defence.

yalmasri

Posted: Tue Nov 19, 2013 11:28 pm Post subject:

Centurion

Joined: 18 Jun 2008
Posts: 110

McueMart wrote:

There are just too many reasons to list, but one off the top of my head is that if you modify a configurable service, many of them require the EG to be reloaded to pick up the change. Do you know how long re-starting 700 flows will take? Unless you have some kind of monster machine, I would predict a very long time.

You have a point here, because the code is already there and we can't predict beforehand what kind of services we are dealing with. Other than this, I'm against projecting code design limitations onto how you layer your applications. For that kind of service, you could easily enable some sort of pill message to take care of reloading the configuration.

But still, what's the problem with bouncing that EG when you know that there are three others that can take up the work? I haven't stopped the business, and I can wait as long as it takes.

smdavies99

Posted: Wed Nov 20, 2013 12:01 am Post subject:

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

yalmasri wrote:

But still, what's the problem with bouncing that EG when you know that there are three others that can take up the work? I haven't stopped the business, and I can wait as long as it takes.

There speaks someone who has not benefitted from an angry customer quoting his 5 minute/year SLA when the broker takes 30-40 minutes to start up with 700 flows deployed to it.
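To put those two figures side by side, here is a small worked calculation. It is only a sketch: the 5 minutes/year allowance and the 30-40 minute startup are the numbers quoted above, the 35-minute midpoint is an assumption for illustration, and the function name is made up.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def availability_percent(downtime_minutes: float) -> float:
    """Yearly availability implied by a given amount of downtime."""
    return 100.0 * (1.0 - downtime_minutes / MINUTES_PER_YEAR)


sla = availability_percent(5)           # the 5 minutes/year allowance
one_restart = availability_percent(35)  # one restart, assumed midpoint of 30-40 minutes

print(f"SLA target:        {sla:.4f}%")          # 99.9990%
print(f"after one restart: {one_restart:.4f}%")  # 99.9933%
print(f"one restart uses {35 / 5:.0f}x the yearly allowance")
```

This only applies if the restart is actually visible to the customer as an outage, which is the point being argued in the thread.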

Seriously, there has been a lot of advice given in this thread.

Therefore, why not do the following:

1) Get all flows to work on V7 (you should really be using V8.0.0.2 as a minimum these days otherwise someone else is going to have to upgrade again in 12 months or so)
2) create a number of scripts that create and deploy all the flows into different scenarios
3) Test the hell out of it with respect to the following
a) initial broker startup from nothing to everything working
b) EG restart/abend
c) Time to deploy an updated flow to the running system
d) Anything else you can think of.

4) Write a report showing the different permutations and how each will affect the customer's business, especially with respect to an EG abend.

Then you can make a reasoned decision based upon the real flows and data.
I've seen a V7 broker take 45 minutes to fully start. OK, that depends a lot on the types of flows in the setup, but you really have to ask yourself what is right for the customer. How are they going to manage this once I'm gone?

Do it right and hey, who knows you might even get invited back to do the V9 upgrade. Do it wrong and some other poor sod will have to pick up the pieces.

If you are allowed to publish the results with all customer data/information obfuscated, I am sure it would provide interesting reading to a good few of us.
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995

Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.

yalmasri

Posted: Wed Nov 20, 2013 12:02 am Post subject:

Centurion

Joined: 18 Jun 2008
Posts: 110

zpat wrote:

One mistake people often make is treating WMB as a sort of WAS server.

I'm not saying WMB is a sort of WAS. What I'm saying is that if WAS is capable of flexing its muscles while still sitting on one system process, then why can't WMB with a single EG? I'm only borrowing WAS here to make an analogy from a system representation point of view, not anything else.

zpat wrote:

But if you don't want the benefit of our experience, why ask for it?

This is not a linguistics forum, right? So I must be here for a good reason.

This is a very important topic that many people are talking about but only a few references mention. Feeding this thread with people's experience is key to shaping it, and thus your contribution is valuable regardless.

Discussing internals doesn't mean I'm rejecting your case; it's only to explore why you ran into this situation, and whether there's a remedy for it aside from just trying to isolate it and live with it.

yalmasri

Posted: Wed Nov 20, 2013 12:39 am Post subject:

Centurion

Joined: 18 Jun 2008
Posts: 110

rbicheno wrote:

in fact I used to write the perf reports, and I never tested anywhere near that number for a single EG on v7 (of course, since I left they may have!).
Admin and deployment performance for an EG of that size, and also isolation of flows, would be a worry for me.

Flows used in those reports are very simple, and they are not a good reference for real-life flows. But is WMB really designed to hold a limited number of flows? Is it only administration semantics that I should consider when relying on just one EG, or are there other runtime considerations?

rbicheno wrote:

I have also seen cases where one badly written flow could take down the EG.

So increasing the number of EGs is just a way to minimize the effect of this situation, not a way to solve it? And if I'm to follow you on this, then wouldn't I be better off having every flow in its own EG?

Still, though, I don't understand how one thread can bring down your process. Is a process really such a brittle thing?

BTW, is there no way to correct this situation other than restarting your EG?

rbicheno wrote:

Are they doing this to reduce license cost by using the edition that is limited to 1 EG per broker?

I'd be flattered to achieve this for my client, but no. My client is one of the richest telco operators in the ME. And the reason is purely related to the benefits I outlined at the beginning of this thread.

rbicheno wrote:

How are you going to ensure all brokers are kept in sync, i.e. deployments?

Automated scripts

yalmasri

Posted: Wed Nov 20, 2013 2:15 am Post subject:

Centurion

Joined: 18 Jun 2008
Posts: 110

smdavies99 wrote:

yalmasri wrote:

But still, what's the problem with bouncing that EG when you know that there are three others that can take up the work? I haven't stopped the business, and I can wait as long as it takes.

There speaks someone who has not benefitted from an angry customer quoting his 5 minute/year SLA when the broker takes 30-40 minutes to start up with 700 flows deployed to it.

You're saying this because you don't know this customer. But I'm making a different point here: we essentially have no downtime at all, because the other 3 Brokers are hooked to the cluster. I have the flexibility here.

smdavies99 wrote:

Therefore, why not do the following:

1) Get all flows to work on V7 (you should really be using V8.0.0.2 as a minimum these days otherwise someone else is going to have to upgrade again in 12 months or so)
2) create a number of scripts that create and deploy all the flows into different scenarios
3) Test the hell out of it with respect to the following
a) initial broker startup from nothing to everything working
b) EG restart/abend
c) Time to deploy an updated flow to the running system
d) Anything else you can think of.

4) Write a report showing the different permutations and how each will affect the customer's business, especially with respect to an EG abend.

Then you can make a reasoned decision based upon the real flows and data.

This is the way to go.

smdavies99 wrote:

If you are allowed to publish the results with all customer data/information obfuscated, I am sure it would provide interesting reading to a good few of us.

Yea, why not

yalmasri

Posted: Wed Nov 20, 2013 2:24 am Post subject:

Centurion

Joined: 18 Jun 2008
Posts: 110

Hey Peter, I'm particularly interested in your opinion

rbicheno

Posted: Wed Nov 20, 2013 2:26 am Post subject:

Apprentice

Joined: 07 Jul 2009
Posts: 43

I am no longer with IBM and I'm now a WMB customer, owning and running a WMB estate for a bank.
If a consultancy came in and proposed to put 700 flows in 1 EG as you suggest, it would clearly tell me they have no practical experience with WMB in the real world.
Your proposals have some merit in theory, but in practice I don't think it would work. Some reasons...

Flows in an EG are started sequentially, so starting one EG with 700 flows will take some time. As already pointed out, a restart will be needed if you have to run mqsisetdbparms etc. Say 3.5 seconds a flow, which is reasonable, and that may mean a total restart time of ~40 mins. All it takes is for another party, for example, to change a datasource password without telling you, and you have a prod outage because you can't connect to the DB, which will take you 40 mins to recover from! With broker you are often not in control of many of the connecting apps, so I would want to keep my ability to react as agile as possible!
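The restart estimate above is simple arithmetic and can be checked with a short sketch. The 3.5 seconds per flow is the figure quoted in this post; the function name is made up, the 70-flow case borrows the "10 EGs, 70 to an EG" split mentioned earlier in the thread, and the sketch assumes flows start strictly one after another within an EG.

```python
def restart_minutes(flows: int, seconds_per_flow: float) -> float:
    """Estimated time, in minutes, to start `flows` flows one after another."""
    return flows * seconds_per_flow / 60.0


single_eg = restart_minutes(700, 3.5)    # all 700 flows in one EG
per_eg_of_70 = restart_minutes(70, 3.5)  # one EG holding 70 flows

print(f"700 flows in one EG: {single_eg:.1f} min")     # 40.8 min
print(f"70 flows in one EG:  {per_eg_of_70:.1f} min")  # 4.1 min
```

The per-flow time is an assumption, not a measurement; substituting a measured figure for your own flows gives a better estimate.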

Fact of life: things break... A badly written flow stuck in a constant loop and an unexpectedly large input message blowing your process memory are just 2 simple examples I have seen. You are right that partitioning into more EGs does not fix the underlying issue, but it minimises the impact. Certainly makes me sleep easier at night!

With your design using clustering, if you get a misbehaving app sending bad messages, it could send them to all brokers, so they could all be taken down simultaneously. With a restart time of 40 mins, I wouldn't be a happy customer. In a perfect world all apps would behave well and all flows would be bug free, but I would bet you don't have 700 perfect flows!

I like the earlier suggestion that you should run some tests of your proposal and let us know your results. I'm always keen to learn from others' experiences.

mqjeff

Posted: Wed Nov 20, 2013 3:39 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

If you really want to run all 700 flows in a single execution group, despite everyone else telling you that there are probably reasons you shouldn't, then you should still set up two execution groups on each box - and both of them should have all of the flows.

Even with four active nodes, there can easily be reasons why you'd need to restart all four of them, and having two EGs on each node really does allow you to stagger the restarts whilst still maintaining uptime.
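The staggering idea can be sketched as a restart schedule in which, on every node, only one of the two EGs is down at a time. This is a toy model under stated assumptions: the node and EG names are made up, all nodes restart the same EG slot in parallel, and the 40-minute figure is the restart estimate quoted earlier in the thread rather than a measurement.

```python
from typing import List, Tuple


def staggered_schedule(
    nodes: List[str], egs_per_node: List[str], restart_minutes: float
) -> List[Tuple[float, str, str]]:
    """Return (start_minute, node, eg) tuples.

    The next EG slot starts only after the previous one is expected to be
    back, so each node always has at least one EG serving traffic.
    """
    schedule = []
    for slot, eg in enumerate(egs_per_node):
        for node in nodes:
            schedule.append((slot * restart_minutes, node, eg))
    return schedule


nodes = ["node1", "node2", "node3", "node4"]  # hypothetical names
plan = staggered_schedule(nodes, ["EG_A", "EG_B"], restart_minutes=40.0)
for start, node, eg in plan:
    print(f"t+{start:>4.0f} min: restart {eg} on {node}")
```

With two EGs per node the whole roll takes two restart windows, but no node is ever without a running copy of the flows.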

That said, you are trying to conserve the resources of the overhead of additional EGs, but compared to 700 flows, the overhead is not very big! So you're saving a penny, and possibly costing a dollar to do so.

Ultimately, each flow is a separate thread within the process. At a certain point, the time spent swapping state between threads within the process exceeds what the OS can reasonably manage, and you spend more time context-switching than processing real work. This depends a lot on the OS and other factors, but it does end up affecting how many flows you "should" stick into a single EG.

So, again, if you really want to stick all 700 flows in a single EG, you should do some measurements to make sure that you're not spending more time managing the flows than you are executing work within them.
