MQSeries.net :: View topic - Another database problem, "the WMB is slow"

MQSeries.net

Tech Exchange

Education

Certifications

Library

Info Center

SupportPacs

FAQÂ Â

Usergroups

RSS Feed - WebSphere MQ Support

RSS Feed - Message Broker Support

MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » Another database problem, "the WMB is slow"

Goto page 1, 2 Next

Another database problem, "the WMB is slow"

« View previous topic :: View next topic »

Author

Message

PeterPotkay

Posted: Mon Mar 10, 2014 11:09 am Post subject: Another database problem, "the WMB is slow"

Poobah

Joined: 15 May 2001
Posts: 7723

You got a Broker with lots of execution groups.
Each execution group got lots of flows, many with database calls.
The database calls are typically embedded in a fat compute node, with lots of ESQL code before and after the database call.

More than once it turned out that the slow responses from WMB was traced back to a slow back end database. Frequently we just 'know' its gonna be the database, but its difficult to prove this quickly.

So in a Production environment with dozens of EGs and hundreds of flows, how do you guys zero in on database performance problems?

Using WMB Exits or Accounting and Statistics doesn't get granular enough if the database call is sandwiched between 100s of lines of ESQL, as those methods only go down to the node level.

ODBC trace or User trace? In a busy production environment with hundreds of flows you're drowning in data in seconds, and you risk slowing things down even more.

Do you keep database calls in their own nodes, with nothing else in that node, so a trace node before and after the database only node makes it easy to spot by looking at those time stamps?

Maybe just always coding a time stamp right before and right after the DB call, always writing to a 'DatabaseTiming' log file? Ugh, sounds horrible, but what are the alternatives? Actually, wasn't there some posts here a few weeks ago where it was shared that timestamps taken throughout a compute node all return the same value anyway? Can someone share that link if I'm not imagining it.

What's the easiest way to prove one of the remote databases is slow in a busy Broker environment?
_________________
Peter Potkay
Keep Calm and MQ On

mqjeff

Posted: Mon Mar 10, 2014 11:18 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

Well, I've not tried this.

But you might be able to cheat monitoring events, by propagating to an unconnected terminal, and attach an event to the out terminal being called.

This does mean you would have to instrument the ESQL to put a propagate before and after the DB call. But particularly at v9, you can attach a monitoring profile without much effort, so once the code was instrumented, you could turn on the monitoring just long enough and then turn it off again.

But I thought some of this was answered by Resource Statistics? Maybe I'm misremembering.

rbicheno

Posted: Tue Mar 11, 2014 12:18 am Post subject:

Apprentice

Joined: 07 Jul 2009
Posts: 43

For simple flows the methods you state work well (stats and/or trace)
For more complex flows with lots of ESQL as described its a tricky one with out changing the flow. If i were to change it i would put the DB call in its own compute node then use accounting and stats to show the hotspot.
Without changing the flow...any way you can replicate the DB call external to broker in a simple app? (java app, or even just build another flow that just does the DB call)?
That way if you can show the DB call sucks outside WMB you have your proof. Probably easy if its a select but not so if its an insert/update.
Otherwise are the DBA's no use? I would have thought they can provide some performance stats on your calls and give some feedback, maybe there's some monitoring/tuning their side that can be done.

PeterPotkay

Posted: Tue Mar 11, 2014 4:17 am Post subject:

Poobah

Joined: 15 May 2001
Posts: 7723

rbicheno wrote:

If i were to change it i would put the DB call in its own compute node then use accounting and stats to show the hotspot.
Without changing the flow.

Are there negative performance implications to moving the DB call out to its own Compute Node? For the 99.99999% of the time when everything is fine is the overhead of using multiple compute nodes instead of one worth it?

rbicheno wrote:

That way if you can show the DB call sucks outside WMB you have your proof. Probably easy if its a select but not so if its an insert/update.
Otherwise are the DBA's no use? I would have thought they can provide some performance stats on your calls and give some feedback, maybe there's some monitoring/tuning their side that can be done.

Yeah, eventually. But just like every other area, including the WMB gang, everyone says "Not me, it must be your piece that is slow" until you can prove its their piece that is slow.
_________________
Peter Potkay
Keep Calm and MQ On

PeterPotkay

Posted: Tue Mar 11, 2014 5:09 am Post subject:

Poobah

Joined: 15 May 2001
Posts: 7723

Please vote for this Request for Enhancement (RFE) I just submitted to make it easier to see how long WMB takes to talk to a database.

This link will take you to the direct link to vote:
http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=46015

Headline:
Provide statistics on how long a database interaction takes from inside a Compute node, without having to Trace

Description:
When a database call is made inside a compute node, with many lines of non database related ESQL code before and after the database call, we would like WMB to expose information on how long its taking for that database call to complete. Accounting & Statistics, and Exits, only get us down to the node level. You can tell the compute node is the slow portion, but can't prove the database call is the culprit. Using trace is tough in a busy Production environment - it produces tremendous amounts of data, and may cause its own performance problems. We need the ability to quickly have access to database timings for a specific flow in a specific execution group, without having to restart or redeploy anything.

Use case:
The fire alarm is pulled. "WMB is slow!!!" We know it's almost surely the call to the database, but how do we prove it? We have dozens of executions groups with hundreds of message flows, processing thousands of transactions a minute. Turning on trace is not feasible. There is concern the trace will cause its own performance impacts. There is concern that a trace will simply be millions of lines of output showing what all the flows on the broker are doing, making it next to impossible to see what we need to see. There is concern the trace output is meant for L2 and L3 support, and won't be useful to us.

I would like the ability to run an mqsi* command where I specify the particular message flow that causes WMB to start producing Statistics and / or puts out a very targeted log file that shows start and end times for the database calls made by that one message flow, along with success/failure codes. And optionally showing the content of the request to the DB and the reply from the DB.

Private Business justification:
We waste hours trying to prove WMB is NOT the issue, that WMB is the victim of a slow database. Sometimes the problem mysteriously goes away, no one admits doing anything, and the problem is closed out as WMB was slow and started working better on its own.

We need a way to quickly identify external resources as being the slow component so that we don't waste hours trying to prove this, and so that WMB's reputation is not taking a hit.
_________________
Peter Potkay
Keep Calm and MQ On

Tibor

Posted: Wed Mar 12, 2014 1:15 am Post subject:

Grand Master

Joined: 20 May 2001
Posts: 1033
Location: Hungary

Peter, I have voted for your RFE, but ... I guess, it would be easier to collect this information on database side.

smdavies99

Posted: Wed Mar 12, 2014 1:50 am Post subject:

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

Tibor wrote:

Peter, I have voted for your RFE, but ... I guess, it would be easier to collect this information on database side.

In my experience, getting this sort of data is harder than robbing Ft Knox.
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995

Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.

Tibor

Posted: Wed Mar 12, 2014 2:03 am Post subject:

Grand Master

Joined: 20 May 2001
Posts: 1033
Location: Hungary

Quote:

In my experience, getting this sort of data is harder than robbing Ft Knox.

Are you cross with database guys?

smdavies99

Posted: Wed Mar 12, 2014 2:52 am Post subject:

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

Tibor wrote:

Quote:

In my experience, getting this sort of data is harder than robbing Ft Knox.

Are you cross with database guys?

no. Just being realistic and speaking from (slightly bitter)experience. DB Admins do not like having their operations questioned especially with regard to performance. If you do this you might find that your access has been mysteriously revoked and restoring it may take a month of Sundays.
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995

Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.

PeterPotkay

Posted: Wed Mar 12, 2014 3:24 am Post subject:

Poobah

Joined: 15 May 2001
Posts: 7723

And even if you get someone cooperative, they may not be the most skilled DBA. Until the DBA, regardles of skill, can find the problem and start working on it, the focus remains on it being a "WMB problem".

The better methods we have to quickly and safely demonstrate the WMB flow is waiting on the database, the faster the focus shifts to the real source of the problem, meaning we all get to drop off the crisis call and get back to real work.

Thank you for voting for the RFE.
_________________
Peter Potkay
Keep Calm and MQ On

Esa

Posted: Wed Mar 12, 2014 3:52 am Post subject:

Grand Master

Joined: 22 May 2008
Posts: 1387
Location: Finland

I voted, too.

Interesting to see if this gets implemented.

Compute node.datasourceOperationStart/End monitoring events would be nice to have.

Tibor

Posted: Wed Mar 12, 2014 4:01 am Post subject:

Grand Master

Joined: 20 May 2001
Posts: 1033
Location: Hungary

[quote="smdavies99"][quote="Tibor"]

Quote:

Just being realistic and speaking from (slightly bitter)experience.

I wish you a slightly better experience. Blaming each other is not the best adventure in an IT department.

Quote:

The better methods we have to quickly and safely demonstrate the WMB flow is waiting on the database...

Of course, you are right, that's why I supported this RFE.

slonkoski

Posted: Wed Mar 12, 2014 5:45 am Post subject:

Acolyte

Joined: 18 Mar 2005
Posts: 52

Got my vote after just spending almost 7 hours listening to DBA and app team swear 100% it couldn't be the DB call - but it was!

JosephGramig

Posted: Wed Mar 12, 2014 7:29 am Post subject:

Grand Master

Joined: 09 Feb 2006
Posts: 1244
Location: Gold Coast of Florida, USA

slonkoski wrote:

Got my vote after just spending almost 7 hours listening to DBA and app team swear 100% it couldn't be the DB call - but it was!

In my experience, it is always an external call from broker that makes WMB appear slow.

kimbert

Posted: Wed Mar 12, 2014 8:04 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

I got sufficiently interested to ask one of the IIB developers about this. Here's their answer:

Quote:

The better IIB answer here is the ODBC activity log that was added in V9.

An SQLExecute is now surrounded by a BIP12074 and BIP12075 (successful) or BIP12077 (unsuccessful).
Therefore it should be possible to view the two timestamps and see exactly how long the DB statement took to execute.

We also do the same for SQLPrepare using the BIP12080 and BIP12081 (successful) or BIP12082 (unsuccessful)

Activity logging is always on and can be logged to a file if needed.

Definitely available in IIB v9. I believe it's also available in any version after WMB v8.0.0.2.
_________________
Before you criticize someone, walk a mile in their shoes. That way you're a mile away, and you have their shoes too.

Display posts from previous:

Goto page 1, 2 Next

Page 1 of 2

MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » Another database problem, "the WMB is slow"

Jump to:

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum

Protected by Anti-Spam ACP