Author |
Message
|
PeterPotkay |
Posted: Mon Mar 10, 2014 11:09 am Post subject: Another database problem, "the WMB is slow" |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
You got a Broker with lots of execution groups.
Each execution group got lots of flows, many with database calls.
The database calls are typically embedded in a fat compute node, with lots of ESQL code before and after the database call.
More than once it turned out that the slow responses from WMB was traced back to a slow back end database. Frequently we just 'know' its gonna be the database, but its difficult to prove this quickly.
So in a Production environment with dozens of EGs and hundreds of flows, how do you guys zero in on database performance problems?
Using WMB Exits or Accounting and Statistics doesn't get granular enough if the database call is sandwiched between 100s of lines of ESQL, as those methods only go down to the node level.
ODBC trace or User trace? In a busy production environment with hundreds of flows you're drowning in data in seconds, and you risk slowing things down even more.
Do you keep database calls in their own nodes, with nothing else in that node, so a trace node before and after the database only node makes it easy to spot by looking at those time stamps?
Maybe just always coding a time stamp right before and right after the DB call, always writing to a 'DatabaseTiming' log file? Ugh, sounds horrible, but what are the alternatives? Actually, wasn't there some posts here a few weeks ago where it was shared that timestamps taken throughout a compute node all return the same value anyway? Can someone share that link if I'm not imagining it.
What's the easiest way to prove one of the remote databases is slow in a busy Broker environment? _________________ Peter Potkay
Keep Calm and MQ On |
|
Back to top |
|
 |
mqjeff |
Posted: Mon Mar 10, 2014 11:18 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
Well, I've not tried this.
But you might be able to cheat monitoring events, by propagating to an unconnected terminal, and attach an event to the out terminal being called.
This does mean you would have to instrument the ESQL to put a propagate before and after the DB call. But particularly at v9, you can attach a monitoring profile without much effort, so once the code was instrumented, you could turn on the monitoring just long enough and then turn it off again.
But I thought some of this was answered by Resource Statistics? Maybe I'm misremembering. |
|
Back to top |
|
 |
rbicheno |
Posted: Tue Mar 11, 2014 12:18 am Post subject: |
|
|
Apprentice
Joined: 07 Jul 2009 Posts: 43
|
For simple flows the methods you state work well (stats and/or trace)
For more complex flows with lots of ESQL as described its a tricky one with out changing the flow. If i were to change it i would put the DB call in its own compute node then use accounting and stats to show the hotspot.
Without changing the flow...any way you can replicate the DB call external to broker in a simple app? (java app, or even just build another flow that just does the DB call)?
That way if you can show the DB call sucks outside WMB you have your proof. Probably easy if its a select but not so if its an insert/update.
Otherwise are the DBA's no use? I would have thought they can provide some performance stats on your calls and give some feedback, maybe there's some monitoring/tuning their side that can be done. |
|
Back to top |
|
 |
PeterPotkay |
Posted: Tue Mar 11, 2014 4:17 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
rbicheno wrote: |
If i were to change it i would put the DB call in its own compute node then use accounting and stats to show the hotspot.
Without changing the flow. |
Are there negative performance implications to moving the DB call out to its own Compute Node? For the 99.99999% of the time when everything is fine is the overhead of using multiple compute nodes instead of one worth it?
rbicheno wrote: |
That way if you can show the DB call sucks outside WMB you have your proof. Probably easy if its a select but not so if its an insert/update.
Otherwise are the DBA's no use? I would have thought they can provide some performance stats on your calls and give some feedback, maybe there's some monitoring/tuning their side that can be done. |
Yeah, eventually. But just like every other area, including the WMB gang, everyone says "Not me, it must be your piece that is slow" until you can prove its their piece that is slow. _________________ Peter Potkay
Keep Calm and MQ On |
|
Back to top |
|
 |
PeterPotkay |
Posted: Tue Mar 11, 2014 5:09 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
Please vote for this Request for Enhancement (RFE) I just submitted to make it easier to see how long WMB takes to talk to a database.
This link will take you to the direct link to vote:
http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=46015
Headline:
Provide statistics on how long a database interaction takes from inside a Compute node, without having to Trace
Description:
When a database call is made inside a compute node, with many lines of non database related ESQL code before and after the database call, we would like WMB to expose information on how long its taking for that database call to complete. Accounting & Statistics, and Exits, only get us down to the node level. You can tell the compute node is the slow portion, but can't prove the database call is the culprit. Using trace is tough in a busy Production environment - it produces tremendous amounts of data, and may cause its own performance problems. We need the ability to quickly have access to database timings for a specific flow in a specific execution group, without having to restart or redeploy anything.
Use case:
The fire alarm is pulled. "WMB is slow!!!" We know it's almost surely the call to the database, but how do we prove it? We have dozens of executions groups with hundreds of message flows, processing thousands of transactions a minute. Turning on trace is not feasible. There is concern the trace will cause its own performance impacts. There is concern that a trace will simply be millions of lines of output showing what all the flows on the broker are doing, making it next to impossible to see what we need to see. There is concern the trace output is meant for L2 and L3 support, and won't be useful to us.
I would like the ability to run an mqsi* command where I specify the particular message flow that causes WMB to start producing Statistics and / or puts out a very targeted log file that shows start and end times for the database calls made by that one message flow, along with success/failure codes. And optionally showing the content of the request to the DB and the reply from the DB.
Private Business justification:
We waste hours trying to prove WMB is NOT the issue, that WMB is the victim of a slow database. Sometimes the problem mysteriously goes away, no one admits doing anything, and the problem is closed out as WMB was slow and started working better on its own.
We need a way to quickly identify external resources as being the slow component so that we don't waste hours trying to prove this, and so that WMB's reputation is not taking a hit. _________________ Peter Potkay
Keep Calm and MQ On |
|
Back to top |
|
 |
Tibor |
Posted: Wed Mar 12, 2014 1:15 am Post subject: |
|
|
 Grand Master
Joined: 20 May 2001 Posts: 1033 Location: Hungary
|
Peter, I have voted for your RFE, but ... I guess, it would be easier to collect this information on database side. |
|
Back to top |
|
 |
smdavies99 |
Posted: Wed Mar 12, 2014 1:50 am Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
Tibor wrote: |
Peter, I have voted for your RFE, but ... I guess, it would be easier to collect this information on database side. |
In my experience, getting this sort of data is harder than robbing Ft Knox. _________________ WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions. |
|
Back to top |
|
 |
Tibor |
Posted: Wed Mar 12, 2014 2:03 am Post subject: |
|
|
 Grand Master
Joined: 20 May 2001 Posts: 1033 Location: Hungary
|
Quote: |
In my experience, getting this sort of data is harder than robbing Ft Knox. |
Are you cross with database guys?  |
|
Back to top |
|
 |
smdavies99 |
Posted: Wed Mar 12, 2014 2:52 am Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
Tibor wrote: |
Quote: |
In my experience, getting this sort of data is harder than robbing Ft Knox. |
Are you cross with database guys?  |
no. Just being realistic and speaking from (slightly bitter)experience. DB Admins do not like having their operations questioned especially with regard to performance. If you do this you might find that your access has been mysteriously revoked and restoring it may take a month of Sundays. _________________ WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions. |
|
Back to top |
|
 |
PeterPotkay |
Posted: Wed Mar 12, 2014 3:24 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
And even if you get someone cooperative, they may not be the most skilled DBA. Until the DBA, regardles of skill, can find the problem and start working on it, the focus remains on it being a "WMB problem".
The better methods we have to quickly and safely demonstrate the WMB flow is waiting on the database, the faster the focus shifts to the real source of the problem, meaning we all get to drop off the crisis call and get back to real work.
Thank you for voting for the RFE. _________________ Peter Potkay
Keep Calm and MQ On |
|
Back to top |
|
 |
Esa |
Posted: Wed Mar 12, 2014 3:52 am Post subject: |
|
|
 Grand Master
Joined: 22 May 2008 Posts: 1387 Location: Finland
|
I voted, too.
Interesting to see if this gets implemented.
Compute node.datasourceOperationStart/End monitoring events would be nice to have. |
|
Back to top |
|
 |
Tibor |
Posted: Wed Mar 12, 2014 4:01 am Post subject: |
|
|
 Grand Master
Joined: 20 May 2001 Posts: 1033 Location: Hungary
|
[quote="smdavies99"][quote="Tibor"]
Quote: |
Just being realistic and speaking from (slightly bitter)experience. |
I wish you a slightly better experience. Blaming each other is not the best adventure in an IT department.
Quote: |
The better methods we have to quickly and safely demonstrate the WMB flow is waiting on the database... |
Of course, you are right, that's why I supported this RFE. |
|
Back to top |
|
 |
slonkoski |
Posted: Wed Mar 12, 2014 5:45 am Post subject: |
|
|
 Acolyte
Joined: 18 Mar 2005 Posts: 52
|
Got my vote after just spending almost 7 hours listening to DBA and app team swear 100% it couldn't be the DB call - but it was! |
|
Back to top |
|
 |
JosephGramig |
Posted: Wed Mar 12, 2014 7:29 am Post subject: |
|
|
 Grand Master
Joined: 09 Feb 2006 Posts: 1244 Location: Gold Coast of Florida, USA
|
slonkoski wrote: |
Got my vote after just spending almost 7 hours listening to DBA and app team swear 100% it couldn't be the DB call - but it was! |
In my experience, it is always an external call from broker that makes WMB appear slow. |
|
Back to top |
|
 |
kimbert |
Posted: Wed Mar 12, 2014 8:04 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
I got sufficiently interested to ask one of the IIB developers about this. Here's their answer:
Quote: |
The better IIB answer here is the ODBC activity log that was added in V9.
An SQLExecute is now surrounded by a BIP12074 and BIP12075 (successful) or BIP12077 (unsuccessful).
Therefore it should be possible to view the two timestamps and see exactly how long the DB statement took to execute.
We also do the same for SQLPrepare using the BIP12080 and BIP12081 (successful) or BIP12082 (unsuccessful)
Activity logging is always on and can be logged to a file if needed. |
Definitely available in IIB v9. I believe it's also available in any version after WMB v8.0.0.2. _________________ Before you criticize someone, walk a mile in their shoes. That way you're a mile away, and you have their shoes too. |
|
Back to top |
|
 |
|