Author |
Message
|
abd.wsu |
Posted: Thu Jun 01, 2017 8:19 am Post subject: Significance of Times Max Thread Reached in flow stats |
|
|
Acolyte
Joined: 12 Sep 2012 Posts: 65
|
Hi,
We have a flow that makes an HTTP request call to an external web service, and there have been complaints of slowness. We turned on snapshot stats on our broker running on IIB10 and see that the Average Elapsed time for the entire flow is about 300ms, with the request node taking about 200ms. This doesn't look slow to us, and it has been about the same ever since we started the stats.
However, while looking at the stats from Tivoli, I noticed an unusually high `Times Max Threads Reached` count. We are using 6 additional instances, so the no. of threads is showing as 7, but at a certain point I saw `Times Max Threads Reached` at 60K. Should this concern me? Is this indicative of anything wrong with the flow?
The slowness resolved on its own, but we really don't see any difference in the stats between the time of the issue and after it was resolved. I am just trying to clean up from my side, and this huge number caught my eye. Please suggest. |
|
|
 |
mqjeff |
Posted: Thu Jun 01, 2017 8:26 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
I think "Times Max Threads Reached" means "the number of times that all threads were in use".
That suggests that you should increase the number of instances to better match the volume of incoming messages...
If you can capture it in a lower environment than production, a user trace would be useful. _________________ chmod -R ugo-wx / |
|
|
 |
Vitor |
Posted: Thu Jun 01, 2017 8:29 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
abd.wsu wrote: |
We are using 6 additional instances, so the no. of threads is showing as 7, but at a certain point I saw `Times Max Threads Reached` at 60K. Should this concern me? Is this indicative of anything wrong with the flow? |
If you have 360K messages going through the flow during the period of the snapshot, probably not. If you have 60K, then probably. You don't mention volumes.
What that statistic is telling you is the number of times (during the interval) all of the threads were in use. At its simplest, it's telling you that (based on the flow throughput you don't mention) you're using all of the thread resource you've allocated, and therefore there's a possibility of running out if throughput increases.
abd.wsu wrote: |
The slowness resolved on its own |
I am always highly suspicious of situations where the magic just came back on its own. It's much more likely that whatever problem was causing the slowdown (and that problem could be external to IIB) was resolved.
I would not want to be in a position where users complain about slowness and my response is "don't worry; it happens sometimes, it'll fix itself in a minute if we just wait patiently" or "oops, out of magic again; I'll just pour some pixie dust in the back of the server" _________________ Honesty is the best policy.
Insanity is the best defence. |
|
|
 |
abd.wsu |
Posted: Thu Jun 01, 2017 9:18 am Post subject: |
|
|
Acolyte
Joined: 12 Sep 2012 Posts: 65
|
Thanks for the replies.
@mqjeff, I'll see what I can do in a lower environment. I need something to generate a load close enough to production. I'll try some options.
@Vitor, sorry I didn't mention the total input messages. It's actually the same number as the Times Max Threads Reached at that point: 60K.
We didn't wait for the magic to come back, but followed the 'Bounce everything, hold hands and sing kumbaya' mantra to please the Gods. Yes, we are digging in to see what caused the slowness. There is an LDAP Authentication layer which is being looked at closely. But from my observation, the only thing that indicates any issue is these stats. We don't see any errors in the logs and, like I said, the Average Elapsed time in the flow stats is pretty much the same before and after the issue. |
|
|
 |
Vitor |
Posted: Fri Jun 02, 2017 4:53 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
abd.wsu wrote: |
We didn't wait for the magic to come back, but followed the 'Bounce everything, hold hands and sing kumbaya' mantra to please the Gods. |
It's an equally valid strategy, and more proactive. Indeed with Windoze, it's the de facto first step. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
|
 |
Vitor |
Posted: Fri Jun 02, 2017 4:55 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
abd.wsu wrote: |
@Vitor, sorry I didn't mention the total input messages. It's actually the same number as the Times Max Threads Reached at that point: 60K. |
So whatever's causing the slowness, it's clear that you were at the limit of the thread resources. This could cause your customers to receive actual timeouts and connection refused type errors rather than just slow performance. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
|
 |
fjb_saper |
Posted: Fri Jun 02, 2017 5:35 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
I could not find the interval, but I looked at the math and this is what it gives:
300 ms avg elapsed flow time gives about 3 TPS per thread.
3 TPS times 7 threads = about 21 TPS.
60K messages @ 21 TPS = about 48 mins running nearly full tilt...
(without rounding: 60000 * 300 / 1000 / 60 / 7 = 42.9 mins)
If you expected to be done any sooner you have a capacity problem....  _________________ MQ & Broker admin |
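For anyone who wants to rerun the arithmetic above with their own numbers, here is a minimal Python sketch. It assumes every thread is busy all the time and that each message takes exactly the average elapsed time, which is a simplification. The function names are made up for illustration and are not part of IIB.

```python
def max_throughput_tps(avg_elapsed_ms: float, threads: int) -> float:
    """Upper bound on messages per second if all threads are always busy."""
    per_thread_tps = 1000.0 / avg_elapsed_ms  # e.g. 300 ms -> about 3.33 TPS
    return per_thread_tps * threads


def drain_time_minutes(messages: int, avg_elapsed_ms: float, threads: int) -> float:
    """Minutes needed to work through `messages` at the maximum throughput."""
    return messages / max_throughput_tps(avg_elapsed_ms, threads) / 60.0


if __name__ == "__main__":
    # Figures quoted in the thread: 60K messages, 300 ms average, 7 threads.
    print(round(max_throughput_tps(300, 7), 1), "TPS")
    print(round(drain_time_minutes(60000, 300, 7), 1), "minutes")
```

With the figures from this thread it gives roughly 23 TPS and about 43 minutes. If the 60K messages arrived in a shorter window than that, 7 threads could not have kept up.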
|
|
 |
abd.wsu |
Posted: Tue Jun 06, 2017 9:15 am Post subject: |
|
|
Acolyte
Joined: 12 Sep 2012 Posts: 65
|
So @fjb_saper, when you say capacity problems, is it within the flow, the EG, or the broker and server itself? I checked the RHEL6 vitals at the time of the issue and there were no constraints. At the OS level everything seems to be humming along nicely.
Now, what would fix whatever this capacity problem is? Increasing the number of flow instances? |
|
|
 |
mqjeff |
Posted: Tue Jun 06, 2017 9:30 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
Ok. To review (mainly for my own memory):
- Your flow makes an HTTP request.
- Your flow is receiving 60k messages and sending out 60k HTTP requests
- During processing of those 60k requests, you are receiving 60k "Times Max Threads Reached"
- You are running 7 instances
If, while processing those 60k messages, the Times Max Threads Reached count is more than about 25% of the message count, you are probably running into issues with the server receiving the HTTP requests.
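The check described above can be sketched in a few lines of Python. This is only an illustration: the function and parameter names are hypothetical, the two counts are assumed to come from the same snapshot interval, and the 25% default is the rule of thumb from this post, not a documented product limit.

```python
def saturation_ratio(times_max_threads_reached: int, total_input_messages: int) -> float:
    """Share of input messages for which all flow threads were already in use."""
    if total_input_messages <= 0:
        raise ValueError("total_input_messages must be positive")
    return times_max_threads_reached / total_input_messages


def looks_saturated(times_max_threads_reached: int,
                    total_input_messages: int,
                    threshold: float = 0.25) -> bool:
    """True when the ratio exceeds the rule-of-thumb threshold (assumed 25%)."""
    return saturation_ratio(times_max_threads_reached, total_input_messages) > threshold


if __name__ == "__main__":
    # Counts reported in this thread: 60K reached out of 60K messages.
    print(saturation_ratio(60000, 60000), looks_saturated(60000, 60000))
    # The hypothetical 360K-message case mentioned earlier in the thread.
    print(round(saturation_ratio(60000, 360000), 3), looks_saturated(60000, 360000))
```

For the counts reported here the ratio is 1.0, so every message found the pool fully used. With 360K messages the same 60K count would be about 0.17, under the threshold.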
If the goal is to respond to the applications calling your flow as quickly as possible, then you should reduce the timeout on your HTTPRequest node. That way it will time out faster and return a response to the calling application saying "I timed out". The application then needs to retry the call.
If the application can *never* receive a timeout response, and have to retry, then a) increase your thread instances, b) get a new job...
You don't say how you are receiving messages to start the flow.
If you are using an HTTP transport node (SOAP, HTTP, JSON, etc.) then you can adjust the properties of the relevant HTTP connector to get more threads waiting in the HTTP listener than in the flow. _________________ chmod -R ugo-wx / |
|
|
 |
abd.wsu |
Posted: Tue Jun 06, 2017 10:51 am Post subject: |
|
|
Acolyte
Joined: 12 Sep 2012 Posts: 65
|
Yup. I did the first thing: reduced the timeout on the HTTP Request node so it times out sooner.
If increasing the instances is gonna cost me my job then maybe don't do it.
I'll research the last part and see how it will help with my issue. We are using a broker-wide HTTP listener, so I am not sure how this would affect the other flows. Let me research/google that. Thanks. |
|
|
 |
Vitor |
Posted: Tue Jun 06, 2017 11:08 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
abd.wsu wrote: |
If increasing the instances is gonna cost me my job then maybe don't do it.  |
There's an implied "or" between the a) and b) options of my most worthy associate. The b) option is the last resort for those of us faced with impossible requirements.
Of course, increasing the number of instances unwisely could cost you your job. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
|
 |
mqjeff |
Posted: Tue Jun 06, 2017 11:11 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
I neither meant that increasing the instances of your flow would cost you your job, nor implied an "or".
I said that if you were faced with ridiculous/impossible requirements (that the application can't handle a timeout and can't do a retry), then you should band-aid issues until you get a new job. _________________ chmod -R ugo-wx / |
|
|
 |
|