mattfarney
Posted: Tue Jan 18, 2011 2:05 pm Post subject: Lowering effective wait time on a problematic cluster qm
Disciple
Joined: 17 Jan 2006 Posts: 167 Location: Ohio

I am trying to improve my understanding of workload balancing. The clustering manual does not go into much detail about how workload balancing is performed.
I have three queue managers, QMA, QMB, and QMC, which are non-repository QMs in a cluster that I do not own. Other machines in the cluster send data to them. Let's say the connection from someQM->QMB stops for some reason. My goal is to minimize the time that messages intended for QMB sit stuck on a remote server. The communications issues are being researched, but I've been asked to try to help minimize the impact.
As soon as the channel goes to RETRYING, I assume that the traffic targeted at QMB will be redistributed to QMA/QMC. Technically, I guess this happens when someQM detects that the channel is in RETRYING, since the communications issues could affect that determination too. Correct?
What settings contribute to this wait time? The short and long retry timers only matter after a channel has gone to RETRYING, correct?
So if the channel is stuck in an odd state (for example, a network failure during a batch), I believe I should be looking at the heartbeat and keepalive settings.
The Intercommunication book implies that these work the same way for cluster channels. Does anyone have past experience with heartbeats in a clustered environment?
Am I leaving anything out?
-mf
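The short and long retry timers asked about above can be reasoned about with simple arithmetic. This is a simplified sketch, not MQ's implementation: it assumes the usual channel defaults (SHORTRTY 10, SHORTTMR 60 seconds, LONGRTY 999999999, LONGTMR 1200 seconds), assumes each retry fires one full interval after the previous one, and ignores how long a failed connection attempt takes. The `RetrySettings` and `retry_times` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class RetrySettings:
    """Hypothetical holder for channel retry attributes (usual MQ defaults)."""
    shortrty: int = 10           # SHORTRTY: number of short retries
    shorttmr: int = 60           # SHORTTMR: seconds between short retries
    longrty: int = 999_999_999   # LONGRTY: number of long retries
    longtmr: int = 1200          # LONGTMR: seconds between long retries


def retry_times(s: RetrySettings, limit: int) -> Iterator[int]:
    """Yield offsets in seconds, measured from entering RETRYING, of the
    first `limit` retry attempts: all short retries, then long ones.

    Simplified model: each attempt fires one full interval after the last,
    and the time an attempt takes to fail is ignored.
    """
    t = 0
    produced = 0
    for count, interval in ((s.shortrty, s.shorttmr), (s.longrty, s.longtmr)):
        for _ in range(count):
            if produced >= limit:
                return
            t += interval
            produced += 1
            yield t


if __name__ == "__main__":
    print(list(retry_times(RetrySettings(), 12)))
    # A shorter SHORTTMR only compresses the short-retry phase.
    print(list(retry_times(RetrySettings(shorttmr=15), 12)))
```

Under these assumptions the last short retry happens 600 seconds after the channel enters RETRYING, and long retries follow 20 minutes apart.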

mqjeff
Posted: Tue Jan 18, 2011 2:44 pm Post subject: Re: Lowering effective wait time on a problematic cluster qm
Grand Master
Joined: 25 Jun 2008 Posts: 17447

mattfarney wrote:
"As soon as the channel goes to RETRYING, I assume that the traffic targeted for QMB will be redistributed to QMA/QMC."
That depends on how the messages were sent.

bruce2359
Posted: Tue Jan 18, 2011 2:56 pm Post subject:
Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.

That depends on how the queue was opened: MQOO_BIND_NOT_FIXED vs. MQOO_BIND_ON_OPEN (usually called "bind fixed"). With MQOO_BIND_ON_OPEN, you have directed the clustering software to send every message to the queue instance resolved at MQOPEN time, rather than resolving an instance at each MQPUT.
With MQOO_BIND_NOT_FIXED, messages waiting on the SCTQ (SYSTEM.CLUSTER.TRANSMIT.QUEUE) can be routed to the next available instance of the queue.
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
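To make the MQOPEN-versus-MQPUT distinction concrete, here is a toy model. It is a deliberate simplification: real cluster workload management also weighs channel status, ranks, priorities, weights and more, whereas this sketch uses plain round-robin over the instances whose channel is not RETRYING. The `ClusterQueue` and `OpenHandle` names are hypothetical, and the model does not cover reallocation of messages already on the SCTQ.

```python
from itertools import cycle


class ClusterQueue:
    """Toy model of one cluster queue hosted on several queue managers.

    Simplified: plain round-robin over instances whose cluster-sender
    channel is not RETRYING. Real MQ workload balancing is more involved.
    """

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self.retrying = set()          # hosts whose channel is RETRYING
        self._rr = cycle(self.hosts)

    def choose(self):
        """Pick the next available instance, round-robin."""
        for _ in range(len(self.hosts)):
            host = next(self._rr)
            if host not in self.retrying:
                return host
        return None                    # nothing available: message waits


class OpenHandle:
    def __init__(self, queue, bind_on_open):
        self.queue = queue
        self.bind_on_open = bind_on_open
        # Bind on open: the destination is chosen once, at MQOPEN time.
        self.fixed_host = queue.choose() if bind_on_open else None

    def put(self):
        """Return the queue manager this message is routed to."""
        if self.bind_on_open:
            return self.fixed_host     # even if its channel is RETRYING
        return self.queue.choose()     # not fixed: chosen at each MQPUT


q = ClusterQueue(["QMA", "QMB", "QMC"])
a = OpenHandle(q, bind_on_open=True)          # binds to QMA
b = OpenHandle(q, bind_on_open=True)          # binds to QMB
not_fixed = OpenHandle(q, bind_on_open=False)
q.retrying.add("QMB")                         # channel to QMB goes RETRYING
print([b.put() for _ in range(3)])            # ['QMB', 'QMB', 'QMB']
print([not_fixed.put() for _ in range(4)])    # ['QMC', 'QMA', 'QMC', 'QMA']
```

The bound handle keeps targeting QMB and its messages would sit waiting, while the unbound handle simply skips QMB.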

mattfarney
Posted: Tue Jan 18, 2011 3:39 pm Post subject:
Disciple
Joined: 17 Jan 2006 Posts: 167 Location: Ohio

I knew I left some important information out.
The clustered queue has the same name on all three servers and is defined with DEFBIND(NOTFIXED).
MQ 6.0 on Windows.
-mf

exerk
Posted: Tue Jan 18, 2011 4:10 pm Post subject: Re: Lowering effective wait time on a problematic cluster qm
Jedi Council
Joined: 02 Nov 2006 Posts: 6339

mattfarney wrote:
"...I have three QMA, QMB, and QMC who are non-repository QMs in a cluster that I do not own..."
If you do not own them, and you have done everything others have suggested, then there is nothing more you can do: it is up to the network owner and the owners of the 'other' queue managers to resolve the issue.
_________________
It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys.

bruce2359
Posted: Tue Jan 18, 2011 4:20 pm Post subject:
Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.

Quote:
"The clustered queue is the same name on all three servers and is DEFBIND(NOTFIXED)."
DEFBIND is a queue attribute; it only supplies a default.
The app developer can specify one of these in the MQOPEN open options (MQOO_*):
1. MQOO_BIND_ON_OPEN (bind fixed)
2. MQOO_BIND_NOT_FIXED
3. MQOO_BIND_AS_Q_DEF
If the developer specifies MQOO_BIND_ON_OPEN or MQOO_BIND_NOT_FIXED, the queue's DEFBIND attribute has no effect on the open. Only MQOO_BIND_AS_Q_DEF, which is also what an application gets if it codes no bind option at all, picks up DEFBIND.
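The precedence rule above fits in a few lines. In this sketch the enum and function names are hypothetical; the member names mirror the MQI open options and the DEFBIND values (OPEN, NOTFIXED) available on MQ 6.0, but the numeric values of the real MQI constants are intentionally not reproduced.

```python
from enum import Enum


class BindOption(Enum):
    """Bind choice an application passes in its MQOPEN options."""
    ON_OPEN = "MQOO_BIND_ON_OPEN"      # "bind fixed"
    NOT_FIXED = "MQOO_BIND_NOT_FIXED"
    AS_Q_DEF = "MQOO_BIND_AS_Q_DEF"    # also the result when no bind option is coded


class DefBind(Enum):
    """Queue DEFBIND attribute values."""
    OPEN = "OPEN"
    NOTFIXED = "NOTFIXED"


def effective_bind(open_option: BindOption, defbind: DefBind) -> DefBind:
    """Return the binding actually used for this MQOPEN.

    An explicit bind option wins; DEFBIND only matters for MQOO_BIND_AS_Q_DEF.
    """
    if open_option is BindOption.ON_OPEN:
        return DefBind.OPEN
    if open_option is BindOption.NOT_FIXED:
        return DefBind.NOTFIXED
    return defbind


# The queue in this thread is DEFBIND(NOTFIXED), yet an application coding
# MQOO_BIND_ON_OPEN still gets a fixed binding:
print(effective_bind(BindOption.ON_OPEN, DefBind.NOTFIXED))   # DefBind.OPEN
print(effective_bind(BindOption.AS_Q_DEF, DefBind.NOTFIXED))  # DefBind.NOTFIXED
```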

bruce2359
Posted: Tue Jan 18, 2011 4:42 pm Post subject:
Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.

You will need to examine the application code to determine exactly which open options are being used.

mattfarney
Posted: Tue Jan 18, 2011 5:00 pm Post subject:
Disciple
Joined: 17 Jan 2006 Posts: 167 Location: Ohio

If it were only that easy...
I've asked the question, but there's no guarantee I'll get an answer.
Personal opinion: I wish there were a way to force certain MQ options (persistence, DEFBIND, etc.), since trusting the generator of the data is problematic.
The traffic I see across the three systems is well balanced, so I think we can safely assume that the BIND_FIXED option is not being used (though, as I said above, I've asked them to check).
-mf

bruce2359
Posted: Tue Jan 18, 2011 5:18 pm Post subject:
Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.

Quote:
"The traffic I see across the three systems is well balanced, so I think we can safely assume that the BIND_FIXED option is not being used ..."
Is a cluster workload exit being used?
Quote:
"I wish there was a way to force certain MQ options (persistence, defbind, etc.) since trusting the generator of the data is problematic."
Hmmmm. So, you believe it is the function of a system administrator to ensure the quality of data?
Do you also believe it's the function of a sysadmin to ensure that business processes (arithmetic, database updates, etc.) are correct in every application program? Do you check every line of code to make this happen?
I'd strongly suggest that you draw a line of separation between what a sysadmin can (should) do and what is the responsibility of an application developer. Fixing things so that bad application code behaves better does nothing to improve the application code. Firing app developers for low-quality code is a better choice.

fjb_saper
Posted: Tue Jan 18, 2011 9:20 pm Post subject: Re: Lowering effective wait time on a problematic cluster qm
Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY

mattfarney wrote:
"As soon as the channel goes to RETRYING, I assume that the traffic targeted for QMB will be redistributed to QMA/QMC. I guess technically, this happens when that someQM detects that the channel is in RETRYING, since the communications issues could affect that determination too. Correct?"
Not quite right. Imagine that QMB is the reply-to qmgr. There is nothing wrong with the channels from QMB to QMC and QMA, but the channel from QMC to QMB is in retry mode. So your requests go out and get processed, but if they were processed by QMC (or transited through QMC), the replies do not come back until the problem is resolved.
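The reply-to scenario comes down to checking both legs of the round trip, not just the outbound one. A minimal sketch with hypothetical names, where a plain dictionary stands in for channel status (this is not how MQ exposes channel state) and the "transited through QMC" case is not modelled:

```python
# Status of the cluster-sender channel from one queue manager to another.
# Illustrative data matching the scenario above: only QMC -> QMB is broken.
CHANNELS = {
    ("QMB", "QMA"): "RUNNING",
    ("QMB", "QMC"): "RUNNING",
    ("QMA", "QMB"): "RUNNING",
    ("QMC", "QMB"): "RETRYING",
}


def leg_ok(src: str, dst: str, channels: dict) -> bool:
    """A leg works if it is local or its channel is RUNNING."""
    return src == dst or channels.get((src, dst)) == "RUNNING"


def reply_arrives(requester: str, server: str, reply_to_qmgr: str,
                  channels: dict) -> bool:
    """True if the request reaches the server and the reply gets back."""
    return (leg_ok(requester, server, channels)
            and leg_ok(server, reply_to_qmgr, channels))


for server in ("QMA", "QMC"):
    print(server, reply_arrives("QMB", server, "QMB", CHANNELS))
# QMA True
# QMC False  <- request is processed, but the reply is stuck on QMC
```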
mattfarney wrote:
"What settings contribute to this wait time? The short and long retry timers only matter after a channel has gone to RETRYING, correct?
So if the channel is stuck in an odd situation (network failure during a batch), I believe I should be looking at heartbeat and keepalive settings.
It is implied in the intercommunication book that these work the same for cluster channels. Anyone have any past experience with heartbeats in a clustered environment?
Am I leaving anything out?
-mf"
Yes, you are leaving out one of the biggest offenders: a full queue on the destination qmgr. For that scenario the receiving MCA has a message-retry count and retry interval (MRRTY and MRTMR; look up the specifics in the MQSC manual), which it works through before putting the message on the DLQ. Tweaking those parameters can significantly speed up the MCA putting messages on the DLQ.
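Some rough arithmetic shows why those parameters matter. The sketch assumes the receiving channel's message-retry attributes MRRTY (retry count, default 10) and MRTMR (retry interval in milliseconds, default 1000), and simplifies heavily: every message hits the full queue, each retry waits the full interval, messages are handled one after another, and put and network time are ignored. The `stall_seconds` name is hypothetical.

```python
def stall_seconds(messages: int, mrrty: int = 10, mrtmr_ms: int = 1000) -> float:
    """Worst-case time the receiving MCA spends retrying puts to a full
    queue before dead-lettering `messages` messages.

    Simplified model: each message is retried `mrrty` times, waiting
    `mrtmr_ms` milliseconds each time, and the channel processes the
    messages serially.
    """
    return messages * mrrty * mrtmr_ms / 1000


for mrrty, mrtmr in [(10, 1000), (3, 100), (0, 0)]:
    hours = stall_seconds(1000, mrrty, mrtmr) / 3600
    print(f"MRRTY={mrrty} MRTMR={mrtmr}ms -> {hours:.2f} h for 1000 messages")
```

With the defaults, 1,000 messages arriving at a full queue would tie up the channel for about 10,000 seconds (roughly 2.8 hours) in this model, which is the "wreak havoc" effect described next.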
If your communications are essentially sub-second, a full destination queue will wreak havoc on your MQ cluster network. The easy fix is to increase the queue depth (MAXDEPTH) on the fly. Of course, you then need to scale or fix the consumer right away...
Have fun
_________________
MQ & Broker admin

skoobee
Posted: Tue Jan 18, 2011 10:29 pm Post subject:
Acolyte
Joined: 26 Nov 2010 Posts: 52

Look at the BATCHHB (batch heartbeat) attribute. Just before committing a batch, the sending channel checks that the remote end is still active, so if there is a problem the batch can be backed out and the messages redistributed.
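Here is a minimal sketch of the end-of-batch decision BATCHHB adds, based on the attribute's documented behaviour: the value is in milliseconds, 0 (the default) disables it, and the check is only made if nothing has been heard from the remote end within that interval. The function, enum and parameter names are hypothetical, and the real MCA logic is more involved.

```python
from enum import Enum


class BatchOutcome(Enum):
    ATTEMPT_COMMIT = "send end-of-batch (may be left in doubt if the partner is gone)"
    BACK_OUT = "back out; messages stay on the SCTQ and can be reallocated"


def end_of_batch(ms_since_last_heard: int, batchhb_ms: int,
                 partner_responds: bool) -> BatchOutcome:
    """Decide what happens to a batch just before commit.

    If batch heartbeats are enabled (batchhb_ms > 0) and the remote end
    has been silent for longer than the interval, it is checked first; a
    partner that does not answer means the batch is backed out instead of
    risking an in-doubt channel.
    """
    heartbeat_needed = batchhb_ms > 0 and ms_since_last_heard > batchhb_ms
    if heartbeat_needed and not partner_responds:
        return BatchOutcome.BACK_OUT
    return BatchOutcome.ATTEMPT_COMMIT


print(end_of_batch(5000, 0, partner_responds=False))    # BATCHHB disabled
print(end_of_batch(5000, 500, partner_responds=False))  # backed out
print(end_of_batch(100, 500, partner_responds=False))   # heard recently, no check
```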

Vitor
Posted: Wed Jan 19, 2011 3:45 am Post subject:
Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA

mattfarney wrote:
"Personal Opinion: I wish there was a way to force certain MQ options (persistence, defbind, etc.) since trusting the generator of the data is problematic."
<plug>There is, but it requires 3rd-party software.</plug>
This is in no way an attempt to address the question:
Quote:
"I'd strongly suggest that you draw a line of separation between what a sysadmin can (should) do, and what is the responsibility of an application developer"
because I agree with both of you.
_________________
Honesty is the best policy.
Insanity is the best defence.

mattfarney
Posted: Wed Jan 19, 2011 9:51 am Post subject:
Disciple
Joined: 17 Jan 2006 Posts: 167 Location: Ohio

bruce2359 wrote:
"Quote: 'The traffic I see across the three systems is well balanced, so I think we can safely assume that the BIND_FIXED option is not being used ...'
Is a cluster workload exit being used?"
No cluster workload exit is defined.
bruce2359 wrote:
"Quote: 'I wish there was a way to force certain MQ options (persistence, defbind, etc.) since trusting the generator of the data is problematic.'
Hmmmm. So, you believe it is the function of a system administrator to ensure the quality of data?
Do you also believe it's the function of a sysadmin to ensure that business processes (arithmetic, database updates, etc.) are correct in every application program? Do you check every line of code to make this happen?
I'd strongly suggest that you draw a line of separation between what a sysadmin can (should) do and what is the responsibility of an application developer. Fixing things so that bad application code behaves better does nothing to improve the application code. Firing app developers for low-quality code is a better choice."
IMO, this paradigm works well when the content is created by people under the same control as the processing application. If I am receiving content from outside my organization/company/entity, relying on their programmers to set the appropriate flags and settings is a troublesome burden.
-mf

bruce2359
Posted: Wed Jan 19, 2011 10:00 am Post subject:
Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.

Quote:
"...a troublesome burden."
Yes, but only if you choose to accept it.

fjb_saper
Posted: Wed Jan 19, 2011 9:42 pm Post subject:
Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY

bruce2359 wrote:
"Quote: '...a troublesome burden.'
Yes, but only if you choose to accept it."
You need to make sure that programs that do not set these options correctly do not pass your deliverables' acceptance criteria...