MQSeries.net :: View topic - Lowering effective wait time on a problematic cluster qm

Lowering effective wait time on a problematic cluster qm

mattfarney

Posted: Tue Jan 18, 2011 2:05 pm Post subject: Lowering effective wait time on a problematic cluster qm

Disciple

Joined: 17 Jan 2006
Posts: 167
Location: Ohio

I am trying to improve my understanding of workload balancing. The clustering manual does not go into much detail about how workload balancing is performed.

I have three queue managers, QMA, QMB, and QMC, which are non-repository QMs in a cluster that I do not own. They are sent data from other machines in the cluster. Let's say the connection from someQM->QMB stops for some reason. My goal is to minimize the time that messages intended for QMB are stuck on a remote server. The communications issues are being researched, but I've been asked to try to help minimize the impact.

As soon as the channel goes to RETRYING, I assume that the traffic targeted for QMB will be redistributed to QMA/QMC. I guess technically, this happens when someQM detects that the channel is in RETRYING, since the communications issues could affect that determination too. Correct?

What settings contribute to this wait time? The short and long retry timers only matter after a channel has gone to RETRYING, correct?
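For reference, a rough sketch of how the short and long retry timers shape the reconnect schedule once a channel is already in RETRYING. The values used here (10 short retries 60 seconds apart, then long retries 1200 seconds apart) are assumptions for illustration; check the SHORTRTY, SHORTTMR, LONGRTY, and LONGTMR values on the actual cluster channels. It says nothing about how long it takes to reach RETRYING in the first place.

```python
from itertools import islice

def retry_schedule(shortrty, shorttmr, longrty, longtmr):
    """Yield the elapsed seconds at which each reconnect attempt is made."""
    elapsed = 0
    for _ in range(shortrty):      # short retry phase
        elapsed += shorttmr
        yield elapsed
    for _ in range(longrty):       # long retry phase
        elapsed += longtmr
        yield elapsed

# Assumed values, not taken from the thread.
attempts = list(islice(retry_schedule(10, 60, 999_999_999, 1200), 12))
print(attempts)
```

The point of the sketch is that these timers only control how quickly the channel recovers, not how quickly the failure is noticed.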

So if the channel is stuck in an odd situation (network failure during a batch), I believe I should be looking at heartbeat and keepalive settings.
It is implied in the intercommunication book that these work the same for cluster channels. Anyone have any past experience with heartbeats in a clustered environment?

Am I leaving anything out?

-mf

mqjeff

Posted: Tue Jan 18, 2011 2:44 pm Post subject: Re: Lowering effective wait time on a problematic cluster qm

Grand Master

Joined: 25 Jun 2008
Posts: 17447

mattfarney wrote:

As soon as the channel goes to RETRYING, I assume that the traffic targeted for QMB will be redistributed to QMA/QMC.

That depends on how the messages were sent.

bruce2359

Posted: Tue Jan 18, 2011 2:56 pm Post subject:

Poobah

Joined: 05 Jan 2008
Posts: 9475
Location: US: west coast, almost. Otherwise, enroute.

That depends on how the queue was opened.

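MQOO_BIND_NOT_FIXED vs. MQOO_BIND_ON_OPEN (the "bind fixed" option, referred to as BIND_FIXED below). With BIND_FIXED, you have directed the clustering software to send every message to the queue instance that was resolved at MQOPEN time, rather than resolving the destination at MQPUT time.

BIND_NOT_FIXED allows messages on the SYSTEM.CLUSTER.TRANSMIT.QUEUE (SCTQ) to be rerouted to the next available instance of the queue.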
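A toy model of the difference, not the real cluster workload algorithm (which also weighs channel status, rank, and priority). The function `route` and its round-robin choice are hypothetical and exist only for illustration:

```python
from itertools import cycle

def route(n_messages, instances, down, bind_on_open):
    """Return the queue manager each message ends up addressed to."""
    if bind_on_open:
        # Destination is picked once, at open time, and kept even if
        # that instance later becomes unreachable.
        return [instances[0]] * n_messages
    # Not fixed: each message is resolved separately, skipping
    # instances that are currently unavailable.
    up = [qm for qm in instances if qm not in down]
    if not up:
        return [None] * n_messages
    rr = cycle(up)
    return [next(rr) for _ in range(n_messages)]

order = ["QMB", "QMA", "QMC"]
fixed = route(4, order, down={"QMB"}, bind_on_open=True)
not_fixed = route(4, order, down={"QMB"}, bind_on_open=False)
print(fixed)
print(not_fixed)
```

In this model the bound messages all wait for QMB, while the unbound ones are spread over QMA and QMC.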
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.

mattfarney

Posted: Tue Jan 18, 2011 3:39 pm Post subject:

Disciple

Joined: 17 Jan 2006
Posts: 167
Location: Ohio

I knew I left some important information out.
The clustered queue is the same name on all three servers and is DEFBIND(NOTFIXED).

MQ 6.0 on Windows

-mf

exerk

Posted: Tue Jan 18, 2011 4:10 pm Post subject: Re: Lowering effective wait time on a problematic cluster qm

Jedi Council

Joined: 02 Nov 2006
Posts: 6339

mattfarney wrote:

...I have three queue managers, QMA, QMB, and QMC, which are non-repository QMs in a cluster that I do not own...

If you do not own them, and you have done everything as stated by others, then there is nothing more you can do - it is up to the network owner and the owners of the 'other' queue managers to resolve the issue.
_________________
It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys.

bruce2359

Posted: Tue Jan 18, 2011 4:20 pm Post subject:

Poobah

Joined: 05 Jan 2008
Posts: 9475
Location: US: west coast, almost. Otherwise, enroute.

Quote:

The clustered queue is the same name on all three servers and is DEFBIND(NOTFIXED).

This is a queue attribute.

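The app developer can specify one of these in the MQOO (open options):
1. BIND_ON_OPEN (the fixed binding, called BIND_FIXED in this thread)
2. BIND_NOT_FIXED
3. BIND_AS_Q_DEF

If the developer specifies BIND_FIXED or BIND_NOT_FIXED, the queue's DEFBIND attribute has no effect on the open. DEFBIND only applies when the open uses BIND_AS_Q_DEF.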
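The precedence described above can be sketched as follows. The `OpenBind` enum and `effective_binding` function are hypothetical names for illustration, not part of any MQ API; the strings only mirror the MQOO option names and the DEFBIND values OPEN and NOTFIXED.

```python
from enum import Enum

class OpenBind(Enum):
    AS_Q_DEF = "MQOO_BIND_AS_Q_DEF"
    ON_OPEN = "MQOO_BIND_ON_OPEN"
    NOT_FIXED = "MQOO_BIND_NOT_FIXED"

def effective_binding(open_option, defbind):
    """Return 'OPEN' or 'NOTFIXED' given the open option and queue DEFBIND."""
    if open_option is OpenBind.ON_OPEN:
        return "OPEN"            # application overrides the queue default
    if open_option is OpenBind.NOT_FIXED:
        return "NOTFIXED"        # application overrides the queue default
    return defbind               # BIND_AS_Q_DEF: fall back to DEFBIND

as_default = effective_binding(OpenBind.AS_Q_DEF, "NOTFIXED")
overridden = effective_binding(OpenBind.ON_OPEN, "NOTFIXED")
print(as_default, overridden)
```

So DEFBIND(NOTFIXED) on the queue does not by itself guarantee that messages can be rerouted.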

bruce2359

Posted: Tue Jan 18, 2011 4:42 pm Post subject:

Poobah

Joined: 05 Jan 2008
Posts: 9475
Location: US: west coast, almost. Otherwise, enroute.

You will need to examine the application code to determine exactly which open options are being used.

mattfarney

Posted: Tue Jan 18, 2011 5:00 pm Post subject:

Disciple

Joined: 17 Jan 2006
Posts: 167
Location: Ohio

If it were only that easy...
I've asked the question, but there's no guarantee I'll get an answer.

Personal Opinion: I wish there was a way to force certain MQ options (persistence, defbind, etc.) since trusting the generator of the data is problematic.

The traffic I see across the three systems is well balanced, so I think we can safely assume that the BIND_FIXED option is not being used [though as I said above, I've asked them to check].

-mf

bruce2359

Posted: Tue Jan 18, 2011 5:18 pm Post subject:

Poobah

Joined: 05 Jan 2008
Posts: 9475
Location: US: west coast, almost. Otherwise, enroute.

Quote:

The traffic I see across the three systems is well balanced, so I think we can safely assume that the BIND_FIXED option is not being used ...

Is a cluster workload exit being used?

Quote:

I wish there was a way to force certain MQ options (persistence, defbind, etc.) since trusting the generator of the data is problematic.

Hmmmm. So, you believe it is the function of a system administrator to ensure quality of data?

Do you also believe it's the function of a sysadmin to ensure that business processes (arithmetic, database updates, etc.) are correct in every application program? Do you check every line of code to make this happen?

I'd strongly suggest that you draw a line of separation between what a sysadmin can (should) do, and what is the responsibility of an application developer. Fixing things so bad application code behaves better does nothing to improve the application code. Firing app developers for low-quality code is a better choice.

fjb_saper

Posted: Tue Jan 18, 2011 9:20 pm Post subject: Re: Lowering effective wait time on a problematic cluster qm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20763
Location: LI,NY

mattfarney wrote:

As soon as the channel goes to RETRYING, I assume that the traffic targeted for QMB will be redistributed to QMA/QMC. I guess technically, this happens when someQM detects that the channel is in RETRYING, since the communications issues could affect that determination too. Correct?

Not quite right. Imagine that QMB is the reply-to qmgr. There is nothing wrong with the channels from QMB to QMC and QMA, but the channel from QMC to QMB is in retry mode. Your requests go out and get processed, but if they were processed by QMC, or transited through QMC, the replies do not come back until the problem is resolved.

mattfarney wrote:

What settings contribute to this wait time? The short and long retry timers only matter after a channel has gone to RETRYING, correct?

So if the channel is stuck in an odd situation (network failure during a batch), I believe I should be looking at heartbeat and keepalive settings.
It is implied in the intercommunication book that these work the same for cluster channels. Anyone have any past experience with heartbeats in a clustered environment?

Am I leaving anything out?

-mf

Yes, you are leaving out one of the biggest offenders: a full queue on the destination qmgr. The receiving MCA has a retry count and a retry interval for this scenario (look up the specifics in the MQSC manual) before it puts the message on the DLQ. Tweaking those parameters can significantly speed up the MCA putting the messages on the DLQ.
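A back-of-the-envelope sketch of why those parameters matter, assuming they are the channel attributes MRRTY (message retry count) and MRTMR (message retry interval, in milliseconds). The function name and the figures below are illustrative assumptions, and the result is only an approximation of the stall:

```python
def worst_case_retry_delay_ms(mrrty, mrtmr_ms, blocked_messages=1):
    """Approximate time the receiving MCA spends retrying before using the DLQ."""
    return mrrty * mrtmr_ms * blocked_messages

# Assumed values: 10 retries at 1000 ms versus a tuned 2 retries at 500 ms.
default_delay = worst_case_retry_delay_ms(10, 1000)
tuned_delay = worst_case_retry_delay_ms(2, 500)
batch_delay = worst_case_retry_delay_ms(10, 1000, blocked_messages=50)
print(default_delay, tuned_delay, batch_delay)
```

With the assumed values, a batch of 50 messages that all hit the full queue could hold the channel for minutes rather than seconds.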

If your communications are essentially sub-second, a full destination queue will wreak havoc on your MQ cluster network. The easy fix is to increase the maximum queue depth on the fly. Of course, you need to scale or fix the consumer right after that....

Have fun

_________________
MQ & Broker admin

skoobee

Posted: Tue Jan 18, 2011 10:29 pm Post subject:

Acolyte

Joined: 26 Nov 2010
Posts: 52

Look at the BATCHHB attribute. This checks the network connection just before committing the msgs, so if there is a problem the batch can be backed out and the msgs redistributed.
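Roughly, the decision at the end of a batch looks like the sketch below. This is a simplified model, not the MCA's actual code; `end_of_batch` and its parameters are hypothetical names, and it assumes BATCHHB is given in milliseconds with 0 meaning batch heartbeating is off.

```python
def end_of_batch(ms_since_last_contact, batchhb_ms, peer_responds):
    """Decide what the sending channel does with the batch in this toy model."""
    if batchhb_ms == 0:
        return "commit"      # batch heartbeating disabled, no check made
    if ms_since_last_contact <= batchhb_ms:
        return "commit"      # partner was heard from recently, assume alive
    # Otherwise a heartbeat is sent first; commit only if the partner answers.
    return "commit" if peer_responds else "backout"

healthy = end_of_batch(200, 1000, peer_responds=True)
dead_link = end_of_batch(5000, 1000, peer_responds=False)
unchecked = end_of_batch(5000, 0, peer_responds=False)
print(healthy, dead_link, unchecked)
```

The backed-out case is the one that lets BIND_NOT_FIXED messages be rerouted instead of sitting in an in-doubt batch.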

Vitor

Posted: Wed Jan 19, 2011 3:45 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

mattfarney wrote:

Personal Opinion: I wish there was a way to force certain MQ options (persistence, defbind, etc.) since trusting the generator of the data is problematic.

<plug>There is but it requires 3rd party software</plug>

This is in no way an attempt to address the question:

Quote:

I'd strongly suggest that you draw a line of separation between what a sysadmin can (should) do, and what is the responsibility of an application developer

because I agree with both of you.

_________________
Honesty is the best policy.
Insanity is the best defence.

mattfarney

Posted: Wed Jan 19, 2011 9:51 am Post subject:

Disciple

Joined: 17 Jan 2006
Posts: 167
Location: Ohio

bruce2359 wrote:

Quote:

The traffic I see across the three systems is well balanced, so I think we can safely assume that the BIND_FIXED option is not being used ...

Is a cluster workload exit being used?

None is defined.

bruce2359 wrote:

Quote:

I wish there was a way to force certain MQ options (persistence, defbind, etc.) since trusting the generator of the data is problematic.

IMO, this paradigm works well when the content is being created by people under the control of the processing application. If I am receiving content from outside my organization/company/entity, relying on their programmers to set the appropriate flags and settings is a troublesome burden.

-mf

bruce2359

Posted: Wed Jan 19, 2011 10:00 am Post subject:

Poobah

Joined: 05 Jan 2008
Posts: 9475
Location: US: west coast, almost. Otherwise, enroute.

Quote:

...a troublesome burden.

Yes, but only if you choose to accept it.

fjb_saper

Posted: Wed Jan 19, 2011 9:42 pm Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20763
Location: LI,NY

bruce2359 wrote:

Quote:

...a troublesome burden.

Yes, but only if you choose to accept it.

You need to make sure that the programs that do not have this set correctly do not pass your deliverables' acceptance criteria...

