How a MQSeries Hub does its thing with persistent / non-persistent messages
PeterPotkay
PostPosted: Thu May 29, 2003 7:45 am    Post subject: How a MQSeries Hub does its thing with persistent / non-persistent

Poobah

Joined: 15 May 2001
Posts: 7722

Imagine if you will a Hub and Spoke architecture.

QMHUB sits in the middle on a 2-CPU server: 1.3 GHz CPUs, 2 GB RAM, MQ version 5.2.1 CSD05.
QM1....QM20 are spokes.

On QMHUB, there are no application queues, only queue manager aliases and
XMIT queues. There is a QMAlias for QM1 called QM1, and it directs messages
to QM1.XMITQ, which is serviced by the SNDR channel off to QM1. This QMAlias
/ XMITQ / SNDR setup is present for every spoke, and there is also a RCVR
channel from every spoke.
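
For illustration, here is a minimal MQSC sketch of the hub-side objects for one spoke (the connection name is hypothetical; repeat per spoke):

Code:
* QM alias: a remote queue def with a blank RNAME, resolving QM1 to its XMITQ
DEFINE QREMOTE(QM1) RNAME(' ') RQMNAME(QM1) XMITQ(QM1.XMITQ)
* Transmission queue serviced by the sender channel
DEFINE QLOCAL(QM1.XMITQ) USAGE(XMITQ)
* Sender channel off to the spoke
DEFINE CHANNEL(QMHUB.QM1) CHLTYPE(SDR) TRPTYPE(TCP) CONNAME('qm1host(1414)') XMITQ(QM1.XMITQ)
* Receiver end for the spoke's channel back into the hub
DEFINE CHANNEL(QM1.QMHUB) CHLTYPE(RCVR) TRPTYPE(TCP)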


On day 1 there is only non persistent message traffic for all the spokes. On
day 2 a pair of the spokes starts exchanging persistent messages.


Q1. On day 1, is there any data being written to disk by QMHUB as the
messages fly thru? I assume no, since they are not persistent (but see Q3
below).

Q2. On day 2, even though we have 2 CPUs, we still have only 1 QM, so I
assume the non-persistent message throughput must be affected by the
persistent messages. My reasoning is: as the persistent messages go in and
out of the QMAliases, and in and out of the XMIT queues, the QM has to "stop"
and log, right? And if it has to stop and log, then it can't be handling the
non-persistent ones at the same time, right? They have to wait?

Q3. I then defined a local queue on QMHUB and used one of the spoke QMs to
send non-persistent messages to it (1 GB worth, actually). Now these are not
written to disk, because they are not persistent, so where are they, in
memory? I see the queue file grew by over a GB, so doesn't that mean they
are on disk, even though they are non-persistent?




More details:
Batch Interval = 0
Batch Size = 50
Non Persistent Message Speed = Normal
There is an MQ cluster involved, which is why we have the QMAliases (you
can't cluster XMIT queues). I don't think this affects any of the answers to
the above, but I can expound if need be.
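
(These channel attributes can be confirmed in runmqsc; the channel name here is hypothetical:)

Code:
DISPLAY CHANNEL(QMHUB.QM1) BATCHINT BATCHSZ NPMSPEED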
_________________
Peter Potkay
Keep Calm and MQ On
bower5932
PostPosted: Thu May 29, 2003 9:07 am

Jedi Knight

Joined: 27 Aug 2001
Posts: 3023
Location: Dallas, TX, USA

Non-persistent messages can be written to disk. MQSeries keeps a queue buffer that it uses for non-persistent messages. However, if this buffer fills up, MQ has to do something with the message. So, it writes it to the queue file. Since it is non-persistent, it won't be written to the logs.
PeterPotkay
PostPosted: Thu May 29, 2003 2:15 pm

Poobah

Joined: 15 May 2001
Posts: 7722

How big is this queue buffer? Is it tuneable?
_________________
Peter Potkay
Keep Calm and MQ On
PeterPotkay
PostPosted: Fri May 30, 2003 6:29 am

Poobah

Joined: 15 May 2001
Posts: 7722

So here is my real question, which is what makes me wonder exactly how a QM
handles messages.

Our HUB server is using Veritas. The disk that is being written to (whenever
that may be) is actually on the Storage Area Network (SAN).

The HUB is also clustered with 2 queue managers dedicated to MQSI; the HUB
acts simply as a gateway queue manager for this MQSI cluster. The MQSI boxes
are in 2 separate locations, with Veritas, and both also write to the SAN.


Whenever we make bin changes to the SAN, that change ripples across the
fabric, making the SAN unavailable for a tiny bit of time.

Now, we have an app that is counting milliseconds in its roundtrip of the
message. This message starts on one of the spokes, comes to the HUB, is
round robined to one of the MQSI boxes, the processed message comes back to
the HUB, which then sends it down to the receiving spoke. It processes the
message, sends it back to the hub, round robined into MQSI for processing
the reply, the processed reply goes back to the hub, which then sends the
reply back to the originating spoke. For 99.99% of the time, this roundtrip
takes under 500 milliseconds. The app waits up to 2000 milliseconds for the
reply. The messages are non persistent and about 25K in size.

Whenever the bin changes to the SAN take place, we start getting messages
that take longer than 2000 milliseconds, and now we have orphaned replies.
These are non-persistent messages under 64K, so why does a change that makes
the disk unavailable cause them to slow down? My guess is that the
persistent messages the HUB is processing at the same time (or the >64K
non-persistent ones) must somehow be affecting the performance of the
non-persistent ones. And I also assume that channel speed has nothing to do
with this.

So the angle I am after here is: how can I increase the performance of my
messages for this app so that changes to the SAN don't affect it?
_________________
Peter Potkay
Keep Calm and MQ On
jefflowrey
PostPosted: Fri May 30, 2003 8:58 am

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

Have you effectively isolated the HUB as the location of the slowdown?

From what you describe, the slowdown of the SAN disk access could just as easily be affecting the performance at your WMQI brokers, instead of the performance at the HUB.
bower5932
PostPosted: Fri May 30, 2003 10:18 am

Jedi Knight

Joined: 27 Aug 2001
Posts: 3023
Location: Dallas, TX, USA

Going two appends back, the queue buffer is definitely a tunable parameter. I believe it is mentioned in some of the performance-related SupportPacs.
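
(For anyone following along: the performance SupportPacs describe this as DefaultQBufferSize, settable in the TuningParameters stanza of qm.ini, or the equivalent registry entry on Windows; the default in this era is 64 KB per queue. A sketch, assuming you want a 1 MB buffer; check the SupportPac for your platform and level before relying on it:)

Code:
TuningParameters:
   DefaultQBufferSize=1048576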
PeterPotkay
PostPosted: Fri May 30, 2003 10:26 am

Poobah

Joined: 15 May 2001
Posts: 7722

bower5932, I will check those SupportPacs. Thanks.

Jeff, you are absolutely correct. Any of those 3 boxes that use the SAN could be the cause. The question is how do I prove which one it is?
_________________
Peter Potkay
Keep Calm and MQ On
jefflowrey
PostPosted: Fri May 30, 2003 11:04 am

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

Quote:
The question is how do I prove which one it is?


Appropriate monitoring. If you can't get the data you need from your normal monitoring system, and don't want to write a channel exit to record msg IDs and timestamps, then you can do manual testing.

Stop the channels between each step (from your app QM to the hub, from the hub to your WMQI boxen, from your WMQI boxen to the hub, etc...), and measure how long each step takes.
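
(One low-tech way to measure a hop, sketched below with hypothetical channel and queue names; the amqsbcg sample browses a queue and dumps each MQMD, whose PutDate/PutTime show when that queue manager took custody of the message:)

Code:
* Hold traffic at the hub so this hop can be timed in isolation
STOP CHANNEL(HUB.TO.WMQI1)
* Drive one request from the app, then from the OS shell browse the
* stalled message; the MQMD PutDate/PutTime stamp its arrival:
*    amqsbcg WMQI1.XMITQ QMHUB
START CHANNEL(HUB.TO.WMQI1)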
PeterPotkay
PostPosted: Tue Jun 03, 2003 9:33 am

Poobah

Joined: 15 May 2001
Posts: 7722

A channel speed of Normal is what is getting me here. Neil Casey and Brian McCarty provided the exact explanation as to why: non-persistent messages sent over a normal-speed channel cause a persistent message to be written to the channel sync queue, which requires disk I/O. I set up some tests in my lab environment. Here are the results.
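
(Channel "speed" here is the NPMSPEED channel attribute. It is negotiated at channel start, so FAST is only used when both ends allow it; setting NORMAL on either end forces normal delivery. E.g., in runmqsc:)

Code:
ALTER CHANNEL(SPOKEQM1.HUBQM1) CHLTYPE(SDR) NPMSPEED(NORMAL)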


*************************************************************
Server1: SpokeQM1, Win2000 SP2, MQ 5.3 CSD03
RemoteQ called FinalQ that points to FinalQ on SpokeQM2, via transmit queue
HubQM1

Server2: HubQM1, Win2000 SP2, MQ 5.2.1 CSD05
QMAlias called SpokeQM2, which sends messages to transmit queue
SpokeQM2.XMITQ

Server3: SpokeQM2, Win2000 SP2, MQ 5.2.1 CSD05
Local queue called FinalQ
************************************************************


Test #1: Channel SpokeQM1.HubQM1 has a speed of NORMAL. Start putting 1K
Non-Persistent(NP) messages every 250 milliseconds to the remote queue def
on SpokeQM1.
Results #1: As expected, constant disk writes on the server that houses
HubQM1.


Test #2: Channel SpokeQM1.HubQM1 has a speed of FAST. Start putting 1K NP
messages every 250 milliseconds to the remote queue def on SpokeQM1.
Results #2: As expected, no disk activity at all on the server that houses
HubQM1. Actually, there was disk activity when the channel started/ended,
but for the whole duration while the channel was running, no I/O.


Test #3: Channel SpokeQM1.HubQM1 has a speed of FAST. Start putting 70,000
byte NP messages every 250 milliseconds to the remote queue def on SpokeQM1.
Results #3: No disk activity at all on the server that houses HubQM1. ???
These messages are larger than the 64K queue buffer, so why are the messages
flying thru the hub with no I/O? I am happy with these results, just that it
is unexpected. Could it be that the Sending MCA to SpokeQM2 has the XMIT
queue open ready for messages, with an outstanding GET? But I thought this
was a feature new to 5.3 only.


Test #4: Channel SpokeQM1.HubQM1 has a speed of FAST. Start putting 5000
byte NP messages every 250 milliseconds to the remote queue def on SpokeQM1.
Every 45 seconds or so, I send over a Persistent 5000 byte message on the
same channel.
Results #4: As expected, no disk activity at all on the server that houses
HubQM1, except every 45 seconds when the P message comes over.


Test #5: Channel SpokeQM1.HubQM1 has a speed of FAST. Start putting 5000
byte NP messages every 250 milliseconds to the remote queue def on SpokeQM1.
As the messages are flowing, yank the 2 cables that connect this server to
the SAN (Veritas was disabled so it would not try and fail over).
Results #5: No effect at all. Even though the server had no hard disk, these
messages still kept flying thru the server as if nothing at all was wrong.


Test #6: Channel SpokeQM1.HubQM1 has a speed of FAST. Start putting 5000
byte NP messages every 250 milliseconds to the remote queue def on SpokeQM1.
At the same time, start putting 5000 byte P messages over the same channel.
As the messages are flowing, yank the 2 cables that connect this server to
the SAN (Veritas was disabled so it would not try and fail over).
Results #6: Everything backs up. Both NP and P messages are backed up in the
XMITQ on SPOKEQM1. As soon as the cables are plugged back in, the messages
start flowing again.



Test #7: Channel SpokeQM1.HubQM1 has a speed of FAST. Start putting 5000
byte NP messages every 250 milliseconds to the remote queue def on SpokeQM1.
At the same time, start putting 5000 byte P messages over A DIFFERENT
CHANNEL between SpokeQM1 and HubQM1. As the messages are flowing, yank the 2
cables that connect this server to the SAN (Veritas was disabled so it would
not try and fail over).

Results #7: Everything backs up on the channel that was dealing with P
messages. The channel that had only NP messages was not affected at all. As
soon as the cables are plugged back in, the messages start flowing again on
the secondary channel. The primary channel that had NP messages never
blinked.




So now I am kinda stuck. Back in the production environment, what do I do? I
can set the channel between SpokeQM1 and the HUB to fast, as it is a
dedicated channel for this application anyway. I'll just let them know of
the (very remote) possibility that the channel may lose their message. SAN
blips are a lot more frequent than MQ losing NP messages over a FAST
channel.

But what do I do with the CLUSRCVR channels? They are a shared resource for
the whole company. Do I let this one application dictate that these channels
get switched to FAST, at the risk of other apps having NP messages lost?
Granted, we have a pretty reliable network here, but man, what a waste of
time trying to hunt for messages that get lost over a fast channel. What do most people out there have their cluster channel speeds at?
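
(For anyone wanting to check or change theirs, it is the same NPMSPEED attribute on cluster channels; the channel name below is hypothetical:)

Code:
DISPLAY CHANNEL(TO.QMHUB) NPMSPEED
ALTER CHANNEL(TO.QMHUB) CHLTYPE(CLUSRCVR) NPMSPEED(NORMAL)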
_________________
Peter Potkay
Keep Calm and MQ On
jefflowrey
PostPosted: Wed Jun 04, 2003 6:14 am

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

If the risk of NP messages getting lost is unacceptable, exactly why are they using NP messages?

That is, they've already accepted the risk of losing messages. If they've forgotten that, this is an excellent time to remind them - so that they aren't surprised when it happens.
PeterPotkay
PostPosted: Wed Jun 04, 2003 6:29 am

Poobah

Joined: 15 May 2001
Posts: 7722

Here is my answer that I posted on the listserv to the same question:
************************



About the messages being non persistent / persistent and the channel speed:

Even though the messages are non-persistent, I still care about them. I have
always been of the mindset that whether a message is persistent or not has
more to do with how difficult it is for the apps to reproduce the message if
it got lost. If it is a big deal to reproduce, then make it persistent; it
will survive anything and eventually be processed. Messages that tend to sit
in queues for a long time are susceptible to QMs going down, and thus should
be made persistent if they need to survive.

The messages in this app are inquiry style. They are invalid 5 seconds after
the fact. Even if they were persistent and survived a QM restart, they would
still be invalid, so why incur the performance penalties of persistence?
Now, that's not to say we don't care if they get lost or not. I always shake
my head when I hear people say "I made it non-persistent because I don't
care if it gets lost." If you don't care, why did you bother to send it in
the first place?!? What if MQ was losing 50% of the non-persistent messages?
I couldn't tell the app "Hey, just resend them, they are only inquiry
messages anyway!" Nor could I say, "Every message in this company is going
to be persistent. We don't want to bother with lost messages ever."
It's my job to configure MQ to be as reliable as possible and as fast as possible.

An application that sends non persistent inquiry messages that will be
invalid in 5 seconds has a reasonable assumption that MQ will do everything
it can to deliver them. Just because they don't need to survive a QM restart
doesn't mean they are less important.

I feel the happy medium between "make all messages persistent" and "don't
expect all your messages to always make it to the other side" is to set the
message channel speed to normal, as long as conditions warrant it. If you
have a BATCHINT of 100 and a BATCHSIZE of 200 and your XMIT queues regularly
back up, and the occasional non-persistent message is being held back until
the batch commits, then no way: the speed should be fast, and you live with
the fact that it may get lost.

But I bet that is not how most people's channels run. I bet most of us
have XMIT queues that are normally empty, and the BATCHINT is still set to
the default of 0. In this case, setting the speed to normal will have very
little effect on overall performance, but will ensure that no messages ever
get lost.

I wonder why IBM chose to have the default channel speed set to fast? Seems
to me it would be better to make the default normal. This would perform just
fine for most people and would help MQ's reputation of never losing
messages. You have no idea what a pain it was discovering that MQ was losing
messages over a particular fast channel. Days of blaming the apps for losing
the messages, hunting in DLQs all over the place, XMIT queues, application
queues, etc. The real kick in the pants is that when a message is lost like
this, there is ZERO record of the fact. You are left scratching your head.
The man-hours wasted on hunting for a message lost like this are just not
worth it. I'll gladly take a tiny performance hit on a tiny percentage of
the messages I send over an already very fast product.

Anyone looking to pump up the performance of a channel above and beyond this
could then tweak the channel to fast, only after realizing messages could
get lost. Maybe when it was time to decide what value to use as a default,
the logic was "We have a choice of making our product faster out of the box
or making our message delivery more assured out of the box," and the choice
was to make it fast, in case customers are running performance comparisons
against other messaging systems like SonicMQ or MSMQ. Who knows, this is
only a guess.
_________________
Peter Potkay
Keep Calm and MQ On