MQSeries.net :: View topic - Restart of auto defined cluster senders during network issue


vsathyan

Posted: Sun Dec 06, 2015 10:59 pm Post subject: Restart of auto defined cluster senders during network issue

Centurion

Joined: 10 Mar 2014
Posts: 121

Team,

Last week we had a production issue where messages piled up on the sending-side cluster transmit queue. We are using MQ 7.5 on Linux, with multiple cluster transmit queues enabled (DEFCLXQ(CHANNEL)).

As part of troubleshooting, we stopped the auto-defined cluster sender channel and started it again.
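For readers who want to script the same step, here is a minimal Python sketch that pipes MQSC commands into `runmqsc`. The queue manager name `QM1` and channel name `TO.QM2` are hypothetical placeholders, and the fixed pause is a simplification: a more careful script would poll `DISPLAY CHSTATUS` until the channel reports STOPPED before starting it again.

```python
import subprocess
import time


def channel_command(verb: str, channel: str) -> str:
    """Build one MQSC channel command, e.g. STOP CHANNEL('TO.QM2')."""
    # Quoting the name keeps its case exactly as defined.
    return f"{verb} CHANNEL('{channel}')\n"


def run_mqsc(qmgr: str, script: str) -> str:
    """Feed MQSC text to runmqsc on stdin and return what it printed."""
    result = subprocess.run(
        ["runmqsc", qmgr], input=script, capture_output=True, text=True
    )
    return result.stdout


def restart_channel(qmgr: str, channel: str, settle_seconds: float = 5.0) -> None:
    # STOP CHANNEL defaults to MODE(QUIESCE) and leaves the channel STOPPED,
    # so it will not come back by itself until START CHANNEL is issued.
    print(run_mqsc(qmgr, channel_command("STOP", channel)))
    # Crude pause so the quiesced stop can complete before restarting.
    time.sleep(settle_seconds)
    print(run_mqsc(qmgr, channel_command("START", channel)))
```

Usage would be `restart_channel("QM1", "TO.QM2")` on the sending queue manager, with the real names substituted.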

I'm confident that this was not required, as MQ channels automatically attempt to reconnect when a socket is lost or released during a network issue (say, a router malfunctions, or the traffic is re-routed through a different network).

However, the network team claims that the issue was resolved after restarting the auto-defined cluster sender channel at the MQ end.

The main question: is there a need to restart auto-defined cluster sender channel(s) after a network outage?

I also told them there are hundreds of such channels in MQ and we never restarted any of them. Only one channel was restarted as part of the troubleshooting activity, yet they put all the blame on MQ.

Can anyone shed some light on this?

Thanks in advance
vsathyan
_________________
Custom WebSphere MQ Tools Development C# & Java
WebSphere MQ Solution Architect Since 2011
WebSphere MQ Admin Since 2004

bruce2359

Posted: Mon Dec 07, 2015 5:15 am Post subject:

Poobah

Joined: 05 Jan 2008
Posts: 9396
Location: US: west coast, almost. Otherwise, enroute.

A bit more information will help.

What errors did you see in the AMQERR01.LOG file for this qmgr and this channel?

What was the channel state for this channel? RETRYING, STOPPED, something else?

Was any change made to the CLUSRCVR channel definition? (A CLUSRCVR definition is used as a template for creating a CLUSSDRA channel.) Display the CLUSRCVR channel definition. Look at the CLUSRCVR create/alter date/time.
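These checks map onto MQSC DISPLAY commands such as `DISPLAY CHSTATUS('TO.QM2') STATUS SUBSTATE` on the sending side and `DISPLAY CHANNEL('TO.QM2') CHLTYPE(CLUSRCVR) ALTDATE ALTTIME` on the receiving side. Below is a small Python sketch that pulls the `KEYWORD(value)` pairs out of `runmqsc` output so they can be checked in a script. The channel name and the sample output are illustrative and abbreviated, not captured from a real system.

```python
import re

# KEYWORD(value) pairs; the value may contain one level of nested
# parentheses, as in CONNAME(host(port)).
ATTR = re.compile(r"(\w+)\(((?:[^()]|\([^()]*\))*)\)")


def parse_attributes(output: str) -> dict[str, str]:
    """Return {KEYWORD: value} for one record of runmqsc DISPLAY output."""
    return {key: value.strip() for key, value in ATTR.findall(output)}


# Illustrative, abbreviated DISPLAY CHSTATUS output (not from a real system).
SAMPLE = """
AMQ8417: Display Channel Status details.
   CHANNEL(TO.QM2)                         CONNAME(10.0.0.2(1414))
   STATUS(RUNNING)                         SUBSTATE(MQGET)
   XMITQ(SYSTEM.CLUSTER.TRANSMIT.TO.QM2)
"""

status = parse_attributes(SAMPLE)
print(status["STATUS"], status["SUBSTATE"])  # RUNNING MQGET
```

The same parser works on the CLUSRCVR display, where `ALTDATE` and `ALTTIME` show when the definition was last changed.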
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.

vsathyan

Posted: Mon Dec 07, 2015 7:54 am Post subject:

Hi Bruce,

At the time of the issue, the channel status was RUNNING, with substate MQGET. Hence I do not suspect any issue in MQ.

However, messages were being transferred at only around 2 messages a second (the messages were not large; they were in the KB range).

After the manual restart of the auto-defined cluster sender channel, we did not see much improvement immediately: messages flowed at around 4 per second. Even so, the transmit queue backlog of around 4900 messages was processed within 30 minutes.

At the time of the issue, the oldest message age was somewhere around 4500 seconds (75 minutes), which indicated that messages had started piling up around 1 hour 15 minutes earlier.
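A quick back-of-the-envelope check of the figures above, as a Python sketch. It assumes the ~4900-message backlog was about the same when the oldest-message age was read and when the drain started, which the post implies but does not state exactly.

```python
BACKLOG_MSGS = 4900          # approximate xmitq depth reported in the post
DRAIN_SECONDS = 30 * 60      # backlog cleared within about 30 minutes
OLDEST_AGE_SECONDS = 4500    # oldest message age at the time of the issue


def per_second(count: float, seconds: float) -> float:
    """Average rate in messages per second."""
    return count / seconds


# Average net rate at which the backlog shrank after the restart.
net_drain = per_second(BACKLOG_MSGS, DRAIN_SECONDS)
# Average net rate at which the backlog grew before the restart.
net_growth = per_second(BACKLOG_MSGS, OLDEST_AGE_SECONDS)

print(f"backlog age: {OLDEST_AGE_SECONDS / 60:.0f} min")   # 75 min
print(f"net drain:   {net_drain:.2f} msg/s")                # 2.72 msg/s
print(f"net growth:  {net_growth:.2f} msg/s")               # 1.09 msg/s
```

Both rates are net of new arrivals, so the channel's actual send rate after the restart must have been at least 2.72 msg/s, which is consistent with the ~4 msg/s observed.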

There were no changes to the cluster receiver channel definition at the receiving side; it is still untouched.

If you need any further information, please let me know. Thanks.

-vsathyan

bruce2359

Posted: Mon Dec 07, 2015 8:11 am Post subject:

vsathyan wrote:

If you need any further information, please let me know. Thanks.

-vsathyan

I asked that you examine the AMQERR01.LOG file for this qmgr to see if any errors were reported. Check also the AMQERR01.LOG file for the MQ installation. TCP errors may be the cause of delays, and may be reported in either/both AMQERR01.LOG files.

vsathyan

Posted: Mon Dec 07, 2015 10:27 am Post subject:

Hi Bruce,

Checked both error logs.
Nothing was logged in /var/mqm/errors/AMQERR01.LOG, and nothing unusual was logged in /var/mqm/qmgrs/<qmgrname>/errors/AMQERR01.LOG on the day the issue occurred, other than the channel stopping and starting because the disconnect interval expired.
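A rough Python sketch for tallying message IDs in the two error logs, so routine entries (channel start/stop, disconnect interval) can be separated from communication errors. The set of TCP-related IDs below (AMQ9206, AMQ9208, AMQ9209) is an assumption worth confirming against the MQ messages reference for your version, and `<qmgrname>` must be replaced with the real queue manager directory.

```python
import re
from collections import Counter
from pathlib import Path

# MQ message IDs look like AMQ9545 (older releases) or AMQ9545I (newer
# releases append a severity letter); capture just the AMQnnnn part.
MSG_ID = re.compile(r"\b(AMQ\d{4})[A-Z]?\b")

# Assumed set of IDs of interest for TCP problems; verify for your version.
TCP_IDS = {"AMQ9206", "AMQ9208", "AMQ9209"}


def count_ids(text: str) -> Counter:
    """Count occurrences of each AMQnnnn message ID in log text."""
    return Counter(MSG_ID.findall(text))


def scan(path: str) -> Counter:
    """Count message IDs in one log file; a missing file yields an empty count."""
    log = Path(path)
    if not log.is_file():
        return Counter()
    return count_ids(log.read_text(errors="replace"))


def tcp_errors(counts: Counter) -> dict[str, int]:
    """Keep only the IDs assumed to indicate TCP-level problems."""
    return {mid: n for mid, n in counts.items() if mid in TCP_IDS}


if __name__ == "__main__":
    # <qmgrname> is the placeholder from the post; substitute the real name.
    for path in ("/var/mqm/errors/AMQERR01.LOG",
                 "/var/mqm/qmgrs/<qmgrname>/errors/AMQERR01.LOG"):
        counts = scan(path)
        print(path, dict(counts.most_common(5)), "TCP:", tcp_errors(counts))
```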

Setting these aside, I am still not convinced that a manual restart of an auto-defined cluster sender would help reduce the message backlog in the cluster xmitq.

1. If auto-defined cluster sender channels needed a restart after a network issue, why did this not occur on other channels in the network?
2. Messages had piled up in the cluster xmitq on only one cluster queue manager sending data to the destination.
Cluster transmit queues on other cluster queue managers sending to the same destination had no backlog (this rules out a problem with the CLUSRCVR channel definition at the receiving end).

I checked the cluster receiver ALTDATE at the receiving cluster qmgr; it shows 06/28/2015.

bruce2359

Posted: Mon Dec 07, 2015 10:58 am Post subject:

Poobah

Joined: 05 Jan 2008
Posts: 9396
Location: US: west coast, almost. Otherwise, enroute.

OK.

So, the slowdown was a one-time event? Never happened before, and hasn't happened since?

Are you running MQ 7.5.0.0?

What MQ at the receiving end of the channel? What hardware platform? Are both ends equally/similarly provisioned? (One of my clients had a z/OS mainframe sending end overrunning the Windows receiving end.)

At the receiving end, were they running a big download or something else (watching videos) that clogged the channel?

vsathyan

Posted: Mon Dec 07, 2015 11:23 am Post subject:

1. It happened once before, around August 2015, but on different queue managers altogether (source and destination). We did not restart any channels then.
PMR discussions pointed to network storage, but the storage team rejected that, stating that IO operations complete in under 5 ms.

2. WMQ 7.5.0.5 on OEL 6.5.

3. Both ends have same configuration - OEL6.5 on Intel X64 @ 2.7GHz, 16GB

4. No videos

The servers are dedicated for MQ and applications connect in client mode. None of the apps have server bindings.

5. Server support reported 48% CPU and 54% memory usage, 2 GB in cache, and no swap in use at the receiving end.

One point to note: this is across regions, from the USA to APJ. However, as I mentioned earlier, other cluster senders from the same data center in the USA to the same receiving queue manager in APJ were running fine, without any backlog on their xmitqs.

fjb_saper

Posted: Mon Dec 07, 2015 11:59 am Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20696
Location: LI,NY

vsathyan wrote:

5. Server support reported 48% CPU and 54% memory usage, 2 GB in cache, and no swap in use at the receiving end.

One point to note: this is across regions, from the USA to APJ. However, as I mentioned earlier, other cluster senders from the same data center in the USA to the same receiving queue manager in APJ were running fine, without any backlog on their xmitqs.

I don't know if point 5 is relevant. What you probably should have started with is the fact that the transmission is from the USA to APJ.

If you are using a WAN, and I guess you are, the problem is not so much about bandwidth as about throughput.
So what affects your throughput?
Network quality (reliability, number of packets dropped, number of frames dropped...). TCP is simply not a protocol that can saturate the bandwidth on a poor connection. And your connection is poor: 30 minutes for 4,000 msgs? They should be gone in less than 5 seconds!!! Especially if they're only in the KB range.
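To put the bandwidth-versus-throughput point into numbers, a commonly cited approximation for steady-state TCP throughput under random packet loss (the Mathis formula) is throughput ≈ (MSS / RTT) × (C / √p), with C ≈ √(3/2). The Python sketch below applies it. The MSS, round-trip time, loss rates and message size are hypothetical values for a long-haul link, not measurements from this thread, and the model ignores MQ's own batch confirmations and protocol overhead.

```python
import math


def tcp_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate steady-state TCP throughput in bits/s (Mathis formula)."""
    c = math.sqrt(1.5)  # constant for the periodic-loss model without delayed ACKs
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


def messages_per_s(throughput_bps: float, message_bytes: int) -> float:
    """Upper bound on messages/s if every bit carried message payload."""
    return throughput_bps / (message_bytes * 8)


# Hypothetical long-haul link: 1460-byte MSS, 200 ms round-trip time.
for loss in (0.0001, 0.01, 0.05):
    bps = tcp_throughput_bps(1460, 0.200, loss)
    print(f"loss {loss:.2%}: ~{bps / 1e6:.2f} Mbit/s, "
          f"~{messages_per_s(bps, 8 * 1024):.0f} msgs/s of 8 KB")
```

Throughput falls with the square root of the loss rate and linearly with round-trip time, which is why a long, lossy path can deliver far less than its nominal bandwidth.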

Manually stopping and restarting the cluster sender channel can have a positive effect on the communications. By force-closing and then reopening the connection, you may luck out and get a different (better) route through the WAN.

Are you sure that you did not have a full queue at the receiver end of the channel? If you have messages for a full queue at the receiver end of the channel, the message goes through the configured retries before it gets disposed of according to rules (goes to DLQ?) This type of event would considerably slow down the channel speed and create a backlog on the xmitq.

Other channels, including channels from other qmgrs, would not be slowed down unless they too carry messages for the full queue.
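The full-queue scenario can be sized with the receiving channel's message-retry attributes: MRRTY (retry count, default 10) and MRTMR (retry interval in milliseconds, default 1000). Here is a simplified Python model, assuming the channel handles messages one at a time and that every message for the full queue waits out all of its retries. The normal rate and blocked fraction in the example are hypothetical.

```python
def retry_delay_s(mrrty: int = 10, mrtmr_ms: int = 1000) -> float:
    """Worst-case time one message can hold up the channel while the
    receiving end retries the put on a full queue."""
    return mrrty * mrtmr_ms / 1000.0


def effective_rate(normal_rate: float, blocked_fraction: float,
                   mrrty: int = 10, mrtmr_ms: int = 1000) -> float:
    """Messages/s across the channel when a fraction of messages target a
    full queue and each of those waits out every retry (simplified model)."""
    seconds_per_msg = (1.0 / normal_rate
                       + blocked_fraction * retry_delay_s(mrrty, mrtmr_ms))
    return 1.0 / seconds_per_msg


print(retry_delay_s())                         # 10.0 seconds with the defaults
print(round(effective_rate(100.0, 0.05), 2))   # 1.96 (hypothetical: 100 msg/s, 5% blocked)
```

Even a small fraction of messages going through the full retry cycle drags the whole channel, including messages for other queues, down to a few messages per second.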
Have fun

_________________
MQ & Broker admin

bruce2359

Posted: Mon Dec 07, 2015 1:42 pm Post subject:

Persistent messages or non-persistent messages?

What is MAXDEPTH at the destination queue at the receiver end of the channel?

Are enough consumers running on the receiver end to keep the destination queue from filling up?

Are any messages ending up in the receiver end dead-letter-queue?
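These destination-side checks correspond to queue attributes returned by `DISPLAY QLOCAL('APP.QUEUE') CURDEPTH MAXDEPTH IPPROCS`, plus the depth of the dead-letter queue. A minimal Python sketch that turns those values into warnings. The queue name, the 80% threshold and the input values are hypothetical, and values are passed as strings because that is how they appear in `runmqsc` output.

```python
def destination_warnings(attrs: dict[str, str], dlq_depth: int = 0,
                         full_ratio: float = 0.8) -> list[str]:
    """Flag receiving-side conditions that can slow a channel down."""
    warnings = []
    cur = int(attrs["CURDEPTH"])
    max_depth = int(attrs["MAXDEPTH"])
    if cur >= full_ratio * max_depth:
        warnings.append(f"queue at {cur}/{max_depth} messages: near or at MAXDEPTH")
    # IPPROCS counts handles open for input, i.e. potential consumers.
    if int(attrs["IPPROCS"]) == 0:
        warnings.append("no handles open for input (no consumers)")
    if dlq_depth > 0:
        warnings.append(f"{dlq_depth} message(s) on the dead-letter queue")
    return warnings


# Hypothetical values, e.g. parsed from DISPLAY QLOCAL output.
print(destination_warnings({"CURDEPTH": "0", "MAXDEPTH": "5000", "IPPROCS": "3"}))
```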

vsathyan

Posted: Tue Dec 08, 2015 1:18 am Post subject:

1. Persistent.
2. The queue was empty; the readers were processing the messages as soon as they reached the destination queues.
3. No messages landed in the dead-letter queue.

I would say, these are the basic checks that every MQ admin would do.

We did not find any problems in the end point message processing.
Hence, the war of words is now between MQ and the network team.

fjb_saper

Posted: Tue Dec 08, 2015 5:44 am Post subject:


Remember that no messages need to have hit the DLQ in either qmgr.
All you need to slow down your channel is messages going from A to B into a queue on B that is full and being consumed slowly. The consumption is just fast enough for a retry to put each message onto the queue rather than the DLQ. The result is that all messages, even the ones for a different queue on B, are slowed down to the consumption rate of the nearly full queue on B. Hence a backup on the xmitq in A.
If this was not the case, then you truly need to look at network problems. Remember to include in this check any potentially full system queues used for central data gathering (like event queues, etc.).

Also check that the receiver channels are using USEDLQ(YES) and that a DLQ has been defined...
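The DLQ check can be done on the receiving queue manager with `DISPLAY QMGR DEADQ` and `DISPLAY CHANNEL('TO.QM2') USEDLQ`. A small Python sketch that evaluates the two values once they have been read. The channel name is hypothetical, and an empty DEADQ means no dead-letter queue is configured for the queue manager.

```python
def dlq_problems(deadq: str, usedlq: str) -> list[str]:
    """Check that undeliverable messages have somewhere to go.

    deadq:  value of the queue manager's DEADQ attribute (may be blank)
    usedlq: value of the receiving channel's USEDLQ attribute (YES or NO)
    """
    problems = []
    if not deadq.strip():
        problems.append("queue manager has no DEADQ defined")
    if usedlq.strip().upper() != "YES":
        problems.append("channel has USEDLQ(NO): undeliverable messages "
                        "will not be put to the DLQ")
    return problems


print(dlq_problems("SYSTEM.DEAD.LETTER.QUEUE", "YES"))  # []
```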

Have fun


bruce2359

Posted: Tue Dec 08, 2015 5:52 am Post subject:

vsathyan wrote:

I would say, these are the basic checks that every MQ admin would do.

Yes.

No offense intended. I (we) cannot know what steps you've taken to identify the source of the problem. I (we) also cannot know the depth and breadth of your MQ problem-determination experience.

When doing PD, it is best to make no assumptions at all: start with "the basics."

vsathyan

Posted: Tue Dec 08, 2015 7:08 pm Post subject:

Hey Bruce,

I never took your inputs in a wrong way.

Agreed, we have to check everything from source to target to identify and fix the problem.

Thanks to you and fjb for your responses and time.

Regards,
vsathyan
