mattfarney |
Posted: Wed Feb 10, 2010 5:23 pm Post subject: Slow transmission time for large cluster messages |
|
|
 Disciple
Joined: 17 Jan 2006 Posts: 167 Location: Ohio
|
I have an existing cluster with a mixture of Windows and Unix machines.
Mixed in with normal traffic, I am sending a large message (~725 KB). This worked without problems for many months.
Now the large message takes minutes to deliver rather than the expected 3-4 seconds.
I have two questions:
1. For a cluster transmission, which objects need to have their maxmsgl appropriately large? (both QMs, destination Queue, both SCTQ?, anything else?)
2. Can anyone think of any settings that could cause the delay? Pings between the boxes look reasonably normal. |
|
|
 |
mvic |
Posted: Wed Feb 10, 2010 5:36 pm Post subject: Re: Slow transmission time for large cluster messages |
|
|
 Jedi
Joined: 09 Mar 2004 Posts: 2080
|
mattfarney wrote: |
Can anyone think of any settings that could cause the delay? Pings between the boxes look reasonably normal. |
Relevant factors include reliability and speed of your network, and the backlog / volume of messages being transported across this and other cluster channels outbound from this queue manager. |
|
|
 |
Vitor |
Posted: Wed Feb 10, 2010 5:46 pm Post subject: Re: Slow transmission time for large cluster messages |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
mattfarney wrote: |
1. For a cluster transmission, which objects need to have their maxmsgl appropriately large? (both QMs, destination Queue, both SCTQ?, anything else?) |
Cluster channels, but 725 KB isn't that large and is well below the default MAXMSGL of all objects (4 MB).
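As a rough illustration of that point, the sketch below treats the smallest MAXMSGL along the path as the effective limit and checks a 725 KB message against it. The list of objects is an assumption drawn from the question, every value is simply the 4 MB default mentioned above, and header overhead added in transit is ignored.

```python
# Default MAXMSGL mentioned in the thread: 4 MB, i.e. 4 * 1024 * 1024 bytes.
DEFAULT_MAXMSGL = 4 * 1024 * 1024


def effective_limit(limits):
    """The smallest MAXMSGL along the path is the one that matters."""
    return min(limits.values())


def fits(message_bytes, limits):
    """True if the message is no larger than every limit on the path."""
    return message_bytes <= effective_limit(limits)


# Assumed set of objects on the path; all values are the 4 MB default.
path_limits = {
    "sending queue manager": DEFAULT_MAXMSGL,
    "SYSTEM.CLUSTER.TRANSMIT.QUEUE": DEFAULT_MAXMSGL,
    "cluster sender channel": DEFAULT_MAXMSGL,
    "cluster receiver channel": DEFAULT_MAXMSGL,
    "receiving queue manager": DEFAULT_MAXMSGL,
    "destination queue": DEFAULT_MAXMSGL,
}

message_size = 725 * 1024  # payload only; transmission headers are ignored
print(fits(message_size, path_limits))
```

If any one object had been altered to a value below the message size, `effective_limit` would return that value and the check would fail.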
mattfarney wrote: |
2. Can anyone think of any settings that could cause the delay? Pings between the boxes look reasonably normal. |
What's the depth on the SCTQ?
_________________
Honesty is the best policy.
Insanity is the best defence. |
|
|
 |
mqjeff |
Posted: Wed Feb 10, 2010 5:48 pm Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
Are the messages persistent? |
|
|
 |
mattfarney |
Posted: Wed Feb 10, 2010 6:06 pm Post subject: |
|
|
 Disciple
Joined: 17 Jan 2006 Posts: 167 Location: Ohio
|
Usually the only items in the SCTQ are the test messages, so 10-20ish.
Persistent yes.
I moved my helper program to the machine in question. It worked noticeably faster, so the network is obviously my first target.
Does MQ have any statistics-tracking ability, i.e. counts of lost packets, resent frames, etc.?
-mf |
|
|
 |
bruce2359 |
Posted: Wed Feb 10, 2010 6:08 pm Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
Are not-so-large messages experiencing the same slowdown?
Is this slowdown limited to this one outbound cluster channel, or is it affecting all of them?
What about the receiving end of the channel? Is that box overloaded (high CPU, slow disk)?
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
|
 |
mvic |
Posted: Thu Feb 11, 2010 12:58 am Post subject: |
|
|
 Jedi
Joined: 09 Mar 2004 Posts: 2080
|
mattfarney wrote: |
Usually the only items in the SCTQ are the test messages, so 10-20ish. |
How are you determining this? If there is constantly a measured depth of 10-20 this might indicate a slightly unusual state where the channel is struggling to move those 10-20 messages.
It's possible you just calculated this number but didn't go to runmqsc / MQ Explorer etc. to measure it. You'd have to use one of those methods to answer Vitor's question reliably.
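One way to measure rather than estimate the depth is to ask runmqsc for the CURDEPTH attribute of the cluster transmit queue. The sketch below is only an assumption about how that could be scripted: the helper names are hypothetical, it expects `runmqsc` to be on the PATH of a user allowed to administer the queue manager, and it relies on the reply containing the attribute in the usual `CURDEPTH(n)` form.

```python
import re
import subprocess

# MQSC command asking for the current depth of the cluster transmit queue.
MQSC_COMMAND = "DISPLAY QLOCAL(SYSTEM.CLUSTER.TRANSMIT.QUEUE) CURDEPTH\n"


def parse_curdepth(runmqsc_output):
    """Return the first CURDEPTH(n) value in the text, or None if absent."""
    match = re.search(r"CURDEPTH\((\d+)\)", runmqsc_output)
    return int(match.group(1)) if match else None


def measure_depth(qmgr_name):
    """Run runmqsc against the named queue manager and read the depth.

    runmqsc takes its MQSC commands from standard input.
    """
    result = subprocess.run(
        ["runmqsc", qmgr_name],
        input=MQSC_COMMAND,
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_curdepth(result.stdout)


# Hand-written stand-in for a reply line, not captured from a real system.
print(parse_curdepth("CURDEPTH(12)"))
```

Calling `measure_depth` a few times a minute apart would show whether the 10-20 messages are the same ones sitting there or a steady flow passing through.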
Quote: |
I moved my helper program to the machine in question. It worked noticeably faster, so the network is obviously my first target. |
Good work.
Quote: |
Does MQ have any statistics-tracking ability, i.e. counts of lost packets, resent frames, etc.? |
The queue manager error logs are the place to look: /var/mqm/qmgrs/QMNAME/errors/AMQ*.LOG
Take a look through these files on both queue managers, and see if any problems are reported for your channel and/or IP addresses. |
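Searching those logs by hand gets tedious, so here is a minimal sketch of filtering entries by host name. It assumes entries follow the layout of the AMQ9209 entry posted later in this thread (a timestamp/process header line followed by an `AMQnnnn:` message line); real logs may use other date formats, and the function names are made up for illustration.

```python
import re

# Header line: "02/10/10 20:48:52 - Process(...) User(...) Program(...)"
HEADER = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) - "
    r"Process\((?P<process>[^)]*)\) User\((?P<user>[^)]*)\) "
    r"Program\((?P<program>[^)]*)\)"
)
# Message line: "AMQ9209: Connection to host ... closed."
MESSAGE = re.compile(r"^(?P<code>AMQ\d{4}\w?): (?P<text>.*)$")


def parse_entries(log_text):
    """Collect one dict per log entry: header fields plus message code/text."""
    entries = []
    current = None
    for line in log_text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            current = header.groupdict()
            entries.append(current)
            continue
        message = MESSAGE.match(line)
        if message and current is not None and "code" not in current:
            current.update(message.groupdict())
    return entries


def mentioning(entries, needle):
    """Keep only entries whose message text contains the given string."""
    return [e for e in entries if needle in e.get("text", "")]


# The entry posted in this thread, used as sample input.
SAMPLE = """\
-------------------------------------------------------------------------------
02/10/10 20:48:52 - Process(28720.9) User(mfarney) Program(amqrmppa_nd)
AMQ9209: Connection to host 'mqdev (ip_redacted)' closed.
EXPLANATION:
An error occurred receiving data from 'mqdev (ip_redacted)' over TCP/IP.
The connection to the remote host has unexpectedly terminated.
ACTION:
Tell the systems administrator.
----- amqccita.c : 3182 -------------------------------------------------------
"""

for entry in mentioning(parse_entries(SAMPLE), "mqdev"):
    print(entry["date"], entry["time"], entry["code"], entry["program"])
```

Running the same filter over the logs of both queue managers and comparing timestamps is a quick way to see whether an error lines up with a slow transfer.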
|
|
 |
mattfarney |
Posted: Thu Feb 11, 2010 11:36 am Post subject: |
|
|
 Disciple
Joined: 17 Jan 2006 Posts: 167 Location: Ohio
|
mvic wrote: |
If there is constantly a measured depth of 10-20 this might indicate a slightly unusual state where the channel is struggling to move those 10-20 messages.
|
This is a testing machine, so I'm able to completely control the depth.
I think I am in that unusual state. What I end up seeing is that the large messages disappear from the SCTQ and try to go across. I can see the chstatus show bytes transferring, but far slower than normal. A 750 KB message should not take 3 minutes to transfer. Neither box is obviously overloaded.
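To put numbers on "far slower than normal", the arithmetic can be sketched as below. It uses only the figures quoted in this thread (about 725 KB in 3-4 seconds when healthy, about 750 KB in 3 minutes now); taking 3.5 seconds as the healthy transfer time is an assumption.

```python
def throughput_kb_per_s(size_kb, seconds):
    """Average transfer rate in KB per second."""
    return size_kb / seconds


# Healthy case: ~725 KB in 3-4 s (midpoint of 3.5 s assumed).
expected = throughput_kb_per_s(725, 3.5)
# Observed case: ~750 KB in 3 minutes.
observed = throughput_kb_per_s(750, 180)

slowdown = expected / observed

print(f"expected: {expected:.1f} KB/s")
print(f"observed: {observed:.2f} KB/s")
print(f"slowdown: {slowdown:.0f}x")
```

A drop of that size, from a couple of hundred KB/s to a few KB/s, is much more than normal cluster traffic contention would usually explain.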
As for log entries:
Code: |
-------------------------------------------------------------------------------
02/10/10 20:48:52 - Process(28720.9) User(mfarney) Program(amqrmppa_nd)
AMQ9209: Connection to host 'mqdev (ip_redacted)' closed.
EXPLANATION:
An error occurred receiving data from 'mqdev (ip_redacted)' over TCP/IP.
The connection to the remote host has unexpectedly terminated.
ACTION:
Tell the systems administrator.
----- amqccita.c : 3182 ------------------------------------------------------- |
This obviously implicates network issues. I think the answer is no, but is there any way to get a better tcp/ip error reason?
-mf |
|
|
 |
mvic |
Posted: Fri Feb 12, 2010 2:11 am Post subject: |
|
|
 Jedi
Joined: 09 Mar 2004 Posts: 2080
|
mattfarney wrote: |
As for log entries:
Code: |
-------------------------------------------------------------------------------
02/10/10 20:48:52 - Process(28720.9) User(mfarney) Program(amqrmppa_nd)
AMQ9209: Connection to host 'mqdev (ip_redacted)' closed.
EXPLANATION:
An error occurred receiving data from 'mqdev (ip_redacted)' over TCP/IP.
The connection to the remote host has unexpectedly terminated.
ACTION:
Tell the systems administrator.
----- amqccita.c : 3182 ------------------------------------------------------- |
This obviously implicates network issues. I think the answer is no, but is there any way to get a better tcp/ip error reason?
-mf |
This indicates the remote end of the connection closed the connection. MQ channels don't behave that way normally.
Are you sure that this is the time and IP address related to your problem?
In TCP/IP terms, I think it indicates the local side called recv() and got 0 bytes. This is typically a sign that the other end closed the socket.
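The recv() behaviour described here can be reproduced outside MQ. This is a generic socket sketch, not MQ's own code: a connected pair of sockets stands in for the two ends of a channel, one end closes, and the other end's `recv()` returns zero bytes.

```python
import socket

# A connected pair of sockets stands in for the two ends of a connection.
local, remote = socket.socketpair()

remote.close()            # the "other end" closes its socket
data = local.recv(4096)   # returns at once with no data instead of blocking

# An empty result from recv() on a stream socket means the peer closed.
peer_closed = (data == b"")
local.close()

print(peer_closed)
```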
This kind of error is more common on a client channel (users often end clients with Ctrl+C) than on a message channel, which is why I question whether this is the right time and IP address.
Are there any other errors in the relevant time period? |
|
|
 |
mattfarney |
Posted: Tue Feb 16, 2010 9:42 am Post subject: |
|
|
 Disciple
Joined: 17 Jan 2006 Posts: 167 Location: Ohio
|
Nope. While I was asking questions here, I had the network folks look at it late last week. It turns out the NIC was auto-negotiating its link speed down to a much slower setting, so it was throttling the traffic itself.
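For anyone chasing the same symptom, a negotiated link speed can be checked from a script. The sketch below assumes a Linux host, where the kernel exposes `speed` (in Mb/s) and `duplex` files under `/sys/class/net/<interface>/`; the thread does not say which operating system the affected box ran, and the function names and the 1000 Mb/s expectation are illustrative only.

```python
from pathlib import Path


def parse_link_settings(speed_text, duplex_text):
    """Turn raw sysfs strings into a small dict; unknown values become None."""
    try:
        speed = int(speed_text.strip())
    except (AttributeError, ValueError):
        speed = None
    if speed is not None and speed <= 0:
        speed = None  # some drivers report -1 when the speed is unknown
    duplex = duplex_text.strip() if duplex_text else None
    return {"speed_mbps": speed, "duplex": duplex}


def read_link_settings(interface, sysfs_root="/sys/class/net"):
    """Read speed and duplex for one interface (Linux sysfs assumed)."""
    base = Path(sysfs_root) / interface

    def read(name):
        try:
            return (base / name).read_text()
        except OSError:
            return None  # file missing, or link down and speed unreadable

    return parse_link_settings(read("speed"), read("duplex"))


def looks_throttled(settings, expected_mbps):
    """Flag a link slower than expected or running at half duplex."""
    speed = settings["speed_mbps"]
    too_slow = speed is not None and speed < expected_mbps
    return too_slow or settings["duplex"] == "half"


# Example with hand-written values for a link that negotiated down to 10/half.
negotiated_down = parse_link_settings("10\n", "half\n")
print(negotiated_down, looks_throttled(negotiated_down, expected_mbps=1000))
```

On a real host, `read_link_settings("eth0")` (with `eth0` replaced by the actual interface name) would supply the values instead of the hand-written ones.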
Thanks for the help. Every time I ask a question here, I learn something.
-mf |
|
|
 |
gbaddeley |
Posted: Tue Feb 16, 2010 2:06 pm Post subject: |
|
|
 Jedi Knight
Joined: 25 Mar 2003 Posts: 2538 Location: Melbourne, Australia
|
mattfarney wrote: |
Nope. While I was asking questions here, I had the network folks look at it late last week. It turns out the NIC was auto-negotiating its link speed down to a much slower setting, so it was throttling the traffic itself.
Thanks for the help. Every time I ask a question here, I learn something.
-mf |
Yay. MQ performance issues are usually due to external factors such as system resources, network, applications. In fact, MQ performance is usually very good out-of-the-box without any MQ tuning whatsoever.
_________________
Glenn |
|
|
 |
mvic |
Posted: Tue Feb 16, 2010 4:32 pm Post subject: |
|
|
 Jedi
Joined: 09 Mar 2004 Posts: 2080
|
mattfarney wrote: |
It turns out the NIC was auto-negotiating its link speed down to a much slower setting, so it was throttling the traffic itself. |
This would explain slow data transfer.
It would not explain the error you posted on Feb 11, so that must have been a separate problem.
Are you still getting error messages like that? Are they correlated with any application problems? |
|
|
 |
mattfarney |
Posted: Mon Feb 22, 2010 4:22 pm Post subject: |
|
|
 Disciple
Joined: 17 Jan 2006 Posts: 167 Location: Ohio
|
mvic, I think you were right. That error was probably caused either by killing MQ Explorer or by our helper program not closing its connection correctly. I haven't seen it since.
-mf |
|
|
 |
|