MQ Server 8.0.0.1 command server timeout? |
Author |
Message
|
LouML |
Posted: Thu Feb 05, 2015 6:23 am Post subject: |
|
|
 Partisan
Joined: 10 Nov 2005 Posts: 305 Location: Jersey City, NJ / Bethpage, NY
|
I appreciate all of the responses. I will continue to investigate. I'm still hesitant to open a PMR just yet. I'm not yet convinced it's an IBM issue. Besides, the UK queue manager itself is running fine there. It's just us trying to monitor it remotely that has issues. _________________ Yeah, well, you know, that's just, like, your opinion, man. - The Dude |
|
|
 |
PaulClarke |
Posted: Thu Feb 05, 2015 6:31 am Post subject: |
|
|
 Grand Master
Joined: 17 Nov 2005 Posts: 1002 Location: New Zealand
|
Not really fjb. If we are talking 120 seconds then the time differential is easy to compensate for. If it was timing out after 10 milliseconds then I would agree with you. _________________ Paul Clarke
MQGem Software
www.mqgem.com |
|
|
 |
fjb_saper |
Posted: Thu Feb 05, 2015 7:00 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20757 Location: LI,NY
|
PaulClarke wrote: |
Not really fjb. If we are talking 120 seconds then the time differential is easy to compensate for. If it was timing out after 10 milliseconds then I would agree with you. |
I am not quite sure I follow you.
Say issuing server shows t1 while responding server shows t2. Now if you are looking at t2 on a telnet console you know it is flawed by t3 (t3 being a variable network latency). If the time between the responses is more than the wait time, you know you will have disconnected. But if the time between responses is just underneath the wait time, did the first response come right away or just before wait time was over?
Is there a way to run ping at the same time to measure network latency? Does it fluctuate much? What is the packet loss ratio? Are there other things going on at the same time such as "relative bandwidth saturation"?
TCP/IP is very slow over such distances due to the way it treats frames and error packets, and as such is unable to use the full bandwidth.
Maybe instead of using mo72 / runmqsc to make a client connection to the UK, it would be better to use an intermediary qmgr in the States to attach to for that enquiry (multi-hop scenario)?
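If ICMP ping is blocked, one rough way to get at the latency and fluctuation questions is to time TCP connects to the remote listener. This is only a minimal sketch and not part of any MQ tooling: the function names are made up for illustration, and the host and port you pass in are your own (1414 is merely the conventional MQ listener port, so treat it as an assumption).

```python
import socket
import statistics
import time


def sample_connect_latency(host, port, samples=5, timeout=5.0):
    """Return a list of TCP connect times, in seconds, to host:port."""
    results = []
    for _ in range(samples):
        start = time.monotonic()
        # create_connection completes the TCP handshake, so the elapsed
        # time is a rough stand-in for one network round trip.
        with socket.create_connection((host, port), timeout=timeout):
            results.append(time.monotonic() - start)
    return results


def summarize(latencies):
    """Summarize the samples so that fluctuation is visible at a glance."""
    return {
        "min": min(latencies),
        "max": max(latencies),
        "mean": statistics.mean(latencies),
        "spread": max(latencies) - min(latencies),
    }
```

Calling `summarize(sample_connect_latency(host, 1414))` while the remote enquiry is running would show whether the round trip is anywhere near the wait time. It says nothing about packet loss or bandwidth saturation.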
 _________________ MQ & Broker admin |
|
|
 |
PaulClarke |
Posted: Thu Feb 05, 2015 8:19 am Post subject: |
|
|
 Grand Master
Joined: 17 Nov 2005 Posts: 1002 Location: New Zealand
|
Are you really saying that you cannot line up timestamps between remote servers to an accuracy of 2 minutes!?
Most machines, in my experience, use time servers so their clocks tend to be fairly well synchronised anyway.
As for network latency I think the worst I've ever seen was a connection from Tokyo to New York and that had a network delay of about 5 seconds. I have not seen any network connections with delays in the order of minutes. Of course they may well be out there.
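To make the timestamp comparison concrete, here is a small sketch of how a reply could be classified against the wait interval. It is an illustration only: the function name, the symmetric `uncertainty` allowance (defaulted to the 5 second figure above), and the assumption that the clock offset between the two servers is known are all mine, not anything MO72 does.

```python
from datetime import datetime, timedelta


def classify_reply(sent_local, replied_remote, wait,
                   clock_offset=timedelta(0),
                   uncertainty=timedelta(seconds=5)):
    """Classify a reply as 'in time', 'timed out' or 'ambiguous'.

    sent_local     -- when the request was issued (issuing server clock)
    replied_remote -- when the reply was produced (responding server clock)
    wait           -- the client's wait interval
    clock_offset   -- remote clock minus local clock (assumed to be known)
    uncertainty    -- allowance for network latency and residual clock skew
    """
    # Express the remote timestamp on the issuing server's clock.
    elapsed = (replied_remote - clock_offset) - sent_local
    if elapsed + uncertainty < wait:
        return "in time"
    if elapsed - uncertainty > wait:
        return "timed out"
    # Too close to the wait interval to call from timestamps alone.
    return "ambiguous"
```

With a 120 second wait, only replies within a few seconds of the limit come out as "ambiguous"; everything else can be decided from the two logs.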
Anyway, it was just a suggestion. I feel slightly responsible since I am the author of MO72. However, I seriously doubt whether this is MO72's fault so perhaps I should have just kept quiet. This forum seems to really enjoy objecting either to people's questions or to their suggestions.
Regards,
Paul. _________________ Paul Clarke
MQGem Software
www.mqgem.com |
|
|
 |
tczielke |
Posted: Thu Feb 05, 2015 8:34 am Post subject: |
|
|
Guardian
Joined: 08 Jul 2010 Posts: 941 Location: Illinois, USA
|
mqjeff wrote: |
Secondly, I remember a recentish thread (in the last two months) about the command server leaving messages on the DLQ, and I believe there was an APAR. Some poking around might find something. Although it's possible that was on the vienna listserv, too. |
We had an issue when we migrated to v8.0.0.1 with the command server putting a message to the DLQ (very similar to what LouML is experiencing) after the dmpmqcfg command had completed. IBM support said it was a very close match to APAR IT03921. I would think a PMR would be a good idea here, to make sure this isn't an issue with the command server. |
|
|
 |
LouML |
Posted: Thu Feb 05, 2015 9:55 am Post subject: |
|
|
 Partisan
Joined: 10 Nov 2005 Posts: 305 Location: Jersey City, NJ / Bethpage, NY
|
tczielke wrote: |
mqjeff wrote: |
Secondly, I remember a recentish thread (in the last two months) about the command server leaving messages on the DLQ, and I believe there was an APAR. Some poking around might find something. Although it's possible that was on the vienna listserv, too. |
We had an issue when we migrated to v8.0.0.1 with the command server putting a message to the DLQ (very similar to what LouML is experiencing) after the dmpmqcfg command had completed. IBM support said it was a very close match to APAR IT03921. I would think a PMR would be a good idea here, to make sure this isn't an issue with the command server. |
This is good to know. Did you have an issue where distance between servers was involved? We have no issue with the local servers. Just a problem with one server in the UK. _________________ Yeah, well, you know, that's just, like, your opinion, man. - The Dude |
|
|
 |
tczielke |
Posted: Thu Feb 05, 2015 10:09 am Post subject: |
|
|
Guardian
Joined: 08 Jul 2010 Posts: 941 Location: Illinois, USA
|
No, this was a dmpmqcfg command running locally on the server, and we can recreate the issue consistently at 8.0.0.1. The issue does not happen at 7.5.0.2. Another person on the MQLISTSERV has reported a sporadic problem with the command server doing what you are observing (putting a message on the DLQ after the originating PCF application was told the PCF replies were complete and closed the reply queue) and IBM opened an APAR for that as well, although I believe her MQ version was 7.0.1 (but a recent fix pack, I think).
You can validate if the command server is at fault with tracing. |
|
|
 |
LouML |
Posted: Wed Feb 25, 2015 4:10 am Post subject: |
|
|
 Partisan
Joined: 10 Nov 2005 Posts: 305 Location: Jersey City, NJ / Bethpage, NY
|
Just returned from a much needed vacation. Now, back to this issue... ugh...
Here’s the latest from before I went away.
An application group contacted us and said their app was getting slower and slower as the week progressed. The Unix admin came in and started looking around. He noticed that the amqrmppa process was consuming more and more CPU as the week went on. Once the server was rebooted over the weekend, things were back to normal. Now, CPU usage is creeping up again.
Opened a PMR with IBM and was given 8.0.0.1-WS-MQ-LinuxX64-LAIT03414.tar.gz. The following is a snippet from the readme.txt file:
Code: |
APAR IT03414, ABSTRACT / SUMMARY :
--------------------------------------------------------------------------------
MQ V 7.5 AMQRMPPA CPU 100% CONSUMPTION
APAR IT03414, USERS AFFECTED :
--------------------------------------------------------------------------------
Users of XA may be affected by this problem
APAR IT03414, PROBLEM SUMMARY :
--------------------------------------------------------------------------------
A function iterates over a linked list of structures, one of which is created
for each different rmid that we encounter.

Every time that the TM calls xa_open(), it is passing in a new rmid. This is
potentially causing the queue manager to have to iterate over a very long
linked list of structures, and it is this which is causing the high CPU usage.
APAR IT03414, PROBLEM CONCLUSION :
--------------------------------------------------------------------------------
The change will cause MQ to release any memory allocated on an xa_open(), when
xa_close() is called. Since the change prevents the internal linked list from
getting too big, the time spent iterating this list should be reduced. |
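As a way of picturing what the readme describes, here is a toy model of a list that gains one entry per new rmid and is scanned linearly on each lookup. It is emphatically not MQ's actual data structure or code: the class, the `steps` counter and the `release_on_close` switch are all invented for illustration, and the method names only echo the XA calls mentioned in the APAR.

```python
class RmidRegistry:
    """Toy model of a per-rmid list; not MQ's real implementation."""

    def __init__(self, release_on_close):
        self.release_on_close = release_on_close
        self.entries = []  # stands in for the internal linked list
        self.steps = 0     # counts entries visited during lookups

    def _find(self, rmid):
        # Linear scan: the cost grows with the number of entries kept.
        for entry in self.entries:
            self.steps += 1
            if entry == rmid:
                return entry
        return None

    def xa_open(self, rmid):
        if self._find(rmid) is None:
            self.entries.append(rmid)

    def xa_close(self, rmid):
        # Models the described fix: drop the entry when it is closed.
        if self.release_on_close and self._find(rmid) is not None:
            self.entries.remove(rmid)


def simulate(release_on_close, cycles=1000):
    """Open and close with a fresh rmid each cycle, as the APAR describes."""
    registry = RmidRegistry(release_on_close)
    for rmid in range(cycles):
        registry.xa_open(rmid)
        registry.xa_close(rmid)
    return len(registry.entries), registry.steps
```

In this model, 1000 cycles without release leave 1000 entries behind and cost 499500 scan steps, while releasing on close leaves the list empty after only 1000 steps. That is the shape of growth that would show up as steadily rising CPU.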
We installed this on our Dev server just before I went on vacation. About 1½ days later we were seeing CPU gradually increase again.
Looking at the amqrmppa process, this first output was from after the install.
Code: |
[mqm@ukdev ~]$ ps aux --sort=-pcpu | head -2
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
mqm 8482 1.6 0.9 2905664 348536 ? Ssl Feb09 16:59 /opt/mqm/bin/amqrmppa -m QMUK
[mqm@ukdev ~]$ |
The second was the most recent.
Code: |
[mqm@ukdev ~]$ ps aux --sort=-pcpu | head -2
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
mqm 8482 3.0 2.1 2905664 348536 ? Ssl Feb09 72:31 /opt/mqm/bin/amqrmppa -m QMUK
[mqm@ukdev ~]$ |
For reference, the Production values are:
Code: |
[mqm@ukprod ~]$ ps aux --sort=-pcpu | head -2
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
mqm 8523 14.3 5.5 3548720 908676 ? Ssl Feb08 728:56 /opt/mqm/bin/amqrmppa -m QMUK
[mqm@ukprod ~]$ |
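For comparing samples like these, the accumulated TIME column is easier to reason about than the %CPU column. A minimal sketch, assuming the TIME field is in the minutes:seconds form shown above (the helper names are made up for illustration):

```python
def cpu_time_seconds(ps_time):
    """Convert a ps TIME field such as '72:31' (minutes:seconds) to seconds."""
    minutes, seconds = ps_time.split(":")
    return int(minutes) * 60 + int(seconds)


def cpu_growth(first, second):
    """CPU seconds consumed between two ps samples of the same PID."""
    return cpu_time_seconds(second) - cpu_time_seconds(first)
```

For the two Dev samples, `cpu_growth("16:59", "72:31")` gives 3332 CPU seconds. Dividing that by the wall-clock seconds between the two samples would give the average CPU fraction over the interval; the sample times are not recorded here, so that step is left out.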
We decided to stop/restart the Production queue manager each Wednesday evening to free up the CPU for the rest of the week.
After running the fix in Dev, we still had the issue so the interim fix did not help.
While I was out, IBM returned with:
Code: |
Below is an update from our Level 3 Support regarding the high CPU problem.
-------------------------------------------------------
Hello,
I suspect the high CPU could be because of channel status table scan. We
have few problems(high CPU and high shared memory usage) fixed in this
area in MQ 8002 recently. Unfortunately these defects are complex and can
not be provided as an interim fix. I suggest applying 8002 which will be
available soon. If the problem should reoccur, please provide a detailed
MQ trace so that we can check and confirm whether the problem is occurring
because of status table scan. |
I see that 8.0.0.2 is scheduled for release 1st quarter 2015.
Anyone have any further insight as to whether it's today or March 31, 2015? IBM won't say. _________________ Yeah, well, you know, that's just, like, your opinion, man. - The Dude |
|
|
 |
LouML |
Posted: Wed Mar 04, 2015 4:11 am Post subject: |
|
|
 Partisan
Joined: 10 Nov 2005 Posts: 305 Location: Jersey City, NJ / Bethpage, NY
|
Well, 8.0.0.2 is out and it seems to have fixed the amqrmppa issue, among others we've been experiencing.
We've been running it in Development since Monday and we are not seeing the amqrmppa issue (IT03414).
Additionally, we are not seeing the issue I first posted in this thread, where PCF messages were winding up on the Dead Letter Queue (IT01469).
Finally, an issue I did not post about, but which was also happening to us, was a Listener configured with CONTROL(QMGR) failing to start (IT03098).
Code: |
APAR - Description
IT01469 - PCF response messages are written to the dead letter queue when using a PCF message agent
IT03098 - Listener fails to start with error AMQ9255 or FDC XY504000
IT03414 - MQ v 7.5 amqrmppa cpu 100% consumption
|
_________________ Yeah, well, you know, that's just, like, your opinion, man. - The Dude |
|
|
 |