Orphaned connections bringing MQ down
HenriqueS
Posted: Fri Sep 24, 2010 1:51 pm Post subject: Orphaned connections bringing MQ down
Master
Joined: 22 Sep 2006 Posts: 235
Hello folks,
I have spent almost two weeks on trials and research, and I am still not sure what is going on.
1) We have a WAS cluster (2 nodes) hosted on virtual machines, and an MQ server also hosted on a virtual machine.
2) The WAS cluster was recently migrated to 7.0; MQ is still at 6.0.2.10.
3) All MQ connections from one of the WAS nodes perform well: the connection pool is cleaned up and all put/get operations succeed 100% of the time.
4) BUT the MQ connections from the other WAS node do not perform well: the connection pool is left with dozens of TCP connections until the maximum number of connections configured on MQ is reached. Also, many gets (using JMS) keep failing and return null until, all of a sudden, a batch of 50 messages is delivered, and soon after that they go back to returning null for a while.
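One way to put numbers on the pile-up described in point 4 is to count, on the MQ host, the established TCP connections to the listener port grouped by client address. A minimal sketch, assuming the listener is on the default port 1414 and that the input is Linux `netstat -tn` output with IPv4 addresses; `count_mq_conns_by_client` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: count ESTABLISHED TCP connections to a local port
# (default 1414, an assumption) per remote IPv4 address.
# Reads Linux `netstat -tn` output on stdin.
count_mq_conns_by_client() {
  port="${1:-1414}"
  awk -v port="$port" '
    # Columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State
    $1 ~ /^tcp/ && $6 == "ESTABLISHED" && $4 ~ (":" port "$") {
      client = $5
      sub(/:[0-9]+$/, "", client)   # strip the client-side ephemeral port
      count[client]++
    }
    END { for (c in count) print count[c], c }
  ' | sort -rn                      # busiest client first
}
```

For example, `netstat -tn | count_mq_conns_by_client 1414`, run every few minutes on the MQ server; a count that keeps climbing for one WAS node while the other stays flat would match the symptom above.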
We made several changes to the JMS connection pool management on the WAS side, but none of them gave any practical results.
One thing I noticed is that MQ logs a lot of TCP/IP errors related to the bad WAS node:
Quote:
09/23/2010 03:34:56 AM - Process(17795.7298) User(root) Program(amqrmppa)
AMQ9209: Connection to host 'wasnode01-t (172.17.105.110)' closed.
EXPLANATION:
An error occurred receiving data from 'wasnode01-t (172.17.105.110)' over
TCP/IP. The connection to the remote host has unexpectedly terminated.
ACTION:
Tell the systems administrator.
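To see how the AMQ9209 closures are spread across clients, the error log can be tallied by remote host. A minimal sketch, assuming the log format quoted above; `count_amq9209_by_host` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: tally AMQ9209 "Connection to host ... closed" entries
# per remote host in a WebSphere MQ error log read from stdin.
count_amq9209_by_host() {
  # Print only the quoted host from each AMQ9209 line, then count
  # occurrences per host, most frequent first.
  sed -n "s/^AMQ9209[A-Z]*: Connection to host '\([^']*\)' closed\..*/\1/p" \
    | sort | uniq -c | sort -rn
}
```

Feed it the queue manager's error log, e.g. `count_amq9209_by_host < AMQERR01.LOG`. On Linux the file usually lives under `/var/mqm/qmgrs/<qmgr>/errors/`, where the directory name may not match the queue manager name exactly.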
So everything points to a network-related issue, but I wanted to hear whether any forum members have seen something like this before.
I am getting to the point of:
1) Asking the operating systems team to move the faulty WAS node's VM to the same host where the good one runs.
2) Asking the networking team to do a deep network traffic analysis to find out why I have so many broken connections.
The networking team has already told me about some large frames flowing on the network (so-called jumbo frames), typically seen on virtual LANs (VLANs), which may be rejected at some point because of their size, causing issues with TCP sequencing.
Any other guesses?
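The jumbo-frame theory can be checked by comparing interface MTUs on both WAS VMs and on the MQ VM: if one end uses a jumbo MTU while something on the path between them does not, large segments could be dropped. A minimal sketch, assuming Linux `ip -o link show` output on stdin; `list_mtus` is hypothetical, and 1500 bytes is used as the standard Ethernet MTU:

```shell
#!/bin/sh
# Hypothetical helper: report each interface's MTU from `ip -o link show`
# output on stdin and flag anything above 1500 bytes as jumbo.
list_mtus() {
  awk '{
    name = $2
    sub(/:$/, "", name)          # "eth0:" -> "eth0"
    if (name == "lo") next       # loopback MTU is irrelevant here
    for (i = 3; i < NF; i++) {
      if ($i == "mtu") {
        kind = ($(i + 1) > 1500) ? "jumbo" : "standard"
        print name, $(i + 1), kind
        break
      }
    }
  }'
}
```

Run `ip -o link show | list_mtus` on each of the three machines and compare the results.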
RogerLacroix
Posted: Fri Sep 24, 2010 2:18 pm Post subject: Re: Orphaned connections bringing MQ down
Jedi Knight
Joined: 15 May 2001 Posts: 3264 Location: London, ON Canada
HenriqueS wrote:
09/23/2010 03:34:56 AM - Process(17795.7298) User(root) Program(amqrmppa)
I'll start with the one error that I can see. Why are you starting/running the queue manager with the 'root' UserID? You should be starting the queue manager with the 'mqm' UserID.
Regards,
Roger Lacroix
Capitalware Inc.
HenriqueS
Posted: Fri Sep 24, 2010 2:49 pm Post subject: Re: Orphaned connections bringing MQ down
Master
Joined: 22 Sep 2006 Posts: 235
I have no idea why that is shown... maybe at some point after a crash (max connections reached) I restarted the queue manager by hand.
Currently, this is what is running:
Code:
[DEINF.SEGAN@sbcdf365]$ ps aux | grep amqrmppa
mqm 2848 0.1 0.5 112364 5196 ? Ssl 19:21 0:01 /opt/mqm/bin/amqrmppa -m QM.MQ_T_BC
60050 4610 0.0 0.0 61172 720 pts/0 R+ 19:46 0:00 grep amqrmppa
mqm 22994 0.0 0.4 108512 5108 ? Ssl Sep21 0:06 /opt/mqm/bin/amqrmppa -m QM.MQ_T_MON
mqm 23015 0.0 0.4 109800 4828 ? Ssl Sep21 0:08 /opt/mqm/bin/amqrmppa -m QM.MQ_T_IF
mqm 30220 0.0 0.8 138360 8692 ? Ssl 18:09 0:05 /opt/mqm/bin/amqrmppa -m QM.MQ_T_BC
[DEINF.SEGAN@sbcdf365]$
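Following up on the `User(root)` entry, a quick check is to list the running `amqrmppa` (channel process pool) processes with their owner and queue manager, flagging any not owned by `mqm`. A minimal sketch, assuming Linux `ps -eo user=,pid=,args=` output on stdin; `check_amqrmppa_owner` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: list amqrmppa processes with owner, PID and queue
# manager (-m argument), flagging any not owned by the mqm user.
# Reads `ps -eo user=,pid=,args=` output on stdin.
check_amqrmppa_owner() {
  awk '$3 ~ /amqrmppa$/ {
    qm = "?"
    for (i = 4; i < NF; i++) if ($i == "-m") qm = $(i + 1)
    status = ($1 == "mqm") ? "ok" : "WARN"
    print status, $1, $2, qm
  }'
}
```

For example, `ps -eo user=,pid=,args= | check_amqrmppa_owner`. The pattern matches only the program path in the third column, so a `grep amqrmppa` line like the one in the listing above is not reported.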
HenriqueS
Posted: Wed Sep 29, 2010 2:17 pm
Master
Joined: 22 Sep 2006 Posts: 235
So, I am leaving this here for reference... problem solved.
Apparently the WebSphere team found it. There is a fault in WAS 7.0 (I don't know the exact patch level) that simply ignores external MQ libraries (from 6.0.x), or even the newest ones from the latest patches.
They discovered this because, thank God, we had a good WAS node where MQ and WAS were doing just fine. They checked the files opened by the WAS process (the Linux 'lsof' command shows this), compared them with those on the faulty servers, and noticed that the referenced jars were not the same!
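The `lsof` comparison the WebSphere team did can be reduced to a list of MQ jars per node plus a `diff`. A minimal sketch, assuming `lsof -p <pid>` output on stdin and that the relevant jars have `com.ibm.mq` in their path; `mq_jars_in_use` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: from `lsof -p <pid>` output on stdin, print the
# distinct jar files whose path contains "com.ibm.mq" (an assumed naming
# filter; widen it if your MQ jars are named differently).
mq_jars_in_use() {
  # NAME is the last lsof column; a jar may appear several times
  # (memory-mapped and as an open descriptor), so de-duplicate.
  awk '$NF ~ /\.jar$/ && $NF ~ /com\.ibm\.mq/ { print $NF }' | sort -u
}
```

On each node: `lsof -p "$WAS_PID" | mq_jars_in_use > "$(hostname).jars"`, where `WAS_PID` is a placeholder for the application server's JVM process ID, then `diff` the two files.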
So they worked around it on the faulty WAS servers by moving the bad MQ jars somewhere else and creating symbolic links to the good ones.
After that, we had no more orphaned, buggy JMS connections from WAS accumulating until MQ started refusing them.
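The workaround described above (move the wrong jar aside, then symlink the good one in its place) might look like this. A minimal sketch with placeholder paths, presumably run while the server is stopped; `swap_in_symlink` is hypothetical, and it only automates the temporary fix, not the underlying cause IBM was asked about:

```shell
#!/bin/sh
# Hypothetical helper: move a jar into a backup directory and replace it
# with a symbolic link to a known-good copy. Use absolute paths so the
# link target resolves regardless of the link's directory.
swap_in_symlink() {
  bad="$1" good="$2" backup_dir="$3"
  [ -f "$good" ] || { echo "missing good jar: $good" >&2; return 1; }
  [ -e "$bad" ] || { echo "missing jar to replace: $bad" >&2; return 1; }
  mkdir -p "$backup_dir" &&
    mv "$bad" "$backup_dir/" &&
    ln -s "$good" "$bad"
}
```

For example, `swap_in_symlink /abs/path/bad.jar /abs/path/good.jar /abs/path/backup`, with all three paths as placeholders.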
They are still researching why the WAS namespace got messed up, and are in contact with IBM to check what really happened.
***Well, it was not only using the wrong lib but also ignoring any customizations they made in the setupcmdline.sh script, including where the MQ 6 libs were explicitly pointed to (it's a command-line script where WAS admins usually push more libs into the namespace).***
Faulty WAS servers were using this lib:
Code:
Manifest-Version: 1.0
Ant-Version: Apache Ant 1.7.0
Created-By: 1.4.2 (IBM Corporation)
Copyright-Notice: Licensed Materials - Property of IBM 5724-H
 72, 5655-L82, 5724-L26 (c) Copyright IBM Corp. 2008 All Righ
 ts Reserved. US Government Users Restricted Rights -
 Use,duplication or disclosure restricted by GSA ADP Schedule Contra
 ct with IBM Corp.
Sealed: false
Specification-Title: J2EE Connector Architecture
Specification-Version: 1.5
Implementation-Title: WebSphere MQ Resource Adapter
Implementation-Version: 7.0.0.0-k700-L080820
Implementation-Vendor: IBM Corporation
The good WAS server was using this lib:
Code:
Manifest-Version: 1.0
Ant-Version: Apache Ant 1.7.0
Created-By: 1.4.2 (IBM Corporation)
Copyright-Notice: Licensed Materials - Property of IBM 5724-H72,
 5655-R36, 5724-L26, 5655-L82 (c) Copyright IBM Corp. 2008 A
 ll Rights Reserved. US Government Users Restricted Rights -
 Use,duplication or disclosure restricted by GSA ADP Schedule
 Contract with IBM Corp.
Sealed: false
Specification-Title: J2EE Connector Architecture
Specification-Version: 1.5
Implementation-Title: WebSphere MQ Resource Adapter
Implementation-Version: 7.0.1.2-k701-102-100504
Implementation-Vendor: IBM Corporation
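The key difference between the two manifests is `Implementation-Version` (7.0.0.0-k700-L080820 on the faulty servers, 7.0.1.2-k701-102-100504 on the good one), so that attribute is a quick way to compare nodes. A minimal sketch that reads a manifest on stdin and joins continuation lines, since the JAR manifest format wraps long values onto lines starting with a single space; `manifest_attr` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print one main-section attribute from a JAR manifest
# read on stdin, joining continuation lines and dropping CR characters.
manifest_attr() {
  tr -d '\r' | awk -v key="$1" '
    /^ / { if (cur == key) val = val substr($0, 2); next }  # continuation
    /^$/ { exit }                                           # end of main section
    {
      cur = ""
      i = index($0, ": ")
      if (i > 0 && substr($0, 1, i - 1) == key) {
        cur = key
        val = substr($0, i + 2)
        found = 1
      }
    }
    END { if (found) print val }
  '
}
```

For example, `unzip -p /path/to/adapter.jar META-INF/MANIFEST.MF | manifest_attr Implementation-Version`, where the path is a placeholder for whichever MQ jar or resource adapter archive each node actually loads.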
mqjeff
Posted: Thu Sep 30, 2010 1:28 am
Grand Master
Joined: 25 Jun 2008 Posts: 17447
That suggests you had the two servers at different WAS patch levels, which is not really a good idea.