Orphaned connections bringing MQ down
HenriqueS (Master)
Posted: Fri Sep 24, 2010 1:51 pm    Post subject: Orphaned connections bringing MQ down
			  
Hello,

Folks, I am here after almost two weeks of trials and research, and I am still not sure what is going on.

1) We have a WAS cluster (2 nodes) hosted on virtual machines, and an MQ server also hosted on a virtual machine.

2) The WAS cluster was recently migrated to 7.0; MQ is still at 6.0.2.10.

3) All MQ connections from one of the WAS nodes perform well: the connection pool is cleaned up and all put/get operations succeed.

4) BUT the MQ connections from the other WAS node do not perform well: the connection pool is left with dozens of TCP connections until the maximum number of connections configured on MQ is reached. Also, many get operations (using JMS) keep failing, returning null, until all of a sudden a batch of 50 messages is delivered, and soon afterwards they go back to returning null for some period.

We made several changes to the JMS connection pool settings on the WAS side, but none of them gave any practical results.

One thing I noticed is that MQ logs a lot of TCP/IP errors relating to the bad WAS node:
 
 
   
Quote:
09/23/2010 03:34:56 AM - Process(17795.7298) User(root) Program(amqrmppa)
AMQ9209: Connection to host 'wasnode01-t (172.17.105.110)' closed.

EXPLANATION:
An error occurred receiving data from 'wasnode01-t (172.17.105.110)' over
TCP/IP.  The connection to the remote host has unexpectedly terminated.
ACTION:
Tell the systems administrator.
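(For anyone who wants to dig the same way: something along these lines pulls the AMQ9209 entries out of the queue manager error logs. The path assumes a default Linux install, where the '.' in the queue manager name becomes '!' on disk; adjust for your own queue manager name.)

Code:
# count the AMQ9209 (connection closed) entries for this queue manager
grep -c AMQ9209 /var/mqm/qmgrs/QM!MQ_T_BC/errors/AMQERR0*.LOG

# show the full entries that mention the faulty WAS node
grep -B2 -A6 'wasnode01-t' /var/mqm/qmgrs/QM!MQ_T_BC/errors/AMQERR0*.LOG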
So everything points to some network-related issue, but I wanted to hear from the forum members whether anyone has seen something like this before.

I am getting to the point of:
1) Asking the operating systems team to move the faulty WAS node onto the same host where the good node runs.
2) Asking the networking team to do some deep network traffic analysis to find out why I have so many broken connections.

The networking team already told me about the occurrence of some large frames flowing on the network (jumbo frames), typically seen on virtual LANs (VLANs), which may be rejected at some point because of their size, causing issues with TCP sequencing.
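(In case it helps anyone checking the same theory: a rough way to test whether jumbo-sized frames survive the path end to end is to ping with fragmentation disabled. The interface name and the 8972-byte payload below are only assumptions for a 9000-byte MTU.)

Code:
# check the MTU configured on the MQ server's interface
ip link show eth0 | grep mtu

# ping the faulty WAS node with "don't fragment" set;
# 8972 = 9000-byte jumbo MTU minus 28 bytes of IP/ICMP headers
ping -M do -s 8972 -c 4 wasnode01-t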
 
 Any other guesses?
RogerLacroix (Jedi Knight)
Posted: Fri Sep 24, 2010 2:18 pm    Post subject: Re: Orphaned connections bringing MQ down
			  
HenriqueS wrote:
09/23/2010 03:34:56 AM - Process(17795.7298) User(root) Program(amqrmppa)

I'll start with the one error that I can see.  Why are you starting/running the queue manager with the 'root' UserID?  You should be starting the queue manager with the 'mqm' UserID.
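A quick way to check who actually owns the queue manager processes (assuming a Unix box; amqzxma0 is the execution controller):

Code:
ps -ef | egrep 'amqzxma0|amqrmppa|runmqlsr' | grep -v egrep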
 
 Regards,
 Roger Lacroix
 Capitalware Inc.
 _________________
 Capitalware: Transforming tomorrow into today.
 Connected to MQ!
HenriqueS (Master)
Posted: Fri Sep 24, 2010 2:49 pm    Post subject: Re: Orphaned connections bringing MQ down
			  
I have no idea why that information is shown... maybe at some point, after a crash (max connections reached), I restarted the queue manager by hand.

Currently, this is what it looks like:
 
 
 
   
Code:
[DEINF.SEGAN@sbcdf365]$ ps aux | grep amqrmppa
mqm       2848  0.1  0.5 112364  5196 ?        Ssl  19:21   0:01 /opt/mqm/bin/amqrmppa -m QM.MQ_T_BC
60050     4610  0.0  0.0  61172   720 pts/0    R+   19:46   0:00 grep amqrmppa
mqm      22994  0.0  0.4 108512  5108 ?        Ssl  Sep21   0:06 /opt/mqm/bin/amqrmppa -m QM.MQ_T_MON
mqm      23015  0.0  0.4 109800  4828 ?        Ssl  Sep21   0:08 /opt/mqm/bin/amqrmppa -m QM.MQ_T_IF
mqm      30220  0.0  0.8 138360  8692 ?        Ssl  18:09   0:05 /opt/mqm/bin/amqrmppa -m QM.MQ_T_BC
[DEINF.SEGAN@sbcdf365]$
 
 
   
RogerLacroix wrote:
HenriqueS wrote:
09/23/2010 03:34:56 AM - Process(17795.7298) User(root) Program(amqrmppa)

I'll start with the one error that I can see.  Why are you starting/running the queue manager with the 'root' UserID?  You should be starting the queue manager with the 'mqm' UserID.
HenriqueS (Master)
Posted: Wed Sep 29, 2010 2:17 pm    Post subject:
			  
So, I am leaving this here for reference... problem solved.

Apparently the WebSphere team found it. There is a fault in WAS 7.0 (I don't know the exact patch level) that just ignores external MQ libraries (from 6.0.x), or even newer ones from the latest patches.

They discovered this because, thank god, we had a good WAS node where MQ and WAS were doing just fine. They checked the files held open by the WAS process (the Linux 'lsof' command gives this), compared them with the faulty servers, and noticed that the referenced jars were not the same!
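(Roughly what that comparison looks like; the PID placeholder and the output file name are only for illustration, not the exact commands the WAS team used.)

Code:
# <was_pid> is the PID of the WebSphere application server java process
lsof -p <was_pid> | grep -i '\.jar' | grep -i mq | sort > /tmp/mq-jars.txt

# run the same on the good node and diff the two /tmp/mq-jars.txt files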
 
So they tricked the faulty WAS servers by moving the bad MQ jars somewhere else and creating symbolic links to the good ones.
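(Something like this, although the jar name and the paths here are placeholders; the real files are whatever lsof showed on each node.)

Code:
# park the unwanted jar and point a symlink at the known-good copy
mv /path/on/faulty/node/mq-resource-adapter.jar /path/on/faulty/node/mq-resource-adapter.jar.orig
ln -s /path/to/good/mq-resource-adapter.jar /path/on/faulty/node/mq-resource-adapter.jar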
 
After that, we had no more orphaned, buggy JMS connections from WAS accumulating until MQ started refusing connections.

They are still researching why the WAS namespace got messed up and are in contact with IBM to check what really happened.

*** Well, it was not only using the wrong lib but also ignoring any customizations made in the setupCmdLine.sh script, including when the MQ 6 libs were expressly pointed to (it's a command-line script where WAS admins usually push more libs into the namespace). ***

The faulty WAS servers were using this lib:
 
 
 
   
Code:
Manifest-Version: 1.0
Ant-Version: Apache Ant 1.7.0
Created-By: 1.4.2 (IBM Corporation)
Copyright-Notice: Licensed Materials - Property of IBM          5724-H
 72, 5655-L82, 5724-L26          (c) Copyright IBM Corp. 2008 All Righ
 ts Reserved.          US Government Users Restricted Rights -
 Use,duplication or disclosure restricted by GSA ADP Schedule Contra
 ct          with IBM Corp.
Sealed: false
Specification-Title: J2EE Connector Architecture
Specification-Version: 1.5
Implementation-Title: WebSphere MQ Resource Adapter
Implementation-Version: 7.0.0.0-k700-L080820
Implementation-Vendor: IBM Corporation
The good WAS server was using this lib:

Code:
Manifest-Version: 1.0
Ant-Version: Apache Ant 1.7.0
Created-By: 1.4.2 (IBM Corporation)
Copyright-Notice: Licensed Materials - Property of IBM       5724-H72,
 5655-R36, 5724-L26, 5655-L82          (c) Copyright IBM Corp. 2008 A
 ll Rights Reserved.          US Government Users Restricted Rights -
 Use,duplication or disclosure restricted by GSA ADP Schedule
 Contract          with IBM Corp.
Sealed: false
Specification-Title: J2EE Connector Architecture
Specification-Version: 1.5
Implementation-Title: WebSphere MQ Resource Adapter
Implementation-Version: 7.0.1.2-k701-102-100504
Implementation-Vendor: IBM Corporation
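(For the record, this is the kind of command that dumps those manifests; the jar path is a placeholder.)

Code:
# print the jar's manifest and pick out the implementation version
unzip -p /path/to/mq-resource-adapter.jar META-INF/MANIFEST.MF | grep Implementation-Version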
mqjeff (Grand Master)
Posted: Thu Sep 30, 2010 1:28 am    Post subject:
			  
That suggests you had the two servers at different patch levels of WAS, which is not really a good idea.