gabrielj
Posted: Tue Dec 16, 2014 10:44 pm    Post subject: WebSphere App Server 8.5 node agent down intermittently
Novice
Joined: 16 Nov 2014    Posts: 23
Location: Muscat, Perth, Sydney, Bangalore, Hyderabad, Coimbatore
Hi Experts,

First, a brief description of my environment:

WAS version: 8.5
OS: HP-UX
Topology: 1 DMGR, 2 node agents, 2 nodes.

Recently we hit a problem: roughly once every two weeks a node agent goes down on its own. The log says "Too many open files". We used the lsof command to monitor the node agent process IDs.

Kernel parameters:
hard file limit: 20000
soft file limit: 8126

Monitoring output (open descriptor counts):

Date          Server 1 node agent (PID 6054)    Server 2 node agent (PID 8076)
10/12/2014    1636                              1641
11/12/2014    2382                              2390
14/12/2014    4527                              4534
15/12/2014    5265                              5274

The node agent process keeps opening files and never closes the file descriptors. We have already raised a PMR with IBM but have not yet received a working solution.

Please share a solution if you have one.

Thanks in advance.
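As a rough illustration of how fast these counts approach the soft limit, a minimal sketch (assuming roughly linear growth; the counts are the samples above, as would come from `lsof -p <PID> | wc -l`, and `days_until_limit` is a hypothetical helper name):

```python
from datetime import datetime

SOFT_LIMIT = 8126  # soft file limit from the kernel parameters above

# (date, open descriptor count) samples for the server 1 node agent
samples = [
    ("10/12/2014", 1636),
    ("11/12/2014", 2382),
    ("14/12/2014", 4527),
    ("15/12/2014", 5265),
]

def days_until_limit(samples, limit=SOFT_LIMIT):
    """Estimate days until the soft limit, assuming linear descriptor growth."""
    parse = lambda d: datetime.strptime(d, "%d/%m/%Y")
    (d0, c0), (d1, c1) = samples[0], samples[-1]
    rate = (c1 - c0) / (parse(d1) - parse(d0)).days  # descriptors leaked per day
    return (limit - c1) / rate
```

With these samples the growth rate is about 726 descriptors per day, so a freshly restarted agent would exhaust the 8126 soft limit in roughly 11 days, which is consistent with the reported two-week failure interval. This is only an estimate, not a diagnosis.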
		
		
gabrielj
Posted: Mon Feb 02, 2015 11:33 pm
Hi All,

The issue is not yet resolved. We finally traced it: the node agent is opening too many TCP connections to the deployment manager port (7060, XDAGENT_PORT). It opens approximately 1200 TCP connections every day; about 10% are in ESTABLISHED state, 50% are in IDLE state, and the rest are in CLOSE_WAIT state. Please share your thoughts on why this happens.

Some more information about the current environment:
WAS 8.5 is installed on two nodes, using persistent session management.
Automatic synchronization is enabled; the sync interval is 1 minute.
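For reference, a breakdown like the one above can be produced by tallying the state shown on each lsof TCP line for connections to the DMGR port. A minimal sketch; the sample lines are illustrative (descriptor numbers and addresses are made up), with the layout following the lsof sample shown later in this thread:

```python
from collections import Counter
import re

# Illustrative `lsof -p <PID>` TCP lines (invented descriptor/device values)
lsof_lines = [
    "java 6054 wasuser 6401u IPv4 0xe000000300000000 0t0 TCP j2elive1:50101->jataayu1:7060 (ESTABLISHED)",
    "java 6054 wasuser 6402u IPv4 0xe000000300000001 0t0 TCP j2elive1:50102->jataayu1:7060 (IDLE)",
    "java 6054 wasuser 6403u IPv4 0xe000000300000002 0t0 TCP j2elive1:50103->jataayu1:7060 (CLOSE_WAIT)",
    "java 6054 wasuser 6404u IPv4 0xe000000300000003 0t0 TCP j2elive1:50104->jataayu1:7060 (CLOSE_WAIT)",
]

def state_counts(lines, dmgr_port=7060):
    """Tally connection states for sockets whose remote end is the DMGR port."""
    states = Counter()
    for line in lines:
        m = re.search(r"->\S+:%d \((\w+)\)" % dmgr_port, line)
        if m:
            states[m.group(1)] += 1
    return states
```

Running such a tally daily would show whether the CLOSE_WAIT and IDLE populations grow between node syncs.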
		
		
fjb_saper
Posted: Tue Feb 03, 2015 5:22 am
Grand High Poobah
Joined: 18 Nov 2003    Posts: 20767
Location: LI, NY
gabrielj wrote:
> The issue is not yet resolved. We finally traced it: the node agent is opening too many TCP connections to the deployment manager port (7060, XDAGENT_PORT). It opens approximately 1200 TCP connections every day; about 10% are ESTABLISHED, 50% are IDLE, and the rest are in CLOSE_WAIT. WAS 8.5 is installed on two nodes, using persistent session management; automatic synchronization is enabled with a 1-minute sync interval.

Could it be related to persistent session management not releasing the sync resource but needing to reacquire it?
_________________
MQ & Broker admin
		
		
gabrielj
Posted: Tue Feb 03, 2015 5:48 am
Thanks a lot, fjb_saper, for your response.

We have already raised a PMR with IBM for a session table space issue. We use persistent session management, and every day the WAS session manager inserts about 700 MB of data into the session DB, which keeps growing; the session manager is not cleaning up rows when sessions are invalidated. We enabled session tracing and sent the traces to IBM. They responded that "the session manager does clean up rows from the session DB" and wanted to investigate the session manager's behavior when scheduled session cleanup is off.

Coming back to our issue:

1. Is there any relationship between node synchronization and the session DB?
2. lsof output:
java    6054 wasuser 6478u  IPv4 0xe00000036a6fad00     0t2943       TCP j2elive1:50298->jataayu1:7060 (IDLE)
Here 6054 is the node agent process ID, jataayu1 is the DMGR, and 7060 is XDAGENT_PORT. Based on this output, the node agent is the side creating the connections to the DMGR.
3. Is there any OS command to explicitly clean up all IDLE TCP connections?
4. The current TCP keepalive interval at the OS level is 1800000 ms (30 minutes).

Please advise: should I analyze any other areas, such as OS-level or WAS-level configuration?
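For anyone digging through the same output, a small sketch that pulls the PID, remote endpoint, and state out of an lsof TCP line like the one above (`parse_lsof_tcp` is a hypothetical helper; the field layout is assumed to match the sample):

```python
import re

# The lsof line quoted above
line = ("java    6054 wasuser 6478u  IPv4 0xe00000036a6fad00     0t2943       "
        "TCP j2elive1:50298->jataayu1:7060 (IDLE)")

def parse_lsof_tcp(line):
    """Extract PID, remote endpoint, and state from an lsof TCP line."""
    pid = int(line.split()[1])  # second column is the process ID
    m = re.search(r"TCP \S+:\d+->(\S+):(\d+) \((\w+)\)", line)
    host, port, state = m.groups()
    return {"pid": pid, "remote": "%s:%s" % (host, port), "state": state}
```

Grouping parsed lines by remote endpoint and state would show whether the IDLE growth is confined to the XDAGENT_PORT connections or affects other endpoints too.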
		
		
fjb_saper
Posted: Tue Feb 03, 2015 5:53 am
Is your session timeout by any chance greater than your TCP keepalive?
		
		
gabrielj
Posted: Tue Feb 03, 2015 5:58 am
Hi fjb_saper,

Web container session management values:

Session timeout: 30 minutes
In-memory session count: 6000
Serialized session access: unchecked
		
		
fjb_saper
Posted: Tue Feb 03, 2015 6:02 am
I believe the session timeout and TCP timeout values are too close, but that's just me. What happens if you set the session timeout to 5 minutes less than the TCP keepalive? Do you still see as many CLOSE_WAITs?
		
		
gabrielj
Posted: Tue Feb 03, 2015 6:14 am
Hi fjb_saper,

This happens in production only; the test environment works fine. We can't push our ideas into production until we have a solid solution.

One more question I want to clarify: how do you relate the node synchronization operation to the session manager? To my knowledge, node sync moves files between the nodes based on the master configuration repository in the DMGR.
		
		
fjb_saper
Posted: Tue Feb 03, 2015 9:12 am
gabrielj wrote:
> This happens in production only; the test environment works fine. We can't push our ideas into production until we have a solid solution. How do you relate the node synchronization operation to the session manager? To my knowledge, node sync moves files between the nodes based on the master configuration repository in the DMGR.

Well, somehow this problem seems to involve TCP/IP connections that are perceived as closed on one side and open on the other...
		
		
gabrielj
Posted: Tue Feb 03, 2015 9:37 am
Hi saper,

Thanks for your thoughts. A friend of mine said that OVERLAY_UDP_LISTENER_ADDRESS, OVERLAY_TCP_LISTENER_ADDRESS, and XDAGENT_PORT should be open bidirectionally; is that correct? He said these were newly introduced in WAS 8.5. Anyway, tomorrow I will check the production environment and post an update.
		
		
gabrielj
Posted: Sat Feb 07, 2015 10:36 pm
Hi fjb_saper,

We have checked TCP connectivity from the DMGR to the nodes and from the nodes to the DMGR. There is no problem from the DMGR to the nodes, but we have many IDLE connections from the nodes to the DMGR.

I suspect there may be something wrong in the HP-UX TCP stack that causes it not to "clean up" sockets that have been closed; maybe there are some OS TCP-related fixes we need to apply to the system.

I have already raised the same issue with HP support. I will keep you posted.
		
		
gabrielj
Posted: Mon Mar 02, 2015 10:43 pm
Dear All,

We have received the following reply from the OS vendor. We had seen many idle connections when we ran lsof, so we asked the vendor why these idle connections are not cleaned up by the OS when TCP_IDLE_TIMEOUT is reached.

OS vendor reply: "This would require engagement from the application team (WebSphere) to explain why so many IDLE endpoints are left open. The reason the OS is not closing them is that these are application endpoints, not TCP connections in an IDLE state. What was captured was lsof output, which shows IDLE connections from the application's standpoint; that is not the same as TCP_IDLE."

Is there any way to configure idle connection cleanup at the WebSphere configuration level? This happens when synchronization runs from the node agent to the DMGR, and the IDLE connections are created on the node agent machine.

Please advise.
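The vendor's distinction (lsof's "IDLE" is an application-side label, not a kernel TCP state) can be sanity-checked by looking up the same socket in `netstat -an`. A minimal sketch; the netstat line is illustrative (the hostnames and ports follow the lsof sample earlier in the thread, and `tcp_state_for_port` is a hypothetical helper):

```python
# lsof labels a descriptor "(IDLE)", but netstat may still show the same
# 4-tuple as ESTABLISHED: the two tools report different things.
def tcp_state_for_port(netstat_lines, local_port):
    """Return the TCP state netstat reports for a given local port, or None."""
    for line in netstat_lines:
        fields = line.split()
        if len(fields) >= 6 and fields[3].endswith("." + str(local_port)):
            return fields[5]
    return None

# Illustrative `netstat -an` line for the socket lsof showed as (IDLE)
netstat_lines = ["tcp        0      0  j2elive1.50298  jataayu1.7060  ESTABLISHED"]
```

If netstat shows ESTABLISHED where lsof shows (IDLE), the socket is simply open and unused at the application layer, which matches the vendor's explanation.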
		
		
gabrielj
Posted: Tue Nov 10, 2015 9:09 pm
Hi All,

IBM finally gave us a solution for this issue: they recommended moving to version 8.5.5.2 along with the latest JVM (if you are running HP-UX). After the fix pack upgrade, everything works as expected.

Thanks for your suggestions and replies.
		