Author |
Message
|
gabrielj |
Posted: Tue Dec 16, 2014 10:44 pm Post subject: Websphere App Server 8.5 Node agent down intermittently |
|
|
Novice
Joined: 16 Nov 2014 Posts: 23 Location: Muscut,Perth,Sydney,Bangalore,Hydrabad,Coimbatore
|
Hi Experts,
First, a brief description of my environment:
WAS version: 8.5
OS: HP-UX
Topology: 1 DMGR, 2 node agents and 2 nodes.
Recently we ran into a problem: roughly once every two weeks the node agent goes down on its own. The log says "Too many open files". We used the lsof command to monitor the open files of each node agent process ID.
Relevant kernel parameters:
Hard file limit: 20000
Soft file limit: 8126
This is the monitoring output (open file count per node agent process):
Date        Server 1 Node Agent PID (6054)   Server 2 Node Agent PID (8076)
10/12/2014  1636                             1641
11/12/2014  2382                             2390
14/12/2014  4527                             4534
15/12/2014  5265                             5274
The node agent process keeps opening files and never closes the file descriptors. We have already raised a PMR with IBM but have not yet received a working solution.
Please share a solution if you have one.
Thanks in advance. |
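For reference, the growth rate implied by the monitoring figures in the post above can be estimated with a short script. This is only a rough sketch: it assumes the counts are open file descriptors for PID 6054, that the dates are day/month/year, and that growth is linear. The function names are made up for illustration.

```python
from datetime import date

# Open-file counts for node agent PID 6054, copied from the monitoring table
# in the post (dates read as day/month/year).
samples = [
    (date(2014, 12, 10), 1636),
    (date(2014, 12, 11), 2382),
    (date(2014, 12, 14), 4527),
    (date(2014, 12, 15), 5265),
]
SOFT_LIMIT = 8126  # soft file limit quoted in the post


def growth_per_day(points):
    """Average growth between the first and last sample, in descriptors per day."""
    (first_day, first_count), (last_day, last_count) = points[0], points[-1]
    return (last_count - first_count) / (last_day - first_day).days


def days_until_limit(points, limit):
    """Days after the last sample until the limit is hit, assuming linear growth."""
    remaining = limit - points[-1][1]
    return remaining / growth_per_day(points)


rate = growth_per_day(samples)
print(f"average growth: {rate:.1f} descriptors/day")
print(f"days until soft limit: {days_until_limit(samples, SOFT_LIMIT):.1f}")
```

At roughly 726 descriptors per day, a process would need about eleven days to climb from near zero to the 8126 soft limit, which is in line with the failure occurring about once every two weeks.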
|
gabrielj |
Posted: Mon Feb 02, 2015 11:33 pm Post subject: |
|
Hi All,
The issue is not yet resolved. We have finally traced it: the node agent opens too many TCP connections to the deployment manager port 7060 (XDAGENT_PORT).
Every day it opens approximately 1200 TCP connections. About 10% of them are in the ESTABLISHED state, 50% are in the IDLE state and the rest are in the CLOSE_WAIT state. Please share your thoughts on why this happens.
Some more information about the current environment:
WAS 8.5 is installed on two nodes and uses persistent session management.
Automatic synchronization is enabled. Sync interval: 1 min. |
|
fjb_saper |
Posted: Tue Feb 03, 2015 5:22 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Could it be related to persistent session management and not releasing the sync resource but needing to reacquire it?
_________________
MQ & Broker admin |
|
gabrielj |
Posted: Tue Feb 03, 2015 5:48 am Post subject: |
|
Thanks a lot for your response, fjb_saper.
We have already raised a PMR with IBM for a session table space issue. We are using persistent session management. Every day the WAS session manager inserts 700 MB of data into the session DB, and the session DB keeps growing. The session manager does not clean up the rows when sessions are invalidated. We enabled session trace and sent it to IBM. They responded: "session manager cleanup the row from session DB". They want to investigate the behavior of the session manager when scheduled session cleanup is off.
Coming back to our issue:
1. Is there any relationship between node synchronization and the session DB?
2. lsof output:
java 6054 wasuser 6478u IPv4 0xe00000036a6fad00 0t2943 TCP j2elive1:50298->jataayu1:7060 (IDLE)
Here 6054 is the node agent process ID,
jataayu1 is the DMGR,
7060 is XDAGENT_PORT.
Based on this output, the node agent creates the connection to the DMGR.
3. Are there any OS commands to explicitly clean up all IDLE TCP connections?
4. The current TCP keepalive interval set at OS level is 1800000 ms (30 min).
Please advise whether I should analyze any other area, such as the OS-level or WAS-level configuration. |
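To put numbers on the connection states, lsof lines like the one quoted above can be tallied per state. The following is a minimal sketch, assuming numeric ports in the output (lsof's `-P` option prevents port-name conversion) and IPv4 host names without colons. The regular expression and the function name `count_states` are illustrative, not part of any WebSphere or lsof tooling.

```python
import re
from collections import Counter

# Matches the TCP endpoint and state columns of an lsof line, for example:
#   TCP j2elive1:50298->jataayu1:7060 (IDLE)
TCP_ENDPOINT = re.compile(
    r"TCP\s+(?P<local>[^\s:]+):(?P<lport>\d+)"
    r"->(?P<remote>[^\s:]+):(?P<rport>\d+)\s+\((?P<state>[A-Z_0-9]+)\)"
)


def count_states(lsof_lines, remote_port):
    """Count connection states for lines whose remote port equals remote_port."""
    states = Counter()
    for line in lsof_lines:
        match = TCP_ENDPOINT.search(line)
        if match and int(match.group("rport")) == remote_port:
            states[match.group("state")] += 1
    return states


# The only real sample available is the single line from the post.
sample = [
    "java 6054 wasuser 6478u IPv4 0xe00000036a6fad00 0t2943 "
    "TCP j2elive1:50298->jataayu1:7060 (IDLE)",
]
print(count_states(sample, 7060))
```

Fed with a saved capture of `lsof -p <pid>` taken once a day, the per-state counts would show whether the IDLE or the CLOSE_WAIT entries account for the growth.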
|
fjb_saper |
Posted: Tue Feb 03, 2015 5:53 am Post subject: |
|
Is your session timeout by any chance greater than your TCP keepalive? |
|
gabrielj |
Posted: Tue Feb 03, 2015 5:58 am Post subject: |
|
Hi fjb_saper,
Web container session management values:
Session timeout: 30 min
In-memory session count: 6000
Serialized session access: unchecked |
|
fjb_saper |
Posted: Tue Feb 03, 2015 6:02 am Post subject: |
|
I believe the session timeout and the TCP timeout values are too close.
But then that's only me. What happens if you set the session timeout to 5 minutes less than the TCP keepalive? Still as many CLOSE_WAITs? |
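The comparison behind this suggestion is simple arithmetic on the two values quoted earlier in the thread. A small sketch, assuming the OS keepalive value of 1800000 is in milliseconds (the thread equates it with 30 minutes); the constant names are made up for illustration.

```python
TCP_KEEPALIVE_MS = 1_800_000  # OS-level keepalive interval quoted in the thread
SESSION_TIMEOUT_MIN = 30      # web container session timeout quoted in the thread
MARGIN_MIN = 5                # gap suggested in the post above


def ms_to_minutes(milliseconds):
    """Convert milliseconds to minutes."""
    return milliseconds / 60_000


keepalive_min = ms_to_minutes(TCP_KEEPALIVE_MS)
gap_min = keepalive_min - SESSION_TIMEOUT_MIN
suggested_session_timeout = keepalive_min - MARGIN_MIN

print(f"TCP keepalive: {keepalive_min:.0f} min")
print(f"current gap: {gap_min:.0f} min")
print(f"session timeout with a {MARGIN_MIN} min margin: {suggested_session_timeout:.0f} min")
```

With the quoted settings the two timers are identical, so the suggested experiment amounts to lowering the session timeout to 25 minutes. Whether that affects the CLOSE_WAIT count is exactly the open question in the post.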
|
gabrielj |
Posted: Tue Feb 03, 2015 6:14 am Post subject: |
|
Hi fjb_saper,
This happens in production only; the test environment works fine. We can't try ideas out in production directly until we have a solid solution.
One more question I want to clarify:
How do you relate the node synchronization operation to the session manager? As far as I know, node sync moves files between the nodes based on the master config repository in the DMGR. |
|
fjb_saper |
Posted: Tue Feb 03, 2015 9:12 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Well, somehow this problem seems to involve TCP/IP connections that are perceived as closed on one side and open on the other... |
|
gabrielj |
Posted: Tue Feb 03, 2015 9:37 am Post subject: |
|
Hi saper,
Thanks for your thoughts.
A friend of mine said that the OVERLAY_UDP_LISTENER_ADDRESS, OVERLAY_TCP_LISTENER_ADDRESS and XDAGENT_PORT ports should be open in both directions. Is that correct? He said these were newly introduced in WAS 8.5. Anyway, tomorrow I will check the production environment and update the status. |
|
gabrielj |
Posted: Sat Feb 07, 2015 10:36 pm Post subject: |
|
Hi fjb_saper,
We have checked TCP connectivity from the DMGR to the node and from the node to the DMGR.
There is no problem from the DMGR to the node, but we have a great many IDLE connections from the node to the DMGR.
I suspect that maybe there is something wrong in your HP-UX OS TCP stack causing it to not "clean up" sockets that have been closed. Maybe there are some OS TCP related fixes that you need to apply on your system?
I have already raised the same issue with HP support.
I will keep you posted. |
|
gabrielj |
Posted: Mon Mar 02, 2015 10:43 pm Post subject: |
|
Dear All,
We have received the following reply from the OS vendor.
We saw a great many idle connections when we ran the lsof command, so we asked the OS vendor why these idle connections are not cleaned up by the OS when TCP_IDLE_TIMEOUT is reached.
OS vendor reply: This would require engagement from the application team (WebSphere) to find out why so many IDLE endpoints are left open. The OS is not closing them because these are application endpoints, not TCP connections with IDLE status.
What was captured was lsof output, which shows IDLE connections from the application's standpoint; that is not the same as TCP_IDLE.
Is there any way to configure idle connection cleanup at the WebSphere configuration level? This happens when synchronization runs from the node agent to the DMGR. These IDLE connections are created on the node agent machine.
Please advise. |
|
gabrielj |
Posted: Tue Nov 10, 2015 9:09 pm Post subject: |
|
Hi All,
IBM has finally provided a solution for this issue: they recommended moving to version 8.5.5.2 along with the latest JVM (if you are using HP-UX). After the fix pack upgrade everything works as expected.
Thanks for your suggestions and replies. |
|