gabrielj
Posted: Tue Dec 16, 2014 10:44 pm    Post subject: WebSphere App Server 8.5 node agent down intermittently
Novice
Joined: 16 Nov 2014    Posts: 23
Location: Muscat, Perth, Sydney, Bangalore, Hyderabad, Coimbatore
Hi Experts,

First, a brief description of my environment:

WAS version: 8.5
OS: HP-UX
Topology: 1 DMGR, 2 node agents, 2 nodes.

Recently we hit a problem: roughly once every two weeks a node agent goes down on its own. The log says "Too many open files". We used the lsof command to monitor the node agent process IDs.

Kernel parameters:
hard file limit: 20000
soft file limit: 8126

Monitoring output (open descriptor counts):

Date          Server 1 node agent (PID 6054)    Server 2 node agent (PID 8076)
10/12/2014    1636                              1641
11/12/2014    2382                              2390
14/12/2014    4527                              4534
15/12/2014    5265                              5274

The node agent process keeps opening files and never closes the file descriptors. We have already raised a PMR with IBM but have not yet received a working solution.

Please share a solution if you have one.

Thanks in advance.
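As a rough illustration of how fast these counts approach the soft limit, a minimal sketch (assuming roughly linear growth; the counts are the samples above, as would come from `lsof -p <PID> | wc -l`, and `days_until_limit` is a hypothetical helper name):

```python
from datetime import datetime

SOFT_LIMIT = 8126  # soft file limit from the kernel parameters above

# (date, open descriptor count) samples for the server 1 node agent
samples = [
    ("10/12/2014", 1636),
    ("11/12/2014", 2382),
    ("14/12/2014", 4527),
    ("15/12/2014", 5265),
]

def days_until_limit(samples, limit=SOFT_LIMIT):
    """Estimate days until the soft limit, assuming linear descriptor growth."""
    parse = lambda d: datetime.strptime(d, "%d/%m/%Y")
    (d0, c0), (d1, c1) = samples[0], samples[-1]
    rate = (c1 - c0) / (parse(d1) - parse(d0)).days  # descriptors leaked per day
    return (limit - c1) / rate
```

With these samples the growth rate is about 726 descriptors per day, so a freshly restarted agent would exhaust the 8126 soft limit in roughly 11 days, which is consistent with the reported two-week failure interval. This is only an estimate, not a diagnosis.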
		
		
gabrielj
Posted: Mon Feb 02, 2015 11:33 pm
Hi All,

The issue is not yet resolved. We finally traced it: the node agent is opening too many TCP connections to the deployment manager port (7060, XDAGENT_PORT). It opens approximately 1200 TCP connections every day; about 10% are in ESTABLISHED state, 50% are in IDLE state, and the rest are in CLOSE_WAIT state. Please share your thoughts on why this happens.

Some more information about the current environment:
WAS 8.5 is installed on two nodes, using persistent session management.
Automatic synchronization is enabled; the sync interval is 1 minute.
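For reference, a breakdown like the one above can be produced by tallying the state shown on each lsof TCP line for connections to the DMGR port. A minimal sketch; the sample lines are illustrative (descriptor numbers and addresses are made up), with the layout following the lsof sample shown later in this thread:

```python
from collections import Counter
import re

# Illustrative `lsof -p <PID>` TCP lines (invented descriptor/device values)
lsof_lines = [
    "java 6054 wasuser 6401u IPv4 0xe000000300000000 0t0 TCP j2elive1:50101->jataayu1:7060 (ESTABLISHED)",
    "java 6054 wasuser 6402u IPv4 0xe000000300000001 0t0 TCP j2elive1:50102->jataayu1:7060 (IDLE)",
    "java 6054 wasuser 6403u IPv4 0xe000000300000002 0t0 TCP j2elive1:50103->jataayu1:7060 (CLOSE_WAIT)",
    "java 6054 wasuser 6404u IPv4 0xe000000300000003 0t0 TCP j2elive1:50104->jataayu1:7060 (CLOSE_WAIT)",
]

def state_counts(lines, dmgr_port=7060):
    """Tally connection states for sockets whose remote end is the DMGR port."""
    states = Counter()
    for line in lines:
        m = re.search(r"->\S+:%d \((\w+)\)" % dmgr_port, line)
        if m:
            states[m.group(1)] += 1
    return states
```

Running such a tally daily would show whether the CLOSE_WAIT and IDLE populations grow between node syncs.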
		
		
fjb_saper
Posted: Tue Feb 03, 2015 5:22 am
Grand High Poobah
Joined: 18 Nov 2003    Posts: 20767
Location: LI, NY
gabrielj wrote:
> The issue is not yet resolved. We finally traced it: the node agent is opening too many TCP connections to the deployment manager port (7060, XDAGENT_PORT). It opens approximately 1200 TCP connections every day; about 10% are ESTABLISHED, 50% are IDLE, and the rest are in CLOSE_WAIT. WAS 8.5 is installed on two nodes, using persistent session management; automatic synchronization is enabled with a 1-minute sync interval.

Could it be related to persistent session management not releasing the sync resource but needing to reacquire it?
_________________
MQ & Broker admin
		
		
gabrielj
Posted: Tue Feb 03, 2015 5:48 am
Thanks a lot, fjb_saper, for your response.

We have already raised a PMR with IBM for a session table space issue. We use persistent session management, and every day the WAS session manager inserts about 700 MB of data into the session DB, which keeps growing; the session manager is not cleaning up rows when sessions are invalidated. We enabled session tracing and sent the traces to IBM. They responded that "the session manager does clean up rows from the session DB" and wanted to investigate the session manager's behavior when scheduled session cleanup is off.

Coming back to our issue:

1. Is there any relationship between node synchronization and the session DB?
2. lsof output:
java    6054 wasuser 6478u  IPv4 0xe00000036a6fad00     0t2943       TCP j2elive1:50298->jataayu1:7060 (IDLE)
Here 6054 is the node agent process ID, jataayu1 is the DMGR, and 7060 is XDAGENT_PORT. Based on this output, the node agent is the side creating the connections to the DMGR.
3. Is there any OS command to explicitly clean up all IDLE TCP connections?
4. The current TCP keepalive interval at the OS level is 1800000 ms (30 minutes).

Please advise: should I analyze any other areas, such as OS-level or WAS-level configuration?
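For anyone digging through the same output, a small sketch that pulls the PID, remote endpoint, and state out of an lsof TCP line like the one above (`parse_lsof_tcp` is a hypothetical helper; the field layout is assumed to match the sample):

```python
import re

# The lsof line quoted above
line = ("java    6054 wasuser 6478u  IPv4 0xe00000036a6fad00     0t2943       "
        "TCP j2elive1:50298->jataayu1:7060 (IDLE)")

def parse_lsof_tcp(line):
    """Extract PID, remote endpoint, and state from an lsof TCP line."""
    pid = int(line.split()[1])  # second column is the process ID
    m = re.search(r"TCP \S+:\d+->(\S+):(\d+) \((\w+)\)", line)
    host, port, state = m.groups()
    return {"pid": pid, "remote": "%s:%s" % (host, port), "state": state}
```

Grouping parsed lines by remote endpoint and state would show whether the IDLE growth is confined to the XDAGENT_PORT connections or affects other endpoints too.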
		
		
fjb_saper
Posted: Tue Feb 03, 2015 5:53 am
Is your session timeout by any chance greater than your TCP keepalive?
		
		
gabrielj
Posted: Tue Feb 03, 2015 5:58 am
Hi fjb_saper,

Web container session management values:

Session timeout: 30 minutes
In-memory session count: 6000
Serialized session access: unchecked
		
		
fjb_saper
Posted: Tue Feb 03, 2015 6:02 am
I believe the session timeout and TCP timeout values are too close, but that's just me. What happens if you set the session timeout to 5 minutes less than the TCP keepalive? Do you still see as many CLOSE_WAITs?
		
		
gabrielj
Posted: Tue Feb 03, 2015 6:14 am
Hi fjb_saper,

This happens in production only; the test environment works fine. We can't push our ideas into production until we have a solid solution.

One more question I want to clarify: how do you relate the node synchronization operation to the session manager? To my knowledge, node sync moves files between the nodes based on the master configuration repository in the DMGR.
		
		
fjb_saper
Posted: Tue Feb 03, 2015 9:12 am
gabrielj wrote:
> This happens in production only; the test environment works fine. We can't push our ideas into production until we have a solid solution. How do you relate the node synchronization operation to the session manager? To my knowledge, node sync moves files between the nodes based on the master configuration repository in the DMGR.

Well, somehow this problem seems to involve TCP/IP connections that are perceived as closed on one side and open on the other...
		
		
gabrielj
Posted: Tue Feb 03, 2015 9:37 am
Hi saper,

Thanks for your thoughts. A friend of mine said that OVERLAY_UDP_LISTENER_ADDRESS, OVERLAY_TCP_LISTENER_ADDRESS, and XDAGENT_PORT should be open bidirectionally; is that correct? He said these were newly introduced in WAS 8.5. Anyway, tomorrow I will check the production environment and post an update.
		
		
gabrielj
Posted: Sat Feb 07, 2015 10:36 pm
Hi fjb_saper,

We have checked TCP connectivity from the DMGR to the nodes and from the nodes to the DMGR. There is no problem from the DMGR to the nodes, but we have many IDLE connections from the nodes to the DMGR.

I suspect there may be something wrong in the HP-UX TCP stack that causes it not to "clean up" sockets that have been closed; maybe there are some OS TCP-related fixes we need to apply to the system.

I have already raised the same issue with HP support. I will keep you posted.
		
		
gabrielj
Posted: Mon Mar 02, 2015 10:43 pm
Dear All,

We have received the following reply from the OS vendor. We had seen many idle connections when we ran lsof, so we asked the vendor why these idle connections are not cleaned up by the OS when TCP_IDLE_TIMEOUT is reached.

OS vendor reply: "This would require engagement from the application team (WebSphere) to explain why so many IDLE endpoints are left open. The reason the OS is not closing them is that these are application endpoints, not TCP connections in an IDLE state. What was captured was lsof output, which shows IDLE connections from the application's standpoint; that is not the same as TCP_IDLE."

Is there any way to configure idle connection cleanup at the WebSphere configuration level? This happens when synchronization runs from the node agent to the DMGR, and the IDLE connections are created on the node agent machine.

Please advise.
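The vendor's distinction (lsof's "IDLE" is an application-side label, not a kernel TCP state) can be sanity-checked by looking up the same socket in `netstat -an`. A minimal sketch; the netstat line is illustrative (the hostnames and ports follow the lsof sample earlier in the thread, and `tcp_state_for_port` is a hypothetical helper):

```python
# lsof labels a descriptor "(IDLE)", but netstat may still show the same
# 4-tuple as ESTABLISHED: the two tools report different things.
def tcp_state_for_port(netstat_lines, local_port):
    """Return the TCP state netstat reports for a given local port, or None."""
    for line in netstat_lines:
        fields = line.split()
        if len(fields) >= 6 and fields[3].endswith("." + str(local_port)):
            return fields[5]
    return None

# Illustrative `netstat -an` line for the socket lsof showed as (IDLE)
netstat_lines = ["tcp        0      0  j2elive1.50298  jataayu1.7060  ESTABLISHED"]
```

If netstat shows ESTABLISHED where lsof shows (IDLE), the socket is simply open and unused at the application layer, which matches the vendor's explanation.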
		
		
gabrielj
Posted: Tue Nov 10, 2015 9:09 pm
Hi All,

IBM finally gave us a solution for this issue: they recommended moving to version 8.5.5.2 along with the latest JVM (if you are running HP-UX). After the fix pack upgrade, everything works as expected.

Thanks for your suggestions and replies.
		