ETC and QMgr side resources used
RogerLacroix
Posted: Thu Aug 02, 2012 4:21 pm    Post subject: ETC and QMgr side resources used
Jedi Knight
Joined: 15 May 2001  Posts: 3264  Location: London, ON Canada
All,
I'm trying to help out a customer with a really strange problem where the queue manager appears to hang but is really just refusing to accept new connections (and no, it has nothing to do with reaching the maximum number of connections).
- The server-side queue manager is on Solaris x86 64-bit running WMQ v7.0.1.8.
- The client side is a JMS application using ETC (Extended Transactional Client) and very old (and outdated) MQ JAR files, connecting remotely to the queue manager.
The application has not changed in a long time. The only two things that have changed in the last few months are: (1) Fix Pack 7.0.1.8 was applied, and (2) the volume of connections may have increased.
Before I describe the application: yes, it is poorly written, but it is what it is and has been running for the customer for at least 5 years, so let's just skip over the "poorly designed" or "should be redesigned" comments.
The application uses JMS and ETC to create a transaction and does the following: MQCONN, MQOPEN, MQPUT, MQCLOSE, MQDISC (it uses JMS verbs; I have just described it in terms of the MQ API calls). It does roughly 4.2 connections per second, or roughly 252 connections per minute. It has bursts where the application exceeds 30 connections per second (yes, in a perfect world, this application would be rewritten). A sketch of the cycle follows below.
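For anyone who wants the shape of that cycle in code, here is a minimal sketch. It is not the customer's code: host, port, channel, queue manager and queue names are all placeholders, and a real ETC application would drive the transaction through an XA connection factory under an external transaction manager rather than the locally transacted session shown here.
Code:
// Minimal sketch of one MQCONN..MQDISC cycle expressed in JMS.
// All connection details below are placeholders, not the customer's values.
import javax.jms.Queue;
import javax.jms.QueueConnection;
import javax.jms.QueueSender;
import javax.jms.QueueSession;
import javax.jms.Session;
import com.ibm.mq.jms.JMSC;
import com.ibm.mq.jms.MQQueueConnectionFactory;

public class OneShotPut {
    public static void main(String[] args) throws Exception {
        MQQueueConnectionFactory cf = new MQQueueConnectionFactory();
        cf.setHostName("qmhost.example.com");
        cf.setPort(1414);
        cf.setChannel("APP.SVRCONN");
        cf.setQueueManager("QMGR1");
        cf.setTransportType(JMSC.MQJMS_TP_CLIENT_MQ_TCPIP); // network (client) connection

        QueueConnection conn = cf.createQueueConnection();        // MQCONN
        try {
            // true = transacted session; the put is only hardened at commit()
            QueueSession session = conn.createQueueSession(true, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("APP.REQUEST");      // MQOPEN
            QueueSender sender = session.createSender(queue);
            sender.send(session.createTextMessage("payload"));     // MQPUT
            session.commit();
            sender.close();                                        // MQCLOSE
        } finally {
            conn.close();                                          // MQDISC
        }
    }
}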
The customer has been using MQAUSX for authentication on this server, without change, for at least 18 months. MQAUSX forks a process to do the authentication on Solaris. It appears that some sort of resource limit is being hit, because in one MQ trace the MQAUSX fork appears to fail, but the MQAUSX log file does not show any failure (yes, yes, he said / she said).
On the same setup in their "stage" environment, the problem does not happen, but their stage environment does less than half the number of connections per second compared to the production environment. So again, I'm thinking it is a resource issue.
I created the same environment as the customer and I have spent a week running test harnesses without any failure. I can drive 120 connections per second (7,200 per minute), sustained over 24 hours, at my Solaris x86 queue manager (v7.0.1.8) and it processes everything perfectly. I have been doing my testing with regular connections (non-ETC), as I only found out about the application's ETC usage last night.
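For context, the harness boils down to something like the sketch below: a fixed pool of threads, each looping the connect/put/disconnect cycle from the earlier sketch. The thread count and duration are assumptions picked to match the rates quoted above, not the actual harness settings.
Code:
// Sketch of a connection-churn harness (placeholder values).
// 12 workers at roughly 10 cycles/second each gives ~120 connections/second.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConnectionChurn {
    public static void main(String[] args) throws InterruptedException {
        final int THREADS = 12;
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        for (int i = 0; i < THREADS; i++) {
            pool.submit(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        OneShotPut.main(new String[0]); // one full MQCONN..MQDISC cycle
                    } catch (Exception e) {
                        e.printStackTrace(); // a failure here is the event we are hunting
                        break;
                    }
                }
            });
        }
        pool.awaitTermination(24, TimeUnit.HOURS); // soak for a day
        pool.shutdownNow();
    }
}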
The "hanging" seems to happen after 24 hours. Not like clock-work but more after "x" number of connections. Now I would put the number of connections after 24 hours to be at least 300,000 connections. Note: The 300,000 value is only a guess on my part based on the 4.2 connections per second over a 24 hour period.
When they remove the security exit from the channel, the "hanging" does not happen (oh, what fun!!).
So, the question is: what "extra" resources on Solaris does the queue manager's MCA use when dealing with a connection using a transaction, versus a regular non-ETC connection?
I am not a Solaris SysAdmin. Here is what the customer is using for kernel values:
Code:
set shmsys:shminfo_shmmax=4294967295
set shmsys:shminfo_shmmni=2048
set semsys:seminfo_semmni=2048
set semsys:seminfo_semaem=16384
set semsys:seminfo_semvmx=32767
set semsys:seminfo_semmns=16384
set semsys:seminfo_semmsl=100
set semsys:seminfo_semopm=100
set semsys:seminfo_semmnu=16384
set semsys:seminfo_semume=256
set shmsys:shminfo_shmseg=1024
set rlim_fd_cur=10000
set rlim_fd_max=10000
Does anyone see any resource that is too low for doing roughly 30 ETC connections per second?
Any help, hints or tips would be greatly appreciated.
Regards,
Roger Lacroix
Capitalware Inc.
_________________
Capitalware: Transforming tomorrow into today.
Connected to MQ!
fjb_saper
Posted: Thu Aug 02, 2012 8:53 pm    Post subject:
Grand High Poobah
Joined: 18 Nov 2003  Posts: 20756  Location: LI, NY
Given that they use antiquated Java libraries, there might also be the risk of a connection leak / factory leak.
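To illustrate (hypothetical code, not their application): the classic leak puts close() on the happy path only, so any exception before it strands the connection, and the queue-manager-side resources for that connection hang around until the socket finally dies.
Code:
import javax.jms.JMSException;
import javax.jms.QueueConnection;
import javax.jms.QueueConnectionFactory;
import javax.jms.QueueSender;
import javax.jms.QueueSession;
import javax.jms.Session;

public class LeakyPut {
    static void putOnce(QueueConnectionFactory cf) throws JMSException {
        QueueConnection conn = cf.createQueueConnection();
        QueueSession session = conn.createQueueSession(true, Session.AUTO_ACKNOWLEDGE);
        QueueSender sender = session.createSender(session.createQueue("APP.REQUEST"));
        sender.send(session.createTextMessage("payload"));
        session.commit();
        conn.close(); // never reached if anything above throws: that is the leak
    }
}
The fix is closing the connection in a finally block, as in the first sketch in this thread.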
Does this problem also happen if they use the 7.0.1.8 JMS libraries?
The other impact of heightened volume and ETC is on log space usage. Worth checking whether you are running out of that resource...
_________________
MQ & Broker admin
RogerLacroix
Posted: Mon Aug 06, 2012 5:17 pm    Post subject:
Jedi Knight
Joined: 15 May 2001  Posts: 3264  Location: London, ON Canada
fjb_saper wrote:
Does this problem also happen if they use the 7.0.1.8 JMS libraries?
Good question, but it will be tough to get them to switch.
fjb_saper wrote:
The other impact of heightened volume and ETC is on log space usage. Worth checking whether you are running out of that resource...
Besides the df command, what other commands could be used?
Regards,
Roger Lacroix
Capitalware Inc.
_________________
Capitalware: Transforming tomorrow into today.
Connected to MQ!
fjb_saper
Posted: Mon Aug 06, 2012 10:40 pm    Post subject:
Grand High Poobah
Joined: 18 Nov 2003  Posts: 20756  Location: LI, NY
RogerLacroix wrote:
Besides the df command, what other commands could be used?
How about
Code:
ls -ltr <qmgr log path>/<qmgrname>/active | grep LOG | wc -l
as a way to count the number of log files? With linear logging, you will then need to check the number of the oldest file needed for restart of the qmgr.
With circular logging, just check whether the total number of logs has reached the number of primaries + secondaries...
Note that if you're running out of primaries + secondaries, you should also see in the error logs some transactions being rolled back because of lack of resource (allocated log space).
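To put numbers on it (a hedged example; the real values live in the customer's qm.ini, and these figures are assumptions, not their settings): with LogPrimaryFiles=3, LogSecondaryFiles=2 and LogFilePages=4096, each log extent is 4096 pages × 4 KB = 16 MB, so the queue manager can never allocate more than (3 + 2) × 16 MB = 80 MB of log. A long-running transaction pins the extent holding its first log record, so a backlog of in-flight ETC transactions can exhaust that 80 MB long before the filesystem itself fills up.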
For linear logging I use ms0L, which gives you the number of log files found, the start number, the end number, the last needed for restart, and the last needed for media recovery...
Running it every hour nicely shows the evolution over the day (UNIX, with an allocated, dedicated FS holding 2 primaries and lots of secondaries). This also lets you evaluate how steep the growth periods are and how much log space you need to add...
So you may not be running out of disk space (df -k . or du -k .) but you may still be running out of allocated log space...
Hope it helps.
_________________
MQ & Broker admin