ssl glassfish liferay blocked-threads

Portal running with Glassfish 2.1.1, Liferay 5.2 and SSL gets too many blocked threads


I have a portal that runs over SSL on Glassfish and uses Liferay. The last time we sent an email that brought approximately 200 people to the site at the same time to access newly released information, our Glassfish "stalled".

From the server we could see that system resources were OK:

  • Glassfish has up to 8 GB to use but was using 5 GB.
  • The server has 4 CPUs and the overall usage was around 30%.
  • Glassfish is configured for up to 400 HTTP threads.

As soon as we detected that our server wasn't answering users, we started a profiler in order to understand what was going on.

The threads overview shows too many blocked threads: HTTPS threads blocking other HTTPS threads.

From the stack it's not possible to see any code other than sun, grizzly, and catalina classes.

I would like to fix this issue, but right now I can't tell whether I should work on our own code or replace some component, for example by disabling SSL.

Any thoughts would be much appreciated.

Thanks.


Solution

  • A thread dump might have been easier and less intrusive than a profiler: it could have shown you where the threads are blocked in the actual running system.

    You'll have to figure out where the blocking occurred: Was it in Liferay's code or in your own? What did you have on the pages, and how is the theme done? Also note that you're running a really old version of Liferay. If you're on Community Edition (CE), it has been out of maintenance for a few years now. Enterprise Edition is still supported, but as you don't mention it, odds are you're running CE.

    Further, if you cause situations like the one you describe (sending lots of people to the site at the same time), you might want to load test your system with an artificial load in order to see how it behaves. You might also want the landing page to be buffered. This is not to say that 200 users are a lot, but for any such activity you probably want to know that your system can handle it.

    Until you prove the opposite, I'd assume that there is some custom component on the page (either a portlet or the theme) that causes a bottleneck and the blocking that you discovered.
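
To illustrate the thread dump suggestion above: from outside the JVM, the JDK's `jstack <pid>` tool (or `kill -3 <pid>` on Unix-like systems with a HotSpot JVM) prints every thread's stack. The same information is available programmatically through the standard `java.lang.management` API (`dumpAllThreads` requires Java 6 or later). The following is only a minimal sketch, not something taken from the original setup: the class name `BlockedThreadReport` and its method are hypothetical, and the code reports on the JVM it runs in, so it would have to be executed inside the server process (or adapted to a remote JMX connection) to say anything about Glassfish.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists threads that are BLOCKED on a monitor in this JVM.
public class BlockedThreadReport {

    /** Returns one entry per BLOCKED thread: the lock it waits for, the owner, and a few stack frames. */
    public static List<String> describeBlockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<String>();

        // Only request monitor/synchronizer details if this JVM supports them.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());

        for (ThreadInfo info : infos) {
            if (info.getThreadState() != Thread.State.BLOCKED) {
                continue; // runnable, waiting, or sleeping threads are not the problem here
            }
            StringBuilder sb = new StringBuilder();
            sb.append('"').append(info.getThreadName()).append('"')
              .append(" is BLOCKED on ").append(info.getLockName())
              .append(" held by \"").append(info.getLockOwnerName()).append('"');

            // Show the top frames so the blocking call site is visible.
            StackTraceElement[] stack = info.getStackTrace();
            for (int i = 0; i < Math.min(5, stack.length); i++) {
                sb.append(System.lineSeparator()).append("    at ").append(stack[i]);
            }
            report.add(sb.toString());
        }
        return report;
    }

    public static void main(String[] args) {
        for (String entry : describeBlockedThreads()) {
            System.out.println(entry);
        }
    }
}
```

If many entries name the same lock owner, that owner's stack is the place to look first: it shows which code holds the monitor that the HTTPS threads are queuing on.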