I have a memory dump that I made from a dying application. It had consumed all available heap (-Xmx1024m). It uses com.gargoylesoftware.htmlunit.WebClient to crawl web pages, makes a few HTTP requests per minute, and dies after several days. As I can see from the dump, there are ~1750 instances of the HtmlPage class, each with tons of related objects, including the full content of a crawled page.
I cannot understand why the HtmlPage instances are not garbage collected. I have investigated the instance references and I don't see any of my code holding a reference to them, and VisualVM says "No GC root found". As I understand it, that should mean the objects are eligible for GC, but they are never collected.
The application runs as a simple standalone process; it doesn't use any web container or application server.
Any hints? What else should I look into?
Specs:
- htmlunit v2.7
- java version "1.6.0_13" Java(TM) SE Runtime Environment (build 1.6.0_13-b03) Java HotSpot(TM) Server VM (build 11.3-b02, mixed mode)
- Linux my.lan 2.6.18-128.el5 #1 SMP Wed Dec 17 11:42:39 EST 2008 i686 i686 i386 GNU/Linux
Update 1
I have tried to analyse the dump with the YourKit Java Profiler. It shows a lot of java.lang.ref.Finalizer objects with 310 MB retained size. They are created for the net.sourceforge.htmlunit.corejs.javascript.NativeGenerator#finalize() finalizer, and the NativeGenerator refers to Window, which refers to HtmlPage and from there to everything else.
Does anybody know why they stay in memory?
Note: curiously, VisualVM showed "pending finalization" as zero.
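To watch the finalization backlog from inside the crawler instead of only in a dump, the standard java.lang.management API exposes the counter that VisualVM reports as "pending finalization". The sketch below is a minimal example, assuming it is acceptable to log one line per crawl cycle; the class and method names are made up for illustration. Keep in mind that the JVM creates a java.lang.ref.Finalizer record when a finalizable object is allocated, so a large number of Finalizer instances in a dump does not by itself mean the finalizer thread is behind. A pending count near zero instead suggests the referents are still reachable by some other path.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class FinalizationMonitor {

    /** Returns a one-line snapshot of heap usage and the finalization backlog. */
    public static String snapshot() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        // Approximate number of objects already found unreachable whose
        // finalize() has not run yet (the figure VisualVM shows).
        int pending = memory.getObjectPendingFinalizationCount();
        // getMax() is -1 when no maximum is defined, which prints as 0MB here.
        return String.format("heap used=%dMB max=%dMB pendingFinalization=%d",
                heap.getUsed() / (1024 * 1024),
                heap.getMax() / (1024 * 1024),
                pending);
    }

    public static void main(String[] args) {
        // In a crawler, call this after each request cycle and log the result.
        System.out.println(snapshot());
    }
}
```

If heap usage climbs steadily while the pending count stays near zero, the retained pages are most likely still strongly reachable rather than queued for finalization.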
Make sure you're calling webClient.closeAllWindows() after you're done with the page(s); otherwise the JavaScript thread keeps running and holds references to the page resources, etc.
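The suggested fix can be sketched as a crawl step that always releases the windows in a finally block, so cleanup also happens when a request or the page processing fails. To keep the sketch compilable without HtmlUnit on the classpath, Browser and PageHandler are hypothetical interfaces: in real code Browser would be the WebClient itself, fetch(url) would be webClient.getPage(url), and closeAllWindows() is the HtmlUnit 2.x method named in the answer.

```java
import java.io.IOException;

public class CrawlCleanup {

    /**
     * Hypothetical stand-in for the WebClient methods used here
     * (fetch ~ getPage(url), closeAllWindows ~ closeAllWindows()).
     */
    public interface Browser {
        Object fetch(String url) throws IOException;
        void closeAllWindows();
    }

    /** Hypothetical handler: extract what you need, do not keep the page. */
    public interface PageHandler {
        void handle(Object page);
    }

    /**
     * Fetches one URL, hands the page to the handler, and then closes all
     * windows even if fetch or handle throws, so no window (and the
     * JavaScript state attached to it) outlives this crawl step.
     */
    public static void crawlOnce(Browser browser, String url, PageHandler handler)
            throws IOException {
        try {
            Object page = browser.fetch(url);
            handler.handle(page);
        } finally {
            browser.closeAllWindows();
        }
    }
}
```

The finally block matters in a crawler that runs for days: without it, every failed request can skip the cleanup and leave another window behind.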