Tags: java, scala, jdbc, akka, uima

Memory Leak or Congested Workers with modified DeepLearning4Java (using akka)


I am using a modified version of DeepLearning4Java to process documents with a UIMA CollectionReader. For large document collections I run into a "GC overhead limit exceeded" error or various timeout errors (e.g. Exception in thread "RMI TCP Connection(idle)") as more and more time is spent garbage collecting. I'm not sure whether this is a memory leak or whether I am simply piling up too much work in the workers' mailboxes. Being unfamiliar with Scala and Akka doesn't help.

What happens is that my application runs fine until it gets close to the heap limit (tried with 4 GB and 8 GB), at which point it slows down and then hits the GC overhead limit. It's not a PermGen issue: PermGen usage never exceeds 45 MB. It's also not a matter of loading too many classes - I only ever see around 7,000 loaded, and that number stays essentially flat throughout the run.

The main culprits can be seen in the screenshot below (Java VisualVM screenshot).

These objects are instantiated in org.deeplearning4j.bagofwords.vectorizer.BaseTextVectorizer, where vocabActor.tell is called:

    while (docIter != null && docIter.hasNext()) {
        // send one document's work to the vocab actor's mailbox
        vocabActor.tell(new StreamWork(new DefaultInputStreamCreator(docIter), latch), vocabActor);

        queued.incrementAndGet();
        if (queued.get() % 10000 == 0) {
            log.info("Sent " + queued);
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

The tell function, as I understand it, is Scala code from Akka:

  final def tell(msg: Any, sender: ActorRef): Unit = this.!(msg)(sender)

My understanding is that each message goes into a worker's mailbox to await processing - but I would assume all references to it disappear once the work has been processed. So I'm not sure why so many objects persist; something must be keeping the GC from collecting them - perhaps because they are still sitting in the mailbox and haven't been processed yet? The loop can run for a while, but I would assume the StreamWork objects are being recycled.

My question is whether there is a way to figure out if I should switch to a different type of dispatcher to somehow throttle message generation, or whether I should be looking for a memory leak. I can post the DocumentIterator or other code if needed.
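To make the throttling idea concrete, here is a rough back-pressure sketch using a Semaphore. It uses a plain ExecutorService as a stand-in for the actor, and MAX_IN_FLIGHT and process are made-up names, not anything from DL4J or Akka; applying the same idea to the actor would mean acquiring a permit before tell and releasing it when a worker finishes a message.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Semaphore;

    public class ThrottledProducer {
        // Hypothetical cap on queued-but-unprocessed work items; tune to the heap.
        private static final int MAX_IN_FLIGHT = 1000;

        public static void main(String[] args) throws InterruptedException {
            ExecutorService workers = Executors.newFixedThreadPool(4);
            Semaphore inFlight = new Semaphore(MAX_IN_FLIGHT);

            for (int doc = 0; doc < 1_000_000; doc++) {
                inFlight.acquire();          // blocks the producer once the backlog is full
                final int id = doc;
                workers.submit(() -> {
                    try {
                        process(id);         // stand-in for processing one document
                    } finally {
                        inFlight.release();  // frees a slot so the producer can continue
                    }
                });
            }
            workers.shutdown();
        }

        private static void process(int id) {
            // placeholder for the real per-document work
        }
    }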


Solution

  • Please, always use the most recent dl4j/nd4j versions available on Maven Central. The bug you're describing was fixed a while ago, and Akka is no longer used there at all.

    p.s. the most recent version is 0.4-rc3.8 at this moment.
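
    For reference, a minimal Maven dependency sketch for that release; the deeplearning4j-nlp artifactId is an assumption on my part, so adjust the module list to whichever dl4j/nd4j modules your build actually uses:

        <!-- hypothetical minimal dependency; swap in the dl4j/nd4j modules your project needs -->
        <dependency>
            <groupId>org.deeplearning4j</groupId>
            <artifactId>deeplearning4j-nlp</artifactId>
            <version>0.4-rc3.8</version>
        </dependency>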