Tags: java, garbage-collection, java-native-interface, jcuda

JNI libraries deallocate memory upon garbage collection?


I am using JCUDA and would like to know whether the JNI objects are smart enough to deallocate their native memory when they are garbage collected. I can understand why this may not work in all situations, but I know it will work in my situation, so my follow-up question is: how can I accomplish this? Is there a "mode" I can set? Will I need to build a layer of abstraction? Or maybe the answer really is "no, don't ever try that" - and if so, why not?

EDIT: I'm referring only to native objects created via JNI, not Java objects. I am aware that all Java objects are treated equally W.R.T. garbage collection.


Solution

  • Usually, such libraries do not deallocate memory due to garbage collection. Particularly: JCuda does not do this, and has no option or "mode" where this can be done.

    The reason is quite simple: It does not work.

    You'll often have a pattern like this:

    // Assumes: import jcuda.driver.CUdeviceptr;
    //          import static jcuda.driver.JCudaDriver.*;
    void doSomethingWithJCuda()
    {
        // Allocate 1000 bytes of GPU memory; 'data' is only a
        // Java-side handle to the native allocation
        CUdeviceptr data = new CUdeviceptr();
        cuMemAlloc(data, 1000);

        workWith(data);

        // *(See notes below)
    }
    

    Here, native memory is allocated, and the Java object serves as a "handle" to this native memory.

    At the last line, the data object goes out of scope. Thus, it becomes eligible for garbage collection. However, there are two issues:


    1. The garbage collector will only destroy the Java object, and not free the memory that was allocated with cuMemAlloc or any other native call.

    So you'll usually have to free the native memory explicitly by calling

    cuMemFree(data);


    before leaving the method (a try/finally sketch of this pattern is shown after these two points).


    2. You don't know when the Java object will be garbage collected - or whether it will be garbage collected at all.

    A common misconception is that an object is garbage collected as soon as it is no longer reachable. In reality, becoming unreachable only makes the object eligible for collection; it is not necessarily collected promptly, or at all.

    As bmargulies pointed out in his answer:

    One means is to have a Java object with a finalizer that makes the necessary JNI call to free native memory.

    It may look like a viable option to simply override the finalize() method of these "handle" objects, and do the cuMemFree(this) call there. This has been tried, for example, by the authors of JavaCL (a library that also allows using the GPU with Java, and thus, is conceptually somewhat similar to JCuda).

    But it simply does not work: Even if a Java object is no longer reachable, this does not mean that it will be garbage collected immediately.

    You simply don't know when the finalize() method will be called.

    This can easily cause nasty errors: When you have 100 MB of GPU memory, you can create 10 CUdeviceptr objects, each holding a 10 MB allocation. Your GPU memory is then full. But for Java, these few CUdeviceptr objects only occupy a few bytes, and the finalize() method may not be called at all during the runtime of the application, because the JVM simply does not need to reclaim these few bytes of memory. (Omitting discussions about hacky workarounds here, like calling System.gc() - the bottom line is: It does not work).
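
    To make the explicit cleanup from point 1 concrete, here is a minimal sketch of the usual try/finally pattern. It assumes JCuda's driver API (jcuda.driver.CUdeviceptr and the static methods of JCudaDriver); workWith is just a placeholder for your own code:

    void doSomethingWithJCuda()
    {
        CUdeviceptr data = new CUdeviceptr();
        cuMemAlloc(data, 1000);
        try
        {
            workWith(data);
        }
        finally
        {
            // Free the native GPU memory even if workWith throws
            cuMemFree(data);
        }
    }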


    So, to answer your actual question: JCuda is a very low-level library. This means that you have the full power, but also the full responsibility, of manual memory management. I know that this is "inconvenient". When I started creating JCuda, I originally intended it as a low-level backend for an object-oriented wrapper library. But creating a robust, stable and universally applicable abstraction layer for a complex general-purpose library like CUDA is challenging, and I did not dare to tackle such a project - not least because of the complexities implied by ... things like garbage collection...
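
    If you do want a thin abstraction layer on top of this manual management, one common approach is to tie the cuMemFree call to try-with-resources instead of relying on garbage collection or finalize(). The following is only a sketch under that assumption - DeviceBuffer is a hypothetical wrapper class, not part of JCuda:

    import jcuda.driver.CUdeviceptr;
    import static jcuda.driver.JCudaDriver.cuMemAlloc;
    import static jcuda.driver.JCudaDriver.cuMemFree;

    // Hypothetical wrapper: frees its native memory deterministically
    // via try-with-resources, not via garbage collection
    class DeviceBuffer implements AutoCloseable
    {
        private final CUdeviceptr pointer = new CUdeviceptr();

        DeviceBuffer(long byteSize)
        {
            // Allocate the native GPU memory that this object wraps
            cuMemAlloc(pointer, byteSize);
        }

        CUdeviceptr getPointer()
        {
            return pointer;
        }

        @Override
        public void close()
        {
            // Called automatically at the end of a try-with-resources block
            cuMemFree(pointer);
        }
    }

    // Usage:
    // try (DeviceBuffer buffer = new DeviceBuffer(1000))
    // {
    //     workWith(buffer.getPointer());
    // } // buffer.close() frees the native memory here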