How to fully release GPU memory used in a function

I'm using cupy in a function that receives a numpy array, shoves it on the GPU, does some operations on it and returns a cp.asnumpy copy of it.

The problem: the memory is not freed after the function returns (as seen in nvidia-smi).

I know about the caching and re-using of memory done by cupy. However, this seems to work only per-user. When multiple users are computing on the same GPU-server, they are limited by the cached memory of other users.

I also tried calling cp._default_memory_pool.free_all_blocks() inside the function at the end. This seems to have no effect. Importing cupy in the main code and calling free_all_blocks "manually" works, but I'd like to encapsulate the GPU stuff in the function, not visible to the user.

Can you fully release GPU memory used inside a function so that it's usable by other users?

Minimal example:

Main module:

# don't import cupy here, only numpy
import numpy as np

# module in which cupy is imported and used
from memory_test_module import test_function

# host array
arr = np.arange(1000000)

# out is also on host, gpu stuff happens in test_function
out = test_function(arr)

# GPU memory is not released here, unless manually:
import cupy as cp
cp._default_memory_pool.free_all_blocks()

Function module:

import cupy as cp

def test_function(arr):
    arr_gpu = cp.array(arr)
    arr_gpu += 1
    out_host = cp.asnumpy(arr_gpu)

    # this has no effect
    cp._default_memory_pool.free_all_blocks()

    return out_host

Solution

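CuPy relies on Python's reference counting to decide which GPU arrays are still in use, and free_all_blocks only releases blocks that are no longer in use. Inside test_function, arr_gpu is still referenced when free_all_blocks is called, so its block cannot be released. That is also why the manual call in the main module works: by then the function has returned and arr_gpu no longer exists. To fix it, del arr_gpu before calling free_all_blocks in test_function.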

For more details, see the CuPy memory management documentation: https://docs.cupy.dev/en/latest/user_guide/memory.html
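
One way to check from Python whether blocks were actually handed back, without watching nvidia-smi, is to read the default pool's counters. The helper name pool_report below is made up for illustration. used_bytes() and total_bytes() are the pool methods described in the memory guide linked above. This is a sketch, not a definitive diagnostic.

```python
def pool_report():
    # Hypothetical helper for illustration; cupy is imported lazily here.
    import cupy as cp

    pool = cp.get_default_memory_pool()
    return {
        # Bytes currently backing live CuPy arrays.
        "used_bytes": pool.used_bytes(),
        # Bytes the pool holds from the driver (used plus cached).
        "total_bytes": pool.total_bytes(),
    }
```

If total_bytes stays well above used_bytes after free_all_blocks, some reference to a GPU array is most likely still alive somewhere.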