Tags: r, python-3.x, pytorch, transformer-model, reticulate

GPU memory leakage when creating objects from sentence-transformers


Description

I am creating a function in R that embeds sentences using the sentence_transformers library from Python.

For some unknown reason, creating the object multiple times under the same variable name eventually leaves insufficient GPU memory to allocate the transformer. To reproduce:

sentence_transformers <- reticulate::import("sentence_transformers")
for (i in 1:10) {
  print(i)
  bert_encoder <- sentence_transformers$SentenceTransformer("bert-large-nli-stsb-mean-tokens")
}

However, doing the same operation directly in Python does not produce an error:

from sentence_transformers import SentenceTransformer
for i in range(10):
    print(i)
    bert_encoder = SentenceTransformer("bert-large-nli-stsb-mean-tokens")

This happens with any model that is allocated on the GPU. On my NVIDIA GTX 1060 it reaches the 4th cycle, but on smaller GPUs it crashes earlier. One temporary workaround is to create the model only once, outside the function, and then pass it as a parameter as many times as needed (sketched below), but I would rather avoid that because it adds an extra step, and loading several models that way might crash anyway.
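
A minimal sketch of that workaround, assuming a hypothetical helper embed_sentences() that receives the already-loaded model instead of creating its own:

library(reticulate)

sentence_transformers <- import("sentence_transformers")

# Load the model once, outside the helper
bert_encoder <- sentence_transformers$SentenceTransformer("bert-large-nli-stsb-mean-tokens")

# Hypothetical helper: takes the already-loaded encoder as an argument
embed_sentences <- function(sentences, encoder) {
  encoder$encode(sentences)  # returns one embedding per sentence
}

embeddings <- embed_sentences(c("first sentence", "second sentence"), bert_encoder)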

Expected behaviour

The for loop finishes without an error

Observed behaviour

Error in py_call_impl(callable, dots$args, dots$keywords) : RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 2.95 GiB already allocated; 16.11 MiB free; 238.68 MiB cached)

Unsuccessful attempts at solving it

  1. The solutions proposed here
  2. Using numba as suggested here
  3. Declaring the variable explicitly in Python via reticulate::py_run_string(), then doing del bert_encoder and calling the garbage collector (see the sketch after this list)
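
For reference, attempt 3 looked roughly like this (a sketch; the exact code may have differed):

library(reticulate)

# Create the encoder inside the embedded Python session so it has a Python-side name
py_run_string("from sentence_transformers import SentenceTransformer")
py_run_string("bert_encoder = SentenceTransformer('bert-large-nli-stsb-mean-tokens')")

# Delete the Python-side reference and collect garbage on both sides
py_run_string("del bert_encoder")
py_run_string("import gc; gc.collect()")
gc(full = TRUE)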

Details

Windows 10 Home

Python 3.7.4

R 4.0.1

Reticulate 1.16

Torch 1.3.1

Tensorflow 2.2.0

Transformers 2.11.0

sentence_transformers 0.2.6


Solution

  • OK, so I am posting my solution for anyone else having this issue.

    After each call that creates the model, as in

    sentence_transformers <- import("sentence_transformers")
    encoder <- sentence_transformers$SentenceTransformer("bert-large-nli-stsb-mean-tokens")
    

    I release GPU memory using

    # Has this been done on a GPU?
    reticulate::py_run_string("import torch")
    reticulate::py_run_string("is_cuda_available = torch.cuda.is_available()")

    # Release GPU memory
    if (isTRUE(reticulate::py$is_cuda_available)) {

      # Drop the Python-side reference to the encoder
      tryCatch(reticulate::py_run_string("del encoder"),
               warning = function(e) {},
               error = function(e) {})

      # Drop the R-side reference to the encoder
      tryCatch(rm(encoder),
               warning = function(e) {},
               error = function(e) {})

      # Run R's garbage collector so reticulate releases the Python object
      gc(full = TRUE, verbose = FALSE)

      # Ask PyTorch to return cached GPU memory to the driver
      reticulate::py_run_string("import torch")
      reticulate::py_run_string("torch.cuda.empty_cache()")

    }
    

    and it works perfectly.
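
For convenience, the whole pattern can be wrapped into a single helper. This is only a sketch based on the steps above; the function name embed_and_release and its arguments are hypothetical, not part of the original post:

library(reticulate)

# Hypothetical wrapper: load the model, encode the sentences, then release GPU memory
embed_and_release <- function(sentences,
                              model_name = "bert-large-nli-stsb-mean-tokens") {
  # Create the encoder inside the embedded Python session so it can later be deleted by name
  py_run_string("from sentence_transformers import SentenceTransformer")
  py_run_string(sprintf("encoder = SentenceTransformer('%s')", model_name))

  embeddings <- py$encoder$encode(sentences)

  # Release GPU memory, following the steps above
  py_run_string("import torch")
  if (isTRUE(py_eval("torch.cuda.is_available()"))) {
    py_run_string("del encoder")
    gc(full = TRUE, verbose = FALSE)
    py_run_string("torch.cuda.empty_cache()")
  }

  embeddings
}

vectors <- embed_and_release(c("first sentence", "second sentence"))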