I am writing a function in R that embeds sentences using Python's sentence_transformers library, accessed through reticulate.
For some reason I cannot pin down, creating the model repeatedly under the same variable name eventually leaves too little GPU memory to allocate the transformer. To reproduce:
sentence_transformers <- reticulate::import("sentence_transformers")
for (i in 1:10) {
  print(i)
  bert_encoder <- sentence_transformers$SentenceTransformer("bert-large-nli-stsb-mean-tokens")
}
However, doing the same operation directly in Python does not produce an error:
from sentence_transformers import SentenceTransformer

for i in range(10):
    print(i)
    bert_encoder = SentenceTransformer("bert-large-nli-stsb-mean-tokens")
This happens with any model that is allocated on the GPU. On my NVIDIA GTX 1060 it reaches the 4th iteration, but on smaller GPUs it crashes earlier. One temporary workaround is to create the model only once, outside the function, and then pass it in as a parameter as often as needed (sketched below), but I would rather avoid that: it adds an extra step, and loading several models that way might crash just the same.
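For clarity, that workaround looks roughly like this (embed_sentences is an illustrative name, not part of any package):

# Workaround sketch: build the encoder once and reuse it
sentence_transformers <- reticulate::import("sentence_transformers")
encoder <- sentence_transformers$SentenceTransformer("bert-large-nli-stsb-mean-tokens")

# Pass the already-loaded model in instead of re-creating it on every call
embed_sentences <- function(sentences, encoder) {
  encoder$encode(sentences)
}

embeddings <- embed_sentences(c("a sentence", "another sentence"), encoder)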
The expected behavior is that the for loop finishes without an error. Instead, the loop eventually fails with:
Error in py_call_impl(callable, dots$args, dots$keywords) : RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 2.95 GiB already allocated; 16.11 MiB free; 238.68 MiB cached)
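To watch the leak build up, a quick diagnostic is to print the bytes of GPU memory torch has allocated at each iteration; the loop below is just the reproduction above plus a call to torch.cuda.memory_allocated():

sentence_transformers <- reticulate::import("sentence_transformers")
for (i in 1:10) {
  print(i)
  bert_encoder <- sentence_transformers$SentenceTransformer("bert-large-nli-stsb-mean-tokens")
  # Print bytes currently allocated by torch tensors on the default GPU
  reticulate::py_run_string("import torch; print(torch.cuda.memory_allocated())")
}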
I have already tried releasing the model from R, using reticulate::py_run_string() to run del bert_encoder and then calling the garbage collector, without success.
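In case it is useful, a minimal version of that attempt looks like this (note that del bert_encoder on the Python side raises a NameError here, because the object was created from R and never bound in Python's __main__):

# What I tried; it did not stop the OOM error
tryCatch(reticulate::py_run_string("del bert_encoder"),  # NameError: not defined in Python
         error = function(e) {})
rm(bert_encoder)  # drop the R reference to the model
gc()              # run R's garbage collector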
My environment:
Windows 10 Home
Python 3.7.4
R 4.0.1
Reticulate 1.16
Torch 1.3.1
Tensorflow 2.2.0
Transformers 2.11.0
sentence_transformers 0.2.6
Ok so I am posting my solution for anyone else having this issue.
After each time I create and call the model, as in
sentence_transformers <- reticulate::import("sentence_transformers")
encoder <- sentence_transformers$SentenceTransformer("bert-large-nli-stsb-mean-tokens")
I release the GPU memory with:
# Was the model allocated on a GPU?
py <- reticulate::py_run_string("import torch
is_cuda_available = torch.cuda.is_available()")
# Release GPU memory
if (isTRUE(reticulate::py$is_cuda_available)) {
  # Remove the Python-side reference, if any (ignore errors if none exists)
  tryCatch(reticulate::py_run_string("del encoder"),
           warning = function(e) {},
           error = function(e) {})
  # Remove the R-side reference (ignore errors if it is already gone)
  tryCatch(rm(encoder),
           warning = function(e) {},
           error = function(e) {})
  # Run R's garbage collector so reticulate releases the Python object
  gc(full = TRUE, verbose = FALSE)
  # Hand the cached blocks back to the GPU
  py <- reticulate::py_run_string("import torch
torch.cuda.empty_cache()")
}
and it works perfectly.
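For completeness, wrapping the whole pattern into the embedding function might look like the sketch below; embed_sentences and model_name are illustrative names, and the cleanup mirrors the snippet above (here encoder is a local R object, so rm() plus gc() is enough and no Python-side del is needed):

library(reticulate)

# Sketch: embed sentences, then release the GPU memory before returning
embed_sentences <- function(sentences,
                            model_name = "bert-large-nli-stsb-mean-tokens") {
  sentence_transformers <- import("sentence_transformers")
  encoder <- sentence_transformers$SentenceTransformer(model_name)
  embeddings <- encoder$encode(sentences)
  # Was the model allocated on a GPU?
  py_run_string("import torch
is_cuda_available = torch.cuda.is_available()")
  if (isTRUE(py$is_cuda_available)) {
    rm(encoder)                       # drop the R reference to the model
    gc(full = TRUE, verbose = FALSE)  # let reticulate release the Python object
    py_run_string("import torch
torch.cuda.empty_cache()")            # hand cached blocks back to the GPU
  }
  embeddings
}

embeddings <- embed_sentences(c("a sentence", "another sentence"))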