I have an 11-million-sentence corpus that I need to vectorize for further comparisons. Everything works fine, except that it is incredibly slow on a CPU (~6 sentences per second). The call to the LASER library is very simple and doesn't expose any further parameters to tune:
from laserembeddings import Laser
laser = Laser()
vector = laser.embed_sentences("this is a test", lang="en")  # returns a NumPy array of sentence vectors
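For context, embed_sentences also accepts a list of sentences, so the corpus can be processed in chunks rather than one call per sentence. A minimal sketch of that, where the chunk size is only an assumption to tune:

from laserembeddings import Laser
import numpy as np

laser = Laser()

# Stand-in for the real 11-million-sentence corpus.
corpus = ["this is a test", "this is another test"]

chunk_size = 10_000  # assumption; adjust to available memory
chunks = []
for start in range(0, len(corpus), chunk_size):
    # embed_sentences returns a NumPy array with one vector per sentence
    chunks.append(laser.embed_sentences(corpus[start:start + chunk_size], lang="en"))

vectors = np.vstack(chunks)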
On the LASER homepage they claim:
It delivers extremely fast performance, processing up to 2,000 sentences per second on GPU.
How can I make use of my GPU for this task?
SOLUTION:
I installed PyTorch with CUDA support, and LASER started using the GPU directly.
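A rough throughput check makes it easy to confirm that the CUDA build is actually being used; this is just a sketch, and the repeated test sentence stands in for a slice of the real corpus:

import time
from laserembeddings import Laser

laser = Laser()
sentences = ["this is a test"] * 1000  # stand-in batch, for timing only

start = time.perf_counter()
laser.embed_sentences(sentences, lang="en")
elapsed = time.perf_counter() - start
print(f"~{len(sentences) / elapsed:.0f} sentences per second")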
You are using this library, I assume? What GPU do you have? Is it CUDA supported?
From this source, it looks like GPU support is enabled by default. Can you check whether PyTorch can reach your GPU?
import torch
print(torch.cuda.is_available())  # True means PyTorch can see a CUDA-capable GPU
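If that prints False, the installed PyTorch build is CPU-only or the NVIDIA driver isn't visible to it. A slightly more verbose check (a sketch) that also reports which device would be used:

import torch

if torch.cuda.is_available():
    # Name of the GPU that PyTorch (and therefore laserembeddings) can use.
    print(torch.cuda.get_device_name(0))
else:
    print("CUDA not available: install a CUDA-enabled PyTorch build and check the NVIDIA driver")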