Tags: nlp, pytorch, word-embedding, huggingface-transformers, bert-language-model

How to improve code to speed up word embedding with transformer models?


I need to compute word embeddings for a bunch of documents with different language models. No problem with that, the script is working fine, except that I'm working on a notebook without a GPU and each text takes around 1.5 s to process, which is far too long (I have thousands of texts to process).

Here is how I'm doing it with PyTorch and the transformers library:

import torch
from transformers import CamembertModel, CamembertTokenizer

docs = [text1, text2, ..., text20000]
tok = CamembertTokenizer.from_pretrained('camembert-base')
model = CamembertModel.from_pretrained('camembert-base', output_hidden_states=True)
# let's try with a batch size of 64 documents
docids = [tok.encode(
    doc, max_length=512, return_tensors='pt', pad_to_max_length=True) for doc in docs[:64]]
ids = torch.cat(tuple(docids))
device = 'cuda' if torch.cuda.is_available() else 'cpu' # cpu in my case...
model = model.to(device)
ids = ids.to(device)
model.eval()
with torch.no_grad():
    out = model(input_ids=ids)
# 103s later...

Does anyone have any ideas or suggestions to improve the speed?


Solution

  • I don't think there is a trivial way to significantly improve the speed without using a GPU.

    One of the ways I can think of is smart batching, which is used by Sentence-Transformers: you sort inputs of similar length together so that each batch is only padded to its longest sequence instead of the full 512-token limit (see the sketch after this answer). I'm not sure how much of a speedup this will get you, but it is probably the only way to improve things significantly in a short period of time.

    Otherwise, if you have access to Google Colab, you can also use their GPU environment, provided the processing can be completed in a reasonable time.
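
    Below is a minimal sketch of that smart-batching idea, assuming the same camembert-base checkpoint as in the question. The function name embed_smart_batched, the batch size, and the mean-pooling step are illustrative choices, not something the answer prescribes; adapt the pooling to however you extract embeddings today.

    import torch
    from transformers import CamembertModel, CamembertTokenizer

    def embed_smart_batched(docs, batch_size=64):
        # Illustrative smart batching: sort by length, pad per batch, restore original order.
        tok = CamembertTokenizer.from_pretrained('camembert-base')
        model = CamembertModel.from_pretrained('camembert-base')
        model.eval()

        # Tokenize once without padding so we know each document's true length.
        encoded = [tok.encode(doc, max_length=512, truncation=True) for doc in docs]
        order = sorted(range(len(encoded)), key=lambda i: len(encoded[i]))

        embeddings = [None] * len(docs)
        with torch.no_grad():
            for start in range(0, len(order), batch_size):
                idx = order[start:start + batch_size]
                batch = [encoded[i] for i in idx]
                # Pad only up to the longest sequence in this particular batch.
                max_len = max(len(x) for x in batch)
                input_ids = torch.tensor(
                    [x + [tok.pad_token_id] * (max_len - len(x)) for x in batch])
                attention_mask = torch.tensor(
                    [[1] * len(x) + [0] * (max_len - len(x)) for x in batch])
                out = model(input_ids=input_ids, attention_mask=attention_mask)
                hidden = out[0]  # last hidden state: (batch, seq_len, hidden_size)
                # Mean-pool over non-padding tokens (one pooling choice among many).
                mask = attention_mask.unsqueeze(-1).float()
                pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
                # Write results back into the original document order.
                for j, i in enumerate(idx):
                    embeddings[i] = pooled[j]
        return torch.stack(embeddings)

    # embs = embed_smart_batched(docs)  # embs[i] corresponds to docs[i]

    Because each batch is padded only to its own longest sequence, batches of short documents do far less work than batches padded to 512 tokens, which is where the speedup comes from.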