python, machine-learning, bert-language-model, sentence-similarity

BERT sentence-transformers stops/quits during fine-tuning


I am following the instructions for fine-tuning BERT as described here.

Here is my code:

from sentence_transformers import SentenceTransformer, SentencesDataset, InputExample, losses, evaluation
from torch.utils.data import DataLoader

# load model
embedder = SentenceTransformer('bert-large-nli-mean-tokens')
print("embedder loaded...")

# define your train dataset, the dataloader, and the train loss
train_dataset = SentencesDataset(x_sample["input"].tolist(), embedder)
train_dataloader = DataLoader(train_dataset, shuffle=False, batch_size=16)
train_loss = losses.CosineSimilarityLoss(embedder)

sentences1 = ['This list contains the first column', 'With your sentences', 'You want your model to evaluate on']
sentences2 = ['Sentences contains the other column', 'The evaluator matches sentences1[i] with sentences2[i]', 'Compute the cosine similarity and compares it to scores[i]']
scores = [0.3, 0.6, 0.2]
evaluator = evaluation.EmbeddingSimilarityEvaluator(sentences1, sentences2, scores)

# tune the model
embedder.fit(train_objectives=[(train_dataloader, train_loss)], 
    epochs=1, 
    warmup_steps=100, 
    evaluator=evaluator, 
    evaluation_steps=1)

At 4% the training stops and the program exits with no warnings or errors. There is no output.

I have no idea how to troubleshoot - any help would be great.

Edit: Changed the title from "fails" to "stops/quits" because I don't know if it's failing.

Here is what I see on my terminal: Epoch: 0%| Killedtion: 0%|

The word "Killed" overwrites part of the word "Iteration", so perhaps this is a memory problem? FYI: I am running it from the VS Code terminal, using WSL on an Ubuntu VM in Windows.
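A bare "Killed" with no Python traceback usually means the process received SIGKILL, and on Linux the most common sender is the kernel's out-of-memory killer. One way to check the memory theory is to look at how much memory is available just before training starts. The sketch below is only an illustration: it assumes a Linux environment (including WSL) where /proc/meminfo exists, and the helper names parse_meminfo and available_gib are made up for this example.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def available_gib(text):
    """Return MemAvailable in GiB, or None if the field is missing."""
    kb = parse_meminfo(text).get("MemAvailable")
    return None if kb is None else kb / (1024 ** 2)


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")  # assumption: Linux or WSL
    if meminfo.exists():
        gib = available_gib(meminfo.read_text())
        if gib is None:
            print("MemAvailable not reported")
        else:
            print(f"MemAvailable: {gib:.1f} GiB")
```

If the reported value is small compared with what a BERT-large model needs for training, an out-of-memory kill is a plausible explanation. The kernel log (for example via dmesg) may also record which process was killed.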

Found the issue on GitHub: https://github.com/ElderResearch/gpu_docker/issues/38


Solution

  • My solution was to set both the batch size and the number of workers to one. It works, but training is very slow:

    train_dataloader = DataLoader(train_dataset, shuffle=False, batch_size=1, num_workers=1)
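Part of the slowdown is simple arithmetic: the number of training steps per epoch is the dataset size divided by the batch size, rounded up, so going from 16 to 1 means roughly 16 times as many steps. With evaluation_steps=1, as in the question, the evaluator also runs after every one of those steps. The sketch below only illustrates the step count; the dataset size is a hypothetical number, and whether an intermediate batch size such as 8 or 4 fits in memory on a given machine is something that would have to be tried.

```python
import math


def steps_per_epoch(num_examples, batch_size):
    """Number of batches a DataLoader yields per epoch when drop_last=False."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil(num_examples / batch_size)


# Hypothetical dataset size, for illustration only.
num_examples = 10_000
for batch_size in (16, 8, 4, 1):
    print(f"batch_size={batch_size}: {steps_per_epoch(num_examples, batch_size)} steps")
```

Raising evaluation_steps to a larger value is another way to reduce the time spent per epoch without using more memory during the training steps themselves.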