python, machine-learning, nlp, huggingface-transformers

NLP: how can I fix the pretrained model paraphrase-multilingual-mpnet-base-v2 when it isn't accurate on some examples?


I use the sentence-transformers model paraphrase-multilingual-mpnet-base-v2 from Hugging Face: https://huggingface.co/sentence-transformers/paraphrase-multilingual-mpnet-base-v2

My goal is to measure the similarity between phrases. I run the model exactly as shown on the model page:

from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]
model = SentenceTransformer('sentence-transformers/paraphrase-multilingual-mpnet-base-v2')
embeddings = model.encode(sentences)
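
The similarity between the phrases is then computed from these embeddings, typically with cosine similarity. A minimal sketch, assuming sentence_transformers.util.cos_sim is used for the comparison:

from sentence_transformers import util

# pairwise cosine similarities between the embeddings (values in [-1, 1])
similarities = util.cos_sim(embeddings, embeddings)
print(similarities)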

The problem is that the model scores some phrases as very similar even though they really aren't. In most cases the model works fine, but sometimes it is really wrong.
The model is pretrained, and I don't know how to retrain it on a few new examples so that the weights change only slightly.
Is it possible to add a few new examples and retrain a model that is already pretrained? Would that require a full training run, probably with a GPU?
Or can the model's errors be fixed in some other way?
Or is the only option to find another model that works better?


Solution

  • What you're looking for is called fine-tuning, and Hugging Face provides a Trainer API for it: https://huggingface.co/docs/transformers/training

    The general steps below follow that tutorial (I'll be using the bert-base-cased pretrained model and the yelp_review_full dataset, both available on Hugging Face, with the PyTorch Trainer for the actual fine-tuning).

    # logging
    import logging
    from logging.handlers import RotatingFileHandler
    # used to load the dataset
    from datasets import load_dataset
    # tokenizer, to be used on the dataset
    from transformers import AutoTokenizer
    # gets the model
    from transformers import AutoModelForSequenceClassification
    # contains hyperparameters and the Trainer
    from transformers import TrainingArguments, Trainer
    # used in evaluation
    import numpy as np
    import evaluate
    
    # init the logger: send debug output (including captured warnings) to a file
    logger_file_handler = RotatingFileHandler('./retrain.log')
    logger_file_handler.setLevel(logging.DEBUG)
    logging.captureWarnings(True)
    root_logger = logging.getLogger()
    root_logger.addHandler(logger_file_handler)
    root_logger.setLevel(logging.DEBUG)
    
    # load the dataset to use for fine-tuning
    dataset = load_dataset("yelp_review_full")
    # log an example of what we have
    logging.debug(dataset["train"][100])
    
    # create the tokenizer
    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    

    The map function requires a function that it applies to every item; in this case we want to tokenize only the "text" key in each item. The logging statement above will show two keys, "label" and "text". To keep things simple we don't tokenize the label, but we could use it and concatenate the text onto the end of it.

    def tokenize_function(examples):
        return tokenizer(examples["text"], padding="max_length", truncation=True)
    
    tokenized_datasets = dataset.map(tokenize_function, batched=True)
    
    # we take a small subset of the data to reduce training time; this isn't necessary, however
    small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
    small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))
    
    # load up the bert-base-cased model
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)
    

    As per the tutorial, the line above that loads the model will display a warning:

    You will see a warning about some of the pretrained weights not being used and some weights being randomly initialized. Don’t worry, this is completely normal! The pretrained head of the BERT model is discarded, and replaced with a randomly initialized classification head. You will fine-tune this new model head on your sequence classification task, transferring the knowledge of the pretrained model to it.

    # we want to evaluate our performance based on accuracy, but others could be
    # used, like precision or perplexity. Check the tutorial for links to more
    # information on both.
    metric = evaluate.load("accuracy")
    
    # define the function that computes the metrics (the model outputs logits,
    # so we convert them to predicted class indices with argmax before scoring)
    def compute_metrics(eval_pred):
        logits, labels = eval_pred
        predictions = np.argmax(logits, axis=-1)
        return metric.compute(predictions=predictions, references=labels)
    

    Here we create the TrainingArguments object that holds the hyperparameters. output_dir says where to save checkpoints from training, and evaluation_strategy sets when to report evaluation metrics; in this case we want to report them at the end of every epoch.

    training_args = TrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch")
    
    # we create the Trainer object using the model, training args, training dataset,
    # evaluation dataset and the function used to compute metrics
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=small_train_dataset,
        eval_dataset=small_eval_dataset,
        compute_metrics=compute_metrics,
    )

    # finally, we call train to start the fine-tuning
    trainer.train()
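
    After training finishes, the Trainer can report the evaluation metrics and the fine-tuned model can be saved. A minimal sketch using trainer.evaluate() and trainer.save_model() (the output path below is a placeholder):

    # run the evaluation loop and log the metrics from compute_metrics
    logging.debug(trainer.evaluate())
    # save the fine-tuned model to disk (placeholder path)
    trainer.save_model("test_trainer/final_model")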
    

    While this covers the basics of fine-tuning, there are many ways to optimize and customize every step in the process. For a tutorial with an in-depth look at things like the tokenizer and preprocessing, adding task-specific layers on top, batch/distributed training, hyperparameter customization, experimentation and much more, see https://madewithml.com/courses/mlops/training/
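
    To connect this back to the model from the question: the same fine-tuning idea can also be applied directly to a SentenceTransformer using the sentence-transformers training API. Below is a minimal sketch, assuming you can write a handful of (phrase, phrase, similarity score) pairs covering the cases the model currently gets wrong; the example pairs, batch size and epoch count are placeholders, not recommendations.

    from torch.utils.data import DataLoader
    from sentence_transformers import SentenceTransformer, InputExample, losses

    # the model from the question
    model = SentenceTransformer('sentence-transformers/paraphrase-multilingual-mpnet-base-v2')

    # hand-labelled pairs: label is the target cosine similarity in [0, 1]
    train_examples = [
        InputExample(texts=["phrase the model scores too high", "its false match"], label=0.1),
        InputExample(texts=["a phrase", "a genuine paraphrase of it"], label=0.9),
    ]
    train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

    # CosineSimilarityLoss pushes the embedding similarity towards the given label
    train_loss = losses.CosineSimilarityLoss(model)

    # a short fine-tuning run; a GPU helps but isn't strictly required for a few examples
    model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
    model.save("tuned-paraphrase-multilingual-mpnet-base-v2")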