machine-learning · pytorch · huggingface-transformers · huggingface-datasets

How to test my trained huggingface model on the test dataset?


I was following the Hugging Face tutorial on training a multiple-choice QA model and trained my model with:

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=1,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_qa["train"],
    eval_dataset=tokenized_qa["validation"],
    tokenizer=tokenizer,
    data_collator=DataCollatorForMultipleChoice(tokenizer=tokenizer),  # custom collator defined in the tutorial
    compute_metrics=compute_metrics
)

trainer.train()
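
(compute_metrics isn't shown above; assuming it's the usual accuracy function from the tutorial, a minimal sketch looks like this, though the exact version isn't critical:)

import numpy as np

def compute_metrics(eval_pred):
    # eval_pred is a (logits, labels) pair; logits has one score per answer choice
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": float((predictions == labels).mean())}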

Afterwards I can load the model with:

from transformers import AutoModelForMultipleChoice

# load trained model for testing
model = AutoModelForMultipleChoice.from_pretrained('results/checkpoint-1000')

But how can I test it on the testing dataset?

The dataset looks like:

DatasetDict({
    train: Dataset({
        features: ['id', 'sent1', 'sent2', 'ending0', 'ending1', 'ending2', 'ending3', 'label', 'input_ids', 'attention_mask'],
        num_rows: 10178
    })
    test: Dataset({
        features: ['id', 'sent1', 'sent2', 'ending0', 'ending1', 'ending2', 'ending3', 'label', 'input_ids', 'attention_mask'],
        num_rows: 1273
    })
    validation: Dataset({
        features: ['id', 'sent1', 'sent2', 'ending0', 'ending1', 'ending2', 'ending3', 'label', 'input_ids', 'attention_mask'],
        num_rows: 1272
    })
})

I have quite a bit of code, so if more information is needed, please let me know.


Solution

  • Okay, figured it out and adding an answer for completeness. It seems the training arguments aren't needed when the Trainer is only used for prediction:

    trainer = Trainer(
        model=model,
        tokenizer=tokenizer,
        data_collator=DataCollatorForMultipleChoice(tokenizer=tokenizer),
        compute_metrics=compute_metrics
    )
    

    Put the model in evaluation mode:

    model.eval()  # switch to evaluation mode (dropout modules are deactivated)
    

    And then call:

    trainer.predict(tokenized_qa["test"])
    
    which returns a PredictionOutput with the raw logits, the gold labels, and the test metrics:

    PredictionOutput(predictions=array([[-1.284791 , -1.2848296, -1.2848794, -1.2848705],
           [-1.2848867, -1.2849237, -1.2848233, -1.2848446],
           [-1.284851 , -1.2847253, -1.2849066, -1.2848204],
           ...,
           [-1.284877 , -1.2848783, -1.284853 , -1.284804 ],
           [-1.2848401, -1.2848557, -1.2847972, -1.2848665],
           [-1.2848748, -1.2848799, -1.2848252, -1.2848618]], dtype=float32), label_ids=array([1, 3, 1, ..., 1, 2, 2]), metrics={'test_loss': 1.386292576789856, 'test_accuracy': 0.25727773406766324, 'test_runtime': 16.0096, 'test_samples_per_second': 79.39, 'test_steps_per_second': 9.932})
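
    If you want hard predictions rather than raw logits, argmax over the choice dimension of the returned PredictionOutput; the accuracy computed this way matches test_accuracy in the metrics dict:

    import numpy as np

    output = trainer.predict(tokenized_qa["test"])
    # predictions has shape (num_examples, num_choices); argmax picks the predicted ending
    predicted = np.argmax(output.predictions, axis=-1)
    accuracy = (predicted == output.label_ids).mean()
    print(f"test accuracy: {accuracy:.4f}")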