Tags: python, pytorch, huggingface-transformers

RuntimeError: "mse_cuda" not implemented for 'Long' when training a transformers.Trainer


I'm attempting to train a model with the Hugging Face Trainer but am seeing the following error:

RuntimeError: "mse_cuda" not implemented for 'Long'

I've tried this in multiple cloud environments (CPU & GPU) with no luck. The dataset (tok_ds) has the following shape and column types, and I've ensured there are no null values.

Dataset({
    features: ['label', 'title', 'text', 'input', 'input_ids', 'token_type_ids', 'attention_mask'],
    num_rows: 5000
})

{'label': int,
 'title': str,
 'text': str,
 'input': str,
 'input_ids': list,
 'token_type_ids': list,
 'attention_mask': list}
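For reference, the exact storage dtype of each column can be checked via the Dataset's features attribute (a quick check, assuming tok_ds is a datasets.Dataset):

print(tok_ds.features)           # maps each column name to its feature type
print(tok_ds.features['label'])  # e.g. Value(dtype='int64', id=None)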

I have defined my metric functions as below (these are for compute_metrics, not the loss):

import numpy as np

# Pearson correlation between predictions and labels
def corr(x, y): return np.corrcoef(x, y)[0][1]
def corr_d(eval_pred): return {'pearson': corr(*eval_pred)}
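As a quick sanity check that the metric itself is fine, it behaves as expected on dummy arrays (illustrative values, not from my data):

preds  = np.array([0.1, 0.4, 0.35, 0.8])
labels = np.array([0.0, 0.0, 1.0, 1.0])
print(corr_d((preds, labels)))  # {'pearson': 0.647...}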

However, when attempting to train model_nm = 'microsoft/deberta-v3-small' on the train/test split of my dataset, I see the following error:

from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer

dds = tok_ds.train_test_split(0.25, seed=42)
tokz = AutoTokenizer.from_pretrained(model_nm)
model = AutoModelForSequenceClassification.from_pretrained(model_nm, num_labels=1)
trainer = Trainer(model, args, train_dataset=dds['train'], eval_dataset=dds['test'],
                  tokenizer=tokz, compute_metrics=corr_d)
...
...
File /shared-libs/python3.9/py/lib/python3.9/site-packages/torch/nn/functional.py:3280, in mse_loss(input, target, size_average, reduce, reduction)
   3277     reduction = _Reduction.legacy_get_string(size_average, reduce)
   3279 expanded_input, expanded_target = torch.broadcast_tensors(input, target)
-> 3280 return torch._C._nn.mse_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
RuntimeError: "mse_cuda" not implemented for 'Long'

Here are the args passed into the Trainer, in case it's relevant:

from transformers import TrainingArguments

args = TrainingArguments('outputs', learning_rate=lr, warmup_ratio=0.1, lr_scheduler_type='cosine', fp16=True,
    evaluation_strategy="epoch", per_device_train_batch_size=bs, per_device_eval_batch_size=bs*2,
    num_train_epochs=epochs, weight_decay=0.01, report_to='none')

Here's what I think may be the relevant environment information:

!python --version
Python 3.9.13

!pip list
Package                       Version
----------------------------- ------------
...
transformers                  4.21.1
huggingface-hub               0.8.1
pandas                        1.2.5
protobuf                      3.19.4
scikit-learn                  1.1.1
tensorflow                    2.9.1
torch                         1.12.0

Can anyone point me in the right direction to solve this problem?


Solution

  • Changing the datatype of the label column from int to float solved this issue for me. With num_labels=1, AutoModelForSequenceClassification treats the task as regression, so the Trainer computes MSE loss, and PyTorch's MSE kernel is not implemented for integer (Long) tensors, which is exactly what the traceback says. If your Dataset is from a pandas DataFrame, you can change the datatype of the column before passing the DataFrame to a Dataset; see the sketch below.
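A minimal sketch of the pandas route (df and its contents are illustrative, not from the question):

import pandas as pd
from datasets import Dataset

df = pd.DataFrame({'label': [0, 1, 1], 'input': ['a', 'b', 'c']})  # stand-in data
df['label'] = df['label'].astype(float)   # int64 -> float64 so MSE loss can run
tok_ds = Dataset.from_pandas(df)
print(tok_ds.features['label'])           # Value(dtype='float64', id=None)

If the data is already a datasets.Dataset, the same cast can be done in place with cast_column:

from datasets import Value

tok_ds = tok_ds.cast_column('label', Value('float32'))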