I'm attempting to train a model with the Hugging Face Trainer, but I'm seeing the following error:
RuntimeError: "mse_cuda" not implemented for 'Long' when training a transformer.Trainer
I've tried this in multiple cloud environments (CPU & GPU) with no luck. The dataset (tok_ds) has the following shape and types, and I've verified there are no NULL values.
Dataset({
    features: ['label', 'title', 'text', 'input', 'input_ids', 'token_type_ids', 'attention_mask'],
    num_rows: 5000
})
{'label': int,
'title': str,
'text': str,
'input': str,
'input_ids': list,
'token_type_ids': list,
'attention_mask': list}
I have defined my metric functions (passed to compute_metrics) as below:
import numpy as np

def corr(x, y): return np.corrcoef(x, y)[0][1]
def corr_d(eval_pred): return {'pearson': corr(*eval_pred)}
However, when attempting to train model_nm = 'microsoft/deberta-v3-small'
on the train/test split of my dataset, I see the following error:
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer

dds = tok_ds.train_test_split(0.25, seed=42)
tokz = AutoTokenizer.from_pretrained(model_nm)
model = AutoModelForSequenceClassification.from_pretrained(model_nm, num_labels=1)
trainer = Trainer(model, args, train_dataset=dds['train'], eval_dataset=dds['test'],
                  tokenizer=tokz, compute_metrics=corr_d)
...
...
File /shared-libs/python3.9/py/lib/python3.9/site-packages/torch/nn/functional.py:3280, in mse_loss(input, target, size_average, reduce, reduction)
3277 reduction = _Reduction.legacy_get_string(size_average, reduce)
3279 expanded_input, expanded_target = torch.broadcast_tensors(input, target)
-> 3280 return torch._C._nn.mse_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
RuntimeError: "mse_cuda" not implemented for 'Long' when training a transformer.Trainer
Here are the args passed into the Trainer, in case they're relevant:
args = TrainingArguments('outputs', learning_rate=lr, warmup_ratio=0.1, lr_scheduler_type='cosine', fp16=True,
                         evaluation_strategy="epoch", per_device_train_batch_size=bs, per_device_eval_batch_size=bs*2,
                         num_train_epochs=epochs, weight_decay=0.01, report_to='none')
Here is what I think may be the relevant environment information:
!python --version
Python 3.9.13
!pip list
Package Version
----------------------------- ------------
...
transformers 4.21.1
huggingface-hub 0.8.1
pandas 1.2.5
protobuf 3.19.4
scikit-learn 1.1.1
tensorflow 2.9.1
torch 1.12.0
Can anyone point me in the right direction to solve this problem?
Changing the datatype of the labels column from int to float solved this issue for me. With num_labels=1, the model treats the task as regression and the Trainer computes MSE loss, and torch's mse_loss is not implemented for integer (Long) tensors, which is exactly the error you're seeing. If your Dataset is built from a pandas DataFrame, you can change the datatype of the column before passing the DataFrame to a Dataset.
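For reference, here is a minimal sketch of that fix. The DataFrame name df and the rebuilding of tok_ds are assumptions for illustration, not code from the question:

from datasets import Dataset, Value

# Assumed: df is the pandas DataFrame the tokenized dataset was built from,
# with an integer 'label' column.
df['label'] = df['label'].astype(float)   # cast labels to float so MSE loss receives float targets
tok_ds = Dataset.from_pandas(df)

# Alternatively, if you already have a datasets.Dataset, cast the column in place:
tok_ds = tok_ds.cast_column('label', Value('float32'))

Either way, the label feature ends up as a float type, so torch.nn.functional.mse_loss no longer receives Long tensors and the error goes away.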