I am trying to use the Trainer from Hugging Face's transformers library in Python:
from transformers import Seq2SeqTrainingArguments
from transformers import Seq2SeqTrainer
# ...
training_args = Seq2SeqTrainingArguments(
    fp16=True,
    # ...
)
trainer = Seq2SeqTrainer(
    args=training_args,
    # ...
)
I get this error message:
ValueError: FP16 Mixed precision training with AMP or APEX (--fp16) and FP16 half precision evaluation (--fp16_full_eval) can only be used on CUDA or NPU devices or certain XPU devices (with IPEX).
It seems like I am missing some CUDA installation, but I can't figure out what exactly I need. I tried (without success):
py -m pip install --upgrade setuptools pip wheel
py -m pip install nvidia-pyindex
py -m pip install nvidia-cuda-runtime-cu12
py -m pip install nvidia-nvml-dev-cu12
py -m pip install nvidia-cuda-nvcc-cu12
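For reference, a quick way to check whether the installed torch build can use the GPU at all (a generic PyTorch check, nothing transformers-specific):
py -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
(a CPU-only build of torch prints None False here)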
System info:
Thanks for any ideas!
Found it. Adding this line to the code
model.to('cuda')
gave a more meaningful error message:
Torch not compiled with CUDA enabled
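In other words, the installed torch wheel is a CPU-only build. The nvidia-cuda-* pip packages above only add CUDA runtime libraries; they don't turn a CPU-only torch into a CUDA-enabled one, which is why those installs didn't help. The same assertion shows up without loading any model, e.g. as a quick check:
import torch
torch.zeros(1).to('cuda')  # AssertionError: Torch not compiled with CUDA enabled (on a CPU-only build)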
Generated the correct pip install command with the selector here: https://pytorch.org/get-started/locally/
(something like
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
)
and it worked.
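A quick way to double-check after reinstalling (assuming the NVIDIA driver is already set up; model is the same model passed to the trainer above):
import torch
print(torch.version.cuda)         # "12.1" for the cu121 wheel
print(torch.cuda.is_available())  # True once the NVIDIA driver is in place
model.to('cuda')                  # no longer raises the assertion
With a CUDA-enabled build, Seq2SeqTrainingArguments(fp16=True, ...) no longer raises the ValueError, and the Trainer moves the model to the GPU by itself, so the explicit model.to('cuda') was only needed as a diagnostic.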