I want to run a local FineTuning of LLama. I followed the colab notebook from the pytorch blog post "Finetune LLMs on your own consumer hardware using tools from PyTorch and Hugging Face ecosystem".
I got everything up and running but in the training i get the a runtime error:
RuntimeError: Expected is_sm80 || is_sm90 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
In my understanding is a specific version needed by pytorch. The sm_80 or sm_90. The RTX 2080 ti has a base version of sm_75 but also has the sm_80 and sm_90 flag.
I checked the configuration stats of my RTX 2080 TI GPU with print(torch.__config__.show().replace("\n", "\n\t"))
and it got this:
PyTorch built with:
- GCC 9.3
- C++ Version: 201703
- Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 12.1
- NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
- CuDNN 8.9.2
- Magma 2.6.1
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
What can I do to enable the training or is the card not capable of it?
The full error report ist here:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[19], line 2
1 ## start training
----> 2 trainer.train()
File ~/thesis/thesis-localllm-codetuning/training22_04/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:323, in SFTTrainer.train(self, *args, **kwargs)
320 if self.neftune_noise_alpha is not None and not self._trainer_supports_neftune:
321 self.model = self._trl_activate_neftune(self.model)
--> 323 output = super().train(*args, **kwargs)
325 # After training we make sure to retrieve back the original forward pass method
326 # for the embedding layer by removing the forward post hook.
327 if self.neftune_noise_alpha is not None and not self._trainer_supports_neftune:
File ~/thesis/thesis-localllm-codetuning/training22_04/lib/python3.10/site-packages/transformers/trainer.py:1539, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1537 hf_hub_utils.enable_progress_bars()
1538 else:
-> 1539 return inner_training_loop(
1540 args=args,
1541 resume_from_checkpoint=resume_from_checkpoint,
1542 trial=trial,
1543 ignore_keys_for_eval=ignore_keys_for_eval,
1544 )
File ~/thesis/thesis-localllm-codetuning/training22_04/lib/python3.10/site-packages/transformers/trainer.py:1869, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
1866 self.control = self.callback_handler.on_step_begin(args, self.state, self.control)
1868 with self.accelerator.accumulate(model):
-> 1869 tr_loss_step = self.training_step(model, inputs)
1871 if (
1872 args.logging_nan_inf_filter
1873 and not is_torch_tpu_available()
1874 and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
1875 ):
1876 # if loss is nan or inf simply add the average of previous logged losses
1877 tr_loss += tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)
File ~/thesis/thesis-localllm-codetuning/training22_04/lib/python3.10/site-packages/transformers/trainer.py:2777, in Trainer.training_step(self, model, inputs)
2775 scaled_loss.backward()
2776 else:
-> 2777 self.accelerator.backward(loss)
2779 return loss.detach() / self.args.gradient_accumulation_steps
File ~/thesis/thesis-localllm-codetuning/training22_04/lib/python3.10/site-packages/accelerate/accelerator.py:1964, in Accelerator.backward(self, loss, **kwargs)
1962 self.scaler.scale(loss).backward(**kwargs)
1963 else:
-> 1964 loss.backward(**kwargs)
File ~/thesis/thesis-localllm-codetuning/training22_04/lib/python3.10/site-packages/torch/_tensor.py:492, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
482 if has_torch_function_unary(self):
483 return handle_torch_function(
484 Tensor.backward,
485 (self,),
(...)
490 inputs=inputs,
491 )
--> 492 torch.autograd.backward(
493 self, gradient, retain_graph, create_graph, inputs=inputs
494 )
File ~/thesis/thesis-localllm-codetuning/training22_04/lib/python3.10/site-packages/torch/autograd/__init__.py:251, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
246 retain_graph = create_graph
248 # The reason we repeat the same comment below is that
249 # some Python versions print out the first line of a multi-line function
250 # calls in the traceback and some print out the last line
--> 251 Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
252 tensors,
253 grad_tensors_,
254 retain_graph,
255 create_graph,
256 inputs,
257 allow_unreachable=True,
258 accumulate_grad=True,
259 )
File ~/thesis/thesis-localllm-codetuning/training22_04/lib/python3.10/site-packages/torch/autograd/function.py:288, in BackwardCFunction.apply(self, *args)
282 raise RuntimeError(
283 "Implementing both 'backward' and 'vjp' for a custom "
284 "Function is not allowed. You should only implement one "
285 "of them."
286 )
287 user_fn = vjp_fn if vjp_fn is not Function.vjp else backward_fn
--> 288 return user_fn(self, *args)
File ~/thesis/thesis-localllm-codetuning/training22_04/lib/python3.10/site-packages/torch/utils/checkpoint.py:288, in CheckpointFunction.backward(ctx, *args)
283 if len(outputs_with_grad) == 0:
284 raise RuntimeError(
285 "none of output has requires_grad=True,"
286 " this checkpoint() is not necessary"
287 )
--> 288 torch.autograd.backward(outputs_with_grad, args_with_grad)
289 grads = tuple(
290 inp.grad if isinstance(inp, torch.Tensor) else None
291 for inp in detached_inputs
292 )
294 return (None, None) + grads
File ~/thesis/thesis-localllm-codetuning/training22_04/lib/python3.10/site-packages/torch/autograd/__init__.py:251, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
246 retain_graph = create_graph
248 # The reason we repeat the same comment below is that
249 # some Python versions print out the first line of a multi-line function
250 # calls in the traceback and some print out the last line
--> 251 Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
252 tensors,
253 grad_tensors_,
254 retain_graph,
255 create_graph,
256 inputs,
257 allow_unreachable=True,
258 accumulate_grad=True,
259 )
RuntimeError: Expected is_sm80 || is_sm90 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
I tried upgrading pytorch to the latest version, restarting the PC, restarting the jupyter notebook kernel.
I also installed the latest CUDA version and nvidia toolkits.
Installing a specific version of pytorch fixed my issue.
I used this:
pip install --force-reinstall --pre torch --index-url https://download.pytorch.org/whl/nightly/cu117
From the GitHub issue comment: installing the nightly
Thanks to @palonix for the hint to the GitHub issue!