I can successfully run the following code on a CPU cluster in Databricks.
import torch
import transformers
model = transformers.AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
On the CPU Databricks cluster, I first installed PyTorch 2.0.1, Transformers 4.28.1, and einops 0.6.1.
However, the same Python code fails on a GPU cluster with the following error:
ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run pip install flash_attn
I then tried to install the required package on the Databricks GPU cluster (based on the linked instructions), but I have been unable to install Flash Attention there.
First, I ran:
pip install flash-attn
This resulted in the following error:
Collecting flash-attn
Downloading flash_attn-1.0.4.tar.gz (2.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 28.1 MB/s eta 0:00:0000:010:01
Preparing metadata (setup.py) ... done
Requirement already satisfied: torch in /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages (from flash-attn) (1.13.1)
Requirement already satisfied: einops in /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages (from flash-attn) (0.6.1)
Requirement already satisfied: packaging in /databricks/python3/lib/python3.10/site-packages (from flash-attn) (21.3)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /databricks/python3/lib/python3.10/site-packages (from packaging->flash-attn) (3.0.9)
Requirement already satisfied: nvidia-cublas-cu11==11.10.3.66 in /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages (from torch->flash-attn) (11.10.3.66)
Requirement already satisfied: nvidia-cudnn-cu11==8.5.0.96 in /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages (from torch->flash-attn) (8.5.0.96)
Requirement already satisfied: typing-extensions in /databricks/python3/lib/python3.10/site-packages (from torch->flash-attn) (4.3.0)
Requirement already satisfied: nvidia-cuda-nvrtc-cu11==11.7.99 in /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages (from torch->flash-attn) (11.7.99)
Requirement already satisfied: nvidia-cuda-runtime-cu11==11.7.99 in /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages (from torch->flash-attn) (11.7.99)
Requirement already satisfied: setuptools in /databricks/python3/lib/python3.10/site-packages (from nvidia-cublas-cu11==11.10.3.66->torch->flash-attn) (63.4.1)
Requirement already satisfied: wheel in /databricks/python3/lib/python3.10/site-packages (from nvidia-cublas-cu11==11.10.3.66->torch->flash-attn) (0.37.1)
Building wheels for collected packages: flash-attn
Building wheel for flash-attn (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [317 lines of output]
torch.__version__ = 1.13.1+cu117
fatal: not a git repository (or any of the parent directories): .git
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-310
creating build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/flash_blocksparse_attn_interface.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/rotary.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/flash_attn_triton_tmp_og.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/flash_blocksparse_attention.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/flash_attn_triton.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/bert_padding.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/flash_attn_triton_og.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/flash_attention.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/fused_softmax.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/flash_attn_triton_varlen.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/flash_attn_interface.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/flash_attn_triton_tmp.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/attention_kernl.py -> build/lib.linux-x86_64-cpython-310/flash_attn
copying flash_attn/flash_attn_triton_single_query.py -> build/lib.linux-x86_64-cpython-310/flash_attn
creating build/lib.linux-x86_64-cpython-310/flash_attn/modules
copying flash_attn/modules/embedding.py -> build/lib.linux-x86_64-cpython-310/flash_attn/modules
copying flash_attn/modules/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/modules
copying flash_attn/modules/mlp.py -> build/lib.linux-x86_64-cpython-310/flash_attn/modules
copying flash_attn/modules/block.py -> build/lib.linux-x86_64-cpython-310/flash_attn/modules
copying flash_attn/modules/mha.py -> build/lib.linux-x86_64-cpython-310/flash_attn/modules
creating build/lib.linux-x86_64-cpython-310/flash_attn/utils
copying flash_attn/utils/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/utils
copying flash_attn/utils/generation.py -> build/lib.linux-x86_64-cpython-310/flash_attn/utils
copying flash_attn/utils/benchmark.py -> build/lib.linux-x86_64-cpython-310/flash_attn/utils
copying flash_attn/utils/distributed.py -> build/lib.linux-x86_64-cpython-310/flash_attn/utils
copying flash_attn/utils/pretrained.py -> build/lib.linux-x86_64-cpython-310/flash_attn/utils
creating build/lib.linux-x86_64-cpython-310/flash_attn/layers
copying flash_attn/layers/rotary.py -> build/lib.linux-x86_64-cpython-310/flash_attn/layers
copying flash_attn/layers/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/layers
copying flash_attn/layers/patch_embed.py -> build/lib.linux-x86_64-cpython-310/flash_attn/layers
creating build/lib.linux-x86_64-cpython-310/flash_attn/triton
copying flash_attn/triton/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/triton
copying flash_attn/triton/fused_attention.py -> build/lib.linux-x86_64-cpython-310/flash_attn/triton
creating build/lib.linux-x86_64-cpython-310/flash_attn/losses
copying flash_attn/losses/cross_entropy_apex.py -> build/lib.linux-x86_64-cpython-310/flash_attn/losses
copying flash_attn/losses/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/losses
copying flash_attn/losses/cross_entropy_parallel.py -> build/lib.linux-x86_64-cpython-310/flash_attn/losses
copying flash_attn/losses/cross_entropy.py -> build/lib.linux-x86_64-cpython-310/flash_attn/losses
creating build/lib.linux-x86_64-cpython-310/flash_attn/ops
copying flash_attn/ops/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
copying flash_attn/ops/layer_norm.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
copying flash_attn/ops/gelu_activation.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
copying flash_attn/ops/rms_norm.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
copying flash_attn/ops/fused_dense.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
copying flash_attn/ops/activations.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops
creating build/lib.linux-x86_64-cpython-310/flash_attn/models
copying flash_attn/models/gptj.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
copying flash_attn/models/vit.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
copying flash_attn/models/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
copying flash_attn/models/llama.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
copying flash_attn/models/gpt.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
copying flash_attn/models/bert.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
copying flash_attn/models/gpt_j.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
copying flash_attn/models/opt.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
copying flash_attn/models/gpt_neox.py -> build/lib.linux-x86_64-cpython-310/flash_attn/models
running build_ext
building 'flash_attn_cuda' extension
creating /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/build/temp.linux-x86_64-cpython-310
creating /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/build/temp.linux-x86_64-cpython-310/csrc
creating /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/build/temp.linux-x86_64-cpython-310/csrc/flash_attn
creating /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/build/temp.linux-x86_64-cpython-310/csrc/flash_attn/src
Emitting ninja build file /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/build/temp.linux-x86_64-cpython-310/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/9] /usr/local/cuda/bin/nvcc -I/tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn -I/tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn/src -I/tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn/cutlass/include -I/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/include -I/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/include/TH -I/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/local_disk0/.ephemeral_nfs/envs/pythonEnv-69bd3443-3436-4892-a827-1f1b494c1c35/include -I/usr/include/python3.10 -c -c /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn/src/fmha_bwd_hdim32.cu -o /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/build/temp.linux-x86_64-cpython-310/csrc/flash_attn/src/fmha_bwd_hdim32.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/build/temp.linux-x86_64-cpython-310/csrc/flash_attn/src/fmha_bwd_hdim32.o
/usr/local/cuda/bin/nvcc -I/tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn -I/tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn/src -I/tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn/cutlass/include -I/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/include -I/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/include/TH -I/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/local_disk0/.ephemeral_nfs/envs/pythonEnv-69bd3443-3436-4892-a827-1f1b494c1c35/include -I/usr/include/python3.10 -c -c /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn/src/fmha_bwd_hdim32.cu -o /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/build/temp.linux-x86_64-cpython-310/csrc/flash_attn/src/fmha_bwd_hdim32.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
In file included from /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn/src/fmha.h:39,
from /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn/src/fmha_bwd_launch_template.h:6,
from /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn/src/fmha_bwd_hdim32.cu:5:
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAContext.h:6:10: fatal error: cusparse.h: No such file or directory
6 | #include <cusparse.h>
| ^~~~~~~~~~~~
compilation terminated.
In file included from /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn/src/fmha.h:39,
from /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn/src/fmha_bwd_launch_template.h:6,
from /tmp/pip-install-nluf5697/flash-attn_79c1dfb03cc4482ba86c435b2db7a8b6/csrc/flash_attn/src/fmha_bwd_hdim32.cu:5:
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAContext.h:6:10: fatal error: cusparse.h: No such file or directory
6 | #include <cusparse.h>
| ^~~~~~~~~~~~
compilation terminated.
......................................... [ removed middle of error message]
cmd_obj.run()
File "/databricks/python/lib/python3.10/site-packages/setuptools/command/build.py", line 24, in run
super().run()
File "/databricks/python/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 132, in run
self.run_command(cmd_name)
File "/databricks/python/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 319, in run_command
self.distribution.run_command(command)
File "/databricks/python/lib/python3.10/site-packages/setuptools/dist.py", line 1217, in run_command
super().run_command(command)
File "/databricks/python/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 992, in run_command
cmd_obj.run()
File "/databricks/python/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/databricks/python/lib/python3.10/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "/databricks/python/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
self.build_extensions()
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
build_ext.build_extensions(self)
File "/databricks/python/lib/python3.10/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
_build_ext.build_ext.build_extensions(self)
File "/databricks/python/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 466, in build_extensions
self._build_extensions_serial()
File "/databricks/python/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 492, in _build_extensions_serial
self.build_extension(ext)
File "/databricks/python/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
_build_ext.build_extension(self, ext)
File "/databricks/python/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 547, in build_extension
objects = self.compiler.compile(
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1573, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure
× Encountered error while trying to install package.
╰─> flash-attn
note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
Second, I updated the torch library on the GPU cluster to version 2.0.1 (to match the setup that works on CPU) and tried again to install flash-attn. This still didn't work.
My suspicion is that the code works on the CPU cluster because that cluster does NOT come with an earlier version of PyTorch preinstalled: I install PyTorch 2.0.1 there and it works fine. The GPU cluster, however, comes preinstalled with an earlier version of PyTorch (1.13.1+cu117 according to the build output above), and Flash Attention fails to install on it.
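To check this suspicion, the package versions on the two clusters can be compared. Below is a minimal sketch using only the standard library. The helper name installed_versions is hypothetical (not part of any library), and the package list is just the set of names mentioned in this post:

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names taken from this post; adjust as needed.
    report = installed_versions(["torch", "transformers", "einops", "flash-attn"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in a notebook cell on both the CPU and the GPU cluster should show whether the environments actually differ in the way I suspect.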
This is likely related to missing CUDA dependencies.
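One way to confirm this before changing anything is to check whether the headers the build needs are present. The sketch below assumes the include directory /usr/local/cuda/include (taken from the nvcc command in the log above). cusparse.h comes from the error message; the other header names are my assumption about what the corresponding CUDA dev packages provide. The function name missing_cuda_headers is hypothetical:

```python
from pathlib import Path

# cusparse.h is the header named in the error; the others are assumed to be
# the main headers of the cuBLAS, cuSOLVER and cuRAND dev packages.
DEFAULT_HEADERS = ("cusparse.h", "cublas_v2.h", "cusolverDn.h", "curand.h")


def missing_cuda_headers(include_dir, headers=DEFAULT_HEADERS):
    """Return the names in `headers` that are not files inside `include_dir`."""
    include_path = Path(include_dir)
    return [name for name in headers if not (include_path / name).is_file()]


if __name__ == "__main__":
    # Assumed include path, copied from the -I flag in the nvcc command.
    missing = missing_cuda_headers("/usr/local/cuda/include")
    print("missing headers:", missing or "none")
```

If cusparse.h shows up as missing, that would match the "No such file or directory" error in the build output.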
Please try restarting the cluster, running the following in a notebook cell, and then reinstalling flash_attn:
!wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcusparse-dev-11-7_11.7.3.50-1_amd64.deb -O /tmp/libcusparse-dev-11-7_11.7.3.50-1_amd64.deb && \
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcublas-dev-11-7_11.10.1.25-1_amd64.deb -O /tmp/libcublas-dev-11-7_11.10.1.25-1_amd64.deb && \
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcusolver-dev-11-7_11.4.0.1-1_amd64.deb -O /tmp/libcusolver-dev-11-7_11.4.0.1-1_amd64.deb && \
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcurand-dev-11-7_10.2.10.91-1_amd64.deb -O /tmp/libcurand-dev-11-7_10.2.10.91-1_amd64.deb && \
dpkg -i /tmp/libcusparse-dev-11-7_11.7.3.50-1_amd64.deb && \
dpkg -i /tmp/libcublas-dev-11-7_11.10.1.25-1_amd64.deb && \
dpkg -i /tmp/libcusolver-dev-11-7_11.4.0.1-1_amd64.deb && \
dpkg -i /tmp/libcurand-dev-11-7_10.2.10.91-1_amd64.deb
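After reinstalling, it may be worth checking that the module can be found before loading the model again. This is a minimal sketch using the standard library; the helper name is_importable is hypothetical, and it only checks that the module can be located, not that its CUDA extension loads correctly:

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # flash_attn is the module named in the original ImportError.
    print("flash_attn importable:", is_importable("flash_attn"))
```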
Update: I added both the init script and general instructions to this repo: https://github.com/rafaelvp-db/databricks-llm-prompt-engineering