Accelerated PyTorch on a MacBook with an AMD GPU

I followed the instructions on Apple's website (https://developer.apple.com/metal/pytorch/), but when I ran its Python script to verify MPS support, it returned an error I do not understand (it is too long, so only part of it is listed below). I would like to use GPU acceleration for Stable Diffusion. My MacBook has a Radeon Pro 555 and runs macOS Ventura. Help please :(

Python 3.11.1 (v3.11.1:a7a450f84a, Dec  6 2022, 15:24:06) [Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> if torch.backends.mps.is_available():
...     mps_device = torch.device("mps")
...     x = torch.ones(1, device=mps_device)
...     print (x)
... else:
...     print ("MPS device not found.")
... 
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/_tensor.py", line 461, in __repr__
    return torch._tensor_str._str(self, tensor_contents=tensor_contents)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/_tensor_str.py", line 677, in _str
    return _str_intern(self, tensor_contents=tensor_contents)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/_tensor_str.py", line 597, in _str_intern
    tensor_str = _tensor_str(self, indent)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/_tensor_str.py", line 349, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/_tensor_str.py", line 137, in __init__
    nonzero_finite_vals = torch.masked_select(
                          ^^^^^^^^^^^^^^^^^^^^
RuntimeError: Failed to create indexing library, error: Error Domain=MTLLibraryErrorDomain Code=3 "program_source:168:1: error: type 'const constant ulong3 *' is not valid for attribute 'buffer'
REGISTER_INDEX_OP_ALL_DTYPES(select);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:160:5: note: expanded from macro 'REGISTER_INDEX_OP_ALL_DTYPES'
    REGISTER_INDEX_OP(8bit,  idx64, char,  INDEX_OP_TYPE, ulong3);    \
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:138:5: note: expanded from macro 'REGISTER_INDEX_OP'
    constant IDX_DTYPE   * offsets           [[buffer(3)]],                        \
    ^                                          ~~~~~~~~~
program_source:168:1: note: type 'ulong3' (vector of 3 'unsigned long' values) cannot be used in buffer pointee type
program_source:160:59: note: expanded from macro 'REGISTER_INDEX_OP_ALL_DTYPES'
    REGISTER_INDEX_OP(8bit,  idx64, char,  INDEX_OP_TYPE, ulong3);    \
                                                          ^
program_source:168:1: error: explicit instantiation of 'index_select' does not refer to a function template, variable template, member function, member class, or static data member
REGISTER_INDEX_OP_ALL_DTYPES(select);
^
program_source:160:5: note: expanded from macro 'REGISTER_INDEX_OP_ALL_DTYPES'
    REGISTER_INDEX_OP(8bit,  idx64, char,  INDEX_OP_TYPE, ulong3);    \
    ^
program_source:134:13: note: expanded from macro 'REGISTER_INDEX_OP'
kernel void index_ ## INDEX_OP_TYPE<DTYPE, IDX_DTYPE>(                             \
            ^
<scratch space>:9:1: note: expanded from here
index_select
^
program_source:20:13: note: candidate template ignored: substitution failure [with T = char, OffsetsT = unsigned long __attribute__((ext_vector_type(3)))]: type 'unsigned long const constant * __attribute__((ext_vector_type(3)))' is not valid for attribute 'buffer'
kernel void index_select(
            ^
program_source:168:1: error: type 'const constant ulong3 *' is not valid for attribute 'buffer'
REGISTER_INDEX_OP_ALL_DTYPES(select);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:162:5: note: expanded from macro 'REGISTER_INDEX_OP_ALL_DTYPES'
    REGISTER_INDEX_OP(16bit, idx64, short, INDEX_OP_TYPE, ulong3);    \
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:138:5: note: expanded from macro 'REGISTER_INDEX_OP'
    constant IDX_DTYPE   * offsets           [[buffer(3)]],                        \
    ^                                          ~~~~~~~~~
program_source:168:1: note: type 'ulong3' (vector of 3 'unsigned long' values) cannot be used in buffer pointee type
program_source:162:59: note: expanded from macro 'REGISTER_INDEX_OP_ALL_DTYPES'
    REGISTER_INDEX_OP(16bit, idx64, short, INDEX_OP_TYPE, ulong3);    \
                                                          ^
program_source:168:1: error: explicit instantiation of 'index_select' does not refer to a function template, variable template, member function, member class, or static data member
REGISTER_INDEX_OP_ALL_DTYPES(select);
^
program_source:162:5: note: expanded from macro 'REGISTER_INDEX_OP_ALL_DTYPES'
    REGISTER_INDEX_OP(16bit, idx64, short, INDEX_OP_TYPE, ulong3);    \
    ^
program_source:134:13: note: expanded from macro 'REGISTER_INDEX_OP'
kernel void index_ ## INDEX_OP_TYPE<DTYPE, IDX_DTYPE>(                             \
            ^
<scratch space>:17:1: note: expanded from here
index_select
^

....
...
program_source:248:13: note: candidate template ignored: substitution failure [with T = metal::_atomic<int, void>, E = int, OffsetsT = unsigned long __attribute__((ext_vector_type(3)))]: type 'unsigned long const constant * __attribute__((ext_vector_type(3)))' is not valid for attribute 'buffer'
kernel void index_put_accumulate_native_dtypes(
            ^
}
>>> 
>>>

I downgraded Python from 3.12.1 to 3.11.1 and reinstalled the latest PyTorch nightly, but the result is the same.

Solution

I can replicate this on recent nightly builds (notably 2.3.0.dev20240114). However, the latest stable release (PyTorch 2.1.2) works well.

Try creating a new environment with the stable release of PyTorch. Apple's documentation for MPS acceleration with PyTorch recommends the nightly build because MPS support used to be more experimental.

conda create -n torchstable python=3.8
conda activate torchstable
pip3 install torch torchvision torchaudio

Next, try running your code again.
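To confirm which build the new environment actually picked up, a small check like the one below may help. It is only a sketch: the helper names `is_nightly` and `check_mps` are made up for illustration, and the assumption that nightly wheels can be recognised by a `.dev` segment in `torch.__version__` is based on version strings such as 2.3.0.dev20240114 above. It repeats the verification from the question, but catches the `RuntimeError` and reports only its first line instead of the full Metal compiler output.

```python
import importlib.util


def is_nightly(version: str) -> bool:
    # Assumption: nightly wheels carry a ".dev<date>" segment,
    # e.g. "2.3.0.dev20240114"; stable releases such as "2.1.2" do not.
    return ".dev" in version


def check_mps() -> str:
    # Returns a one-line report instead of raising, so it is safe to run anywhere.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"

    import torch

    build = "nightly" if is_nightly(torch.__version__) else "stable"
    label = f"torch {torch.__version__} ({build})"

    if not torch.backends.mps.is_available():
        return f"{label}: MPS device not found"

    try:
        x = torch.ones(1, device=torch.device("mps"))
        # Formatting the tensor calls its __repr__, which is the step
        # that failed in the traceback above.
        return f"{label}: MPS works, x = {x}"
    except RuntimeError as err:
        first_line = (str(err).splitlines() or [""])[0]
        return f"{label}: MPS failed: {first_line}"


print(check_mps())
```

If the report still says "nightly", the new environment is probably not the one being used; check which `python` and `pip3` are first on your PATH.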

Update: This is confirmed as an issue on recent PyTorch nightly builds. See here and here.