Tags: tensorflow, keras, apple-m1, metal, apple-silicon

TensorFlow: Why is training an RNN so slow on Apple Silicon M2?


I am getting the "tensorflow:Layer lstm will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU." warning while training my LSTM model on Apple Silicon M2. The training is just too slow. How can I get the best out of this chip for my task?

PS: (1) I've already installed the tensorflow-macos and tensorflow-metal packages, alongside the tensorflow-deps package from Apple's conda channel.

(2) My model is not particularly deep either: it consists of one LSTM layer with 64 units and one dense layer with 64 units (a rough sketch of the model is shown after the spec list below).

(3) My machine's main specifications:

  • macOS 13.2.1 (Ventura), the latest stable release
  • Apple Silicon M2 (8-core CPU, 10-core GPU, and 16-core neural engine)
  • 16 GB unified memory
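
For reference, a rough sketch of a model like the one described in (2); the input shape, output layer, and training settings are illustrative assumptions, not the actual code:

```python
import numpy as np
import tensorflow as tf

# Assumed input shape and dummy data, purely for illustration.
timesteps, features = 30, 8
x = np.random.rand(1024, timesteps, features).astype("float32")
y = np.random.rand(1024, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(timesteps, features)),
    tf.keras.layers.LSTM(64),                      # one LSTM layer, 64 units
    tf.keras.layers.Dense(64, activation="relu"),  # one dense layer, 64 units
    tf.keras.layers.Dense(1),                      # assumed regression head
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=3, batch_size=32)
```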

Solution

  • cuDNN is NVIDIA's CUDA-only library, so on Apple Silicon that warning is expected and most likely isn't the culprit here.

    Try training on the CPU and compare the time cost. Your model isn't large, so the per-batch overhead of dispatching work to the GPU is likely what dominates the training time; as your model gets larger, that overhead tends to get amortized. See the Troubleshooting section on this page. A quick way to run the comparison is sketched below.
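
For example, a minimal sketch for timing CPU-only training against the default Metal GPU run, assuming a small model like the one above (shapes, data, and hyperparameters are placeholders):

```python
import sys
import time
import numpy as np
import tensorflow as tf

if "--cpu" in sys.argv:
    # Hiding the GPU must happen before any other TF ops are created,
    # so run the script twice: once as-is (GPU) and once with --cpu.
    tf.config.set_visible_devices([], "GPU")

timesteps, features = 30, 8                     # assumed input shape
x = np.random.rand(2048, timesteps, features).astype("float32")
y = np.random.rand(2048, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(timesteps, features)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

start = time.perf_counter()
model.fit(x, y, epochs=3, batch_size=32, verbose=0)
elapsed = time.perf_counter() - start

device = "CPU only" if not tf.config.get_visible_devices("GPU") else "GPU"
print(f"{device}: {elapsed:.2f}s")
```

If the CPU-only run is noticeably faster for this model size, the GPU dispatch overhead is indeed the bottleneck, and training on the CPU (or increasing the batch size / model size) is the practical fix.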