I am getting the "tensorflow:Layer lstm will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU." warning while training my LSTM model on Apple Silicon M2. The training is just too slow. How can I get the best out of this chip for my task?
PS: (1) I've already installed the tensorflow-macos and tensorflow-metal packages alongside the tensorflow-deps package provided in the Apple channel of Conda.
(2) My model isn't particularly deep either: it consists of one LSTM layer with 64 units and one Dense layer with 64 units.
(3) My machine's main specifications:
Since you're on Apple Silicon, cuDNN isn't the culprit here: cuDNN is NVIDIA's CUDA library, so it's never available on an M2, and that warning simply tells you the LSTM will run on the generic kernel. It's expected and harmless on your hardware.
Try training on the CPU and compare the wall-clock time. Your model is small, so the per-step overhead of dispatching work to the Metal GPU is likely dominating the runtime; as the model (or batch size) grows, that overhead gets amortized. See the Troubleshooting section on this page.
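Here is a rough sketch of how you could time both devices in one script. The data shapes, epoch count, batch size, and loss below are placeholders I made up for illustration, not values from your setup; it only roughly mirrors the one-LSTM-plus-one-Dense model you described.

```python
import time

import numpy as np
import tensorflow as tf

# Placeholder data: 1024 sequences, 50 timesteps, 32 features (assumed shapes).
x = np.random.rand(1024, 50, 32).astype("float32")
y = np.random.rand(1024, 64).astype("float32")

def build_model():
    # Roughly mirrors the described model: one 64-unit LSTM, one 64-unit Dense.
    return tf.keras.Sequential([
        tf.keras.layers.LSTM(64, input_shape=(50, 32)),
        tf.keras.layers.Dense(64),
    ])

def time_fit(device):
    # Pin model creation and training to the given device and time a few epochs.
    with tf.device(device):
        model = build_model()
        model.compile(optimizer="adam", loss="mse")
        start = time.perf_counter()
        model.fit(x, y, epochs=3, batch_size=64, verbose=0)
        return time.perf_counter() - start

print(f"CPU: {time_fit('/CPU:0'):.1f}s")
if tf.config.list_physical_devices("GPU"):
    print(f"GPU: {time_fit('/GPU:0'):.1f}s")
```

If the CPU run wins at this model size, the pragmatic fix is to keep training on the CPU, for example by hiding the GPU at startup with tf.config.set_visible_devices([], "GPU"); alternatively, a larger batch size tends to make the Metal GPU path more worthwhile.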