
Float16 mixed precision being slower than regular float32, keras, tensorflow 2.0


I am using TensorFlow 2.10 on Windows with an NVIDIA RTX 2060 SUPER (which has tensor cores) for deep learning. But when I enable float16 mixed precision, the time per epoch actually becomes slower, not faster.

Code:

import tensorflow as tf
import ssl

ssl._create_default_https_context = ssl._create_unverified_context

(train_x, train_y), (test_x, test_y) = tf.keras.datasets.cifar100.load_data()

tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    tf.keras.layers.Lambda(lambda x: x / 255, input_shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(filters=64, kernel_size=(4,4)),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.Conv2D(filters=32, kernel_size=(2,2)),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dense(4096, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(100),
    tf.keras.layers.Activation("softmax", dtype="float32")
    ])

model.compile(optimizer="adam", loss=tf.keras.losses.SparseCategoricalCrossentropy(), metrics=["accuracy"])

print("compute dtype of first layer: ", model.layers[0].compute_dtype)

model.fit(train_x, train_y, epochs=100, batch_size=1020)

model.evaluate(test_x, test_y)
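One detail worth checking (an assumption on my part, not verified on your hardware): tensor cores on Turing GPUs reach full FP16 throughput only when the relevant matrix dimensions, including the batch size, are multiples of 8. The Dense sizes (4096) are aligned, but `batch_size=1020` is not. A minimal sketch of a helper that rounds a size up to the nearest aligned value (the name `round_up_to_multiple` is hypothetical, not a TensorFlow API):

```python
def round_up_to_multiple(n: int, multiple: int = 8) -> int:
    """Round n up to the nearest multiple; tensor cores prefer multiples of 8 for FP16."""
    return ((n + multiple - 1) // multiple) * multiple

# batch_size=1020 is not a multiple of 8; the nearest aligned size is 1024.
print(round_up_to_multiple(1020))  # -> 1024
print(round_up_to_multiple(4096))  # already aligned -> 4096
```

Alignment alone probably doesn't explain a slowdown this large, but it's a cheap thing to rule out before digging into driver/library versions.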

I attached some images of the problem: here's an image of the epoch times without mixed precision, and here's an image with mixed precision enabled, which is slower.

Running the code in Google Colab, which uses a more modern version of TensorFlow (TF 2.15), works well: mixed precision is faster than float32, as it should be. Here's the link to the Colab: Google Colab

I'm not an expert with TensorFlow and I have been trying to fix this problem for weeks; any help would be appreciated. Thanks!

Other Information:

I'm using cuDNN 8.1.1 and CUDA 11.2, which are the officially compatible versions for TensorFlow 2.10.
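As a sanity check, you can ask TensorFlow which CUDA/cuDNN versions it was actually compiled against via `tf.sysconfig.get_build_info()`, and compare them with what's installed. A small sketch, guarded so it degrades gracefully when TensorFlow isn't importable (the wrapper function name is my own, not a TF API):

```python
def tf_cuda_build_info():
    """Return TensorFlow's build info as a dict, or None if TF is unavailable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # On GPU builds this dict contains keys such as 'cuda_version' and 'cudnn_version'.
    return dict(tf.sysconfig.get_build_info())

info = tf_cuda_build_info()
if info is not None:
    print(info.get("cuda_version"), info.get("cudnn_version"))
```

If the reported versions differ from the CUDA/cuDNN you installed system-wide, TensorFlow may be loading mismatched libraries, which can silently disable fast FP16 kernels.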


Solution

  • The solution I found was to switch to Ubuntu (Linux) and update to the newer TensorFlow 2.15.

    In this version mixed precision (float16) is about twice as fast as regular float32.

    I also upgraded from CUDA 11.2 to 12.2 and from cuDNN 8.1.1 to 8.9.