Tags: python, tensorflow2.0, normalization, tensorflow-datasets, tf.data.dataset

Min-max normalization when using tf.data.Dataset


I have a tf.data.Dataset and I want to apply min-max normalization so that the image values lie in the range [0,1].

I would like to know how to normalize each image individually, as well as how to normalize over the whole batch.

    import glob
    import tensorflow as tf

    AUTOTUNE = tf.data.AUTOTUNE

    @tf.function
    def load_images(imagePath):
        # Read and decode the clean image; convert_image_dtype scales it to float32 in [0, 1]
        label = tf.io.read_file(imagePath)
        label = tf.image.decode_jpeg(label, channels=3)
        label = tf.image.convert_image_dtype(label, dtype=tf.float32)

        # Noisy input: additive Gaussian noise (variance 0.1), so values can fall outside [0, 1]
        image = label + tf.random.normal(shape=tf.shape(label), mean=0, stddev=0.1**0.5)

        return image, label

    filenames = glob.glob("/content/mydrive/images/" + "*.jpg")

    trainDS = tf.data.Dataset.from_tensor_slices(filenames)
    trainDS = (trainDS
        .shuffle(len(filenames))
        .map(load_images, num_parallel_calls=AUTOTUNE)
        .batch(16)
        .prefetch(AUTOTUNE)
    )

Could anyone suggest the best way to do that?

P.S. I expected a tf.image.per_image_normalization function to exist (similar to tf.image.per_image_standardization), but no luck.


Solution

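  • You can use tf.keras.layers.Rescaling(1./255) to map pixel values from [0, 255] to [0, 1]. This is a fixed rescale, which matches min-max normalization when the pixel values span the full [0, 255] range: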

    import tensorflow as tf
    import pathlib
    
    dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
    data_dir = tf.keras.utils.get_file('flower_photos', origin=dataset_url, untar=True)
    data_dir = pathlib.Path(data_dir)
    
    batch_size = 32
    
    train_ds = tf.keras.utils.image_dataset_from_directory(
      data_dir,
      validation_split=0.2,
      subset="training",
      seed=123,
      batch_size=batch_size)
    
    normalization_layer = tf.keras.layers.Rescaling(1./255)
    normalized_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
    
    for x, y in normalized_ds.take(1):
      # the pixel values are now in [0,1]
      print(tf.reduce_min(x), tf.reduce_max(x))
    
    # Output:
    # tf.Tensor(0.0, shape=(), dtype=float32) tf.Tensor(1.0, shape=(), dtype=float32)
    

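    Note that Rescaling multiplies every pixel by the same constant, so its result does not depend on the batch size. To normalize each image by its own minimum and maximum, compute them per image (for example with tf.reduce_min and tf.reduce_max) before batching, or set the batch size to 1 and compute them per batch.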
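
For min-max normalization based on the actual minimum and maximum of the data, rather than a fixed 1/255 factor, a minimal sketch could look like the following. The helper name `min_max_normalize`, its `axis` and `eps` parameters, and the toy tensor are assumptions made for illustration; they are not part of TensorFlow or of the original answer. Only `tf.reduce_min`, `tf.reduce_max`, `tf.maximum`, and the `tf.data` calls are library functions.

```python
import tensorflow as tf


def min_max_normalize(x, axis=None, eps=1e-8):
    """Scale x to [0, 1] using the min and max taken over `axis`.

    axis=None      -> one min/max for the whole tensor (a single image or a whole batch)
    axis=[1, 2, 3] -> one min/max per image in a batch of shape (N, H, W, C)
    """
    x = tf.cast(x, tf.float32)
    lo = tf.reduce_min(x, axis=axis, keepdims=True)
    hi = tf.reduce_max(x, axis=axis, keepdims=True)
    # eps (hypothetical default) avoids division by zero for constant images
    return (x - lo) / tf.maximum(hi - lo, eps)


# Toy batch of two 2x2 single-channel "images" with different value ranges
batch = tf.reshape(
    tf.constant([[0.0, 10.0, 20.0, 40.0], [-1.0, 0.0, 1.0, 3.0]]),
    (2, 2, 2, 1),
)

# Per-image: every image ends up spanning exactly [0, 1]
per_image = min_max_normalize(batch, axis=[1, 2, 3])

# Whole batch: only the batch as a whole spans [0, 1]
whole_batch = min_max_normalize(batch)

# Inside a tf.data pipeline: map before .batch() for per-image normalization
ds = tf.data.Dataset.from_tensor_slices(batch).map(min_max_normalize).batch(2)

print(tf.reduce_min(per_image, axis=[1, 2, 3]).numpy())  # [0. 0.]
print(tf.reduce_max(per_image, axis=[1, 2, 3]).numpy())  # [1. 1.]
```

For a dataset of (image, label) pairs like the one in the question, the same helper could be applied with `.map(lambda image, label: (min_max_normalize(image), label))` before `.batch(...)` for per-image normalization, or after `.batch(...)` for normalization over the whole batch.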