I have a tf.data.Dataset of images with input shape (batch_size, 128, 128, 2) and target shape (batch_size, 128, 128, 1), where the inputs are 2-channel images (complex-valued images, with the two channels holding the real and imaginary parts) and the targets are 1-channel images (real-valued images).
I need to normalize the input and target images by first subtracting their mean image and then scaling them to the (0, 1) range. If I am not wrong, tf.data.Dataset works with one batch at a time rather than the entire dataset. So, in the py_function remove_mean I subtract the mean image of the batch from each image in the batch, and in the py_function linear_scaling I scale each image to (0, 1) by subtracting its minimum value and dividing by the difference between its maximum and minimum values. However, when I print the min value of an input image from the dataset before and after applying these functions, the image values do not change.
Could anyone suggest what may be going wrong here?
import numpy as np
import tensorflow as tf

def remove_mean(image, target):
    # Subtract the per-batch mean image from every image in the batch
    image_mean = np.mean(image, axis=0)
    target_mean = np.mean(target, axis=0)
    image = image - image_mean
    target = target - target_mean
    return image, target
def linear_scaling(image, target):
    # Scale each image to (0, 1) using its own per-channel min and max
    image_min = np.min(image, axis=(1, 2), keepdims=True)
    image_max = np.max(image, axis=(1, 2), keepdims=True)
    image = (image - image_min) / (image_max - image_min)
    target_min = np.min(target, axis=(1, 2), keepdims=True)
    target_max = np.max(target, axis=(1, 2), keepdims=True)
    target = (target - target_min) / (target_max - target_min)
    return image, target
a, b = next(iter(train_dataset))
print(tf.math.reduce_min(a[0,:,:,:]))
train_dataset.map(lambda item1, item2: tuple(tf.py_function(remove_mean, [item1, item2], [tf.float32, tf.float32])))
test_dataset.map(lambda item1, item2: tuple(tf.py_function(remove_mean, [item1, item2], [tf.float32, tf.float32])))
a, b = next(iter(train_dataset))
print(tf.math.reduce_min(a[0,:,:,:]))
train_dataset.map(lambda item1, item2: tuple(tf.py_function(linear_scaling, [item1, item2], [tf.float32])))
test_dataset.map(lambda item1, item2: tuple(tf.py_function(linear_scaling, [item1, item2], [tf.float32])))
a, b = next(iter(train_dataset))
print(tf.math.reduce_min(a[0,:,:,:]))
Output:
tf.Tensor(-0.00040511801, shape=(), dtype=float32)
tf.Tensor(-0.00040511801, shape=(), dtype=float32)
tf.Tensor(-0.00040511801, shape=(), dtype=float32)
map is not an in-place operation: Dataset.map returns a new dataset, so your train_dataset does not change when you call train_dataset.map(...) and discard the result.
Assign the result back instead: train_dataset = train_dataset.map(...).
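For illustration, here is a minimal sketch of the corrected mapping calls. It simply reassigns each mapped dataset, and it assumes both py_functions return an (image, target) pair, so two output dtypes are passed to tf.py_function (the linear_scaling call in the question listed only one):

# Reassign the results of map(); tf.data transformations return new datasets.
train_dataset = train_dataset.map(
    lambda item1, item2: tuple(tf.py_function(remove_mean, [item1, item2], [tf.float32, tf.float32])))
test_dataset = test_dataset.map(
    lambda item1, item2: tuple(tf.py_function(remove_mean, [item1, item2], [tf.float32, tf.float32])))

# linear_scaling also returns two tensors, so two output dtypes are listed here
# (assumption on my part; the question passed only [tf.float32]).
train_dataset = train_dataset.map(
    lambda item1, item2: tuple(tf.py_function(linear_scaling, [item1, item2], [tf.float32, tf.float32])))
test_dataset = test_dataset.map(
    lambda item1, item2: tuple(tf.py_function(linear_scaling, [item1, item2], [tf.float32, tf.float32])))

a, b = next(iter(train_dataset))
print(tf.math.reduce_min(a[0, :, :, :]))  # now reflects the normalized values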