Tags: python, function, tensorflow, keras, equivalent

Inequivalent output from tf.nn.conv2d and keras.layers.Conv2D


I've been reading the Hands-On Machine Learning textbook (2nd edition) by Aurélien Géron (textbook publisher webpage here). I've reached the content that applies CNNs to images. In the TensorFlow Implementation section of Chapter 14, filters are created manually, passed to tf.nn.conv2d, and applied to an image to produce a set of feature maps. After these manual filter examples, the book says:

in a real CNN you would normally define filters as trainable variables ... Instead of manually creating the variables, use the keras.layers.Conv2D layer.

The above quote implies to me that given identical inputs (and equivalent initializations), we should be able to derive identical outputs from tf.nn.conv2d and keras.layers.Conv2D. To validate this idea, I looked up whether the two functions were equivalent. According to this previously answered SO post, for convolution, the two functions are the same.
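
To make the expectation concrete, here is a minimal sketch of the equivalence I had in mind (hypothetical random input, float32 throughout, bias disabled): feeding the layer's own kernel to tf.nn.conv2d should reproduce the layer's output.

import numpy as np
import tensorflow as tf
from tensorflow import keras

# Hypothetical example: one random 28x28 RGB "image", float32 throughout
x = np.random.rand(1, 28, 28, 3).astype(np.float32)

# Conv2D layer with the bias disabled so only the convolution itself is compared
layer = keras.layers.Conv2D(filters=1, kernel_size=7, strides=1,
                            padding='same', use_bias=False)
y_layer = layer(x)  # builds the layer and applies it eagerly

# Feed the layer's own kernel (shape (7, 7, 3, 1)) to tf.nn.conv2d
y_op = tf.nn.conv2d(x, layer.kernel, strides=1, padding='SAME')

# Expected to print (approximately) 0.0
print(np.abs(y_layer.numpy() - y_op.numpy()).max())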

I set out to perform a simple test of their equivalence. I created a convolutional layer producing a single feature map from a 7x7 filter (a.k.a. convolutional kernel) of all zeros, implemented separately with tf.nn.conv2d and keras.layers.Conv2D. As expected, this filter produced an output image of all zeros in both cases, so summing all the pixel values in the difference of the two images gave zero. This difference of zero implies that the output images are identical.

I then created the same 7x7 filter, but with all ones this time. Ideally, both functions should produce the same output, so the difference between the two output images should be zero. Unfortunately, when I check the difference in the output images (and sum the differences at each pixel), I get a nonzero sum. Upon plotting the images and their difference, it is evident that they are not the same image (though they do look very similar at a glance).
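
For reference, here is a quick sanity check (a sketch; img1 and img2 are the two feature maps extracted in the code below) that distinguishes a genuine mismatch from ordinary floating-point rounding:

import numpy as np

# Compare the two feature maps with a tolerance instead of an exact zero sum
diff = np.abs(np.asarray(img1) - np.asarray(img2))
print('sum of differences  :', diff.sum())
print('max pixel difference:', diff.max())
print('close within tolerance:', np.allclose(img1, img2, rtol=1e-5, atol=1e-6))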

After reading through the documentation for both functions, I believe that I am giving them equivalent inputs. What could I be doing/assuming incorrectly that is preventing both functions from producing identical outputs?

I have attached my code and versioning information below for reference. The code uses the scikit-learn china.jpg sample image as input and matplotlib.pyplot.imshow to help visualize the output images and their difference.

TF Version: 2.2.0-dev20200229

Keras Version: 2.3.1

Scikit-Learn Version: 0.22.1

Matplotlib Version: 3.1.3

Numpy Version: 1.18.1

from sklearn.datasets import load_sample_image
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
import numpy as np

# Get the feature map as a result of tf.nn.conv2d
def featureMap1(batch):
    
    # Extract the channels
    batch_size, height, width, channels = batch.shape

    # Make a (7, 7, 3, 1) filter of all ones:
    # one 7x7 kernel per input channel, producing a single feature map.
    filters = np.ones(shape=(7, 7, channels, 1), dtype=np.float32)

    # Run the conv2d with a stride of 1 (i.e., in.shape == out.shape)
    # Generate one feature map for this conv layer
    fmaps = tf.nn.conv2d(batch, filters,
                         strides=1, padding='SAME',
                         data_format='NHWC')
    
    # Return the feature map
    return fmaps

# Get the feature map as a result of keras.layers.Conv2D
def featureMap2(batch):

    # Create the input layer with the shape of the images
    inputLayer = keras.layers.Input(shape=batch.shape[1:])
    
    # Create the convLayer which should apply the filter of all ones
    convLayer = keras.layers.Conv2D(filters=1, kernel_size=7,
                                    strides=1, padding='SAME',
                                    kernel_initializer='ones',
                                    data_format='channels_last',
                                    activation='linear')

    # Create the output layer
    outputLayer = convLayer(inputLayer)

    # Set up the model
    model = keras.Model(inputs=inputLayer,
                        outputs=outputLayer)

    # Perform a prediction, no model fitting or compiling
    fmaps = model.predict(batch)

    return fmaps 

def main():

    # Get the image and scale the RGB values to [0, 1]
    china = load_sample_image('china.jpg') / 255

    # Build a batch of just one image
    batch = np.array([china])

    # Get the feature maps and extract
    # the images within them
    img1 = featureMap1(batch)[0, :, :, 0]
    img2 = featureMap2(batch)[0, :, :, 0]

    # Calculate the difference in the images
    # Ideally, this should be all zeros...
    diffImage = np.abs(img1 - img2)

    # Add up all the pixels in the diffImage,
    # we expect a value of 0 if the images are
    # identical
    print('Differences value: ', diffImage.sum())

    # Plot the images as a set of 4
    figsize = 10
    f, axarr = plt.subplots(2, 2, figsize=(figsize,figsize))

    axarr[0,0].set_title('Original Image')
    axarr[0,0].imshow(batch[0], cmap='gray')

    axarr[1,0].set_title('Conv2D through tf.nn.conv2d')
    axarr[1,0].imshow(img1, cmap='gray')
    
    axarr[1,1].set_title('Conv2D through keras.layers.Conv2D')
    axarr[1,1].imshow(img2, cmap='gray')

    axarr[0,1].set_title('Diff')
    axarr[0,1].imshow(diffImage, cmap='gray')
    
    plt.show()
    
    return


main()

Solution

  • The output of the two convolutional layers should be identical.

    You are comparing a Model to an Operation, whereas you should compare an operation (tf.keras.layers.Conv2D applied directly to the batch) to an operation (tf.nn.conv2d).

    I modified the featureMap2 function accordingly:

    def featureMap2(batch):
        # Create the convLayer which should apply the filter of all ones
        convLayer = keras.layers.Conv2D(filters=1, kernel_size = 7,
                                        strides=1, padding='SAME',
                                        kernel_initializer='ones',
                                        data_format='channels_last',
                                        activation='linear')
        fmaps = convLayer(batch)
        return fmaps
    

    Here are the plots generated.

    [Plot: the original image, the tf.nn.conv2d feature map, the keras.layers.Conv2D feature map, and their difference]

    Here is the full modified code snippet, executed in the Google Colab environment, with a seed added to ensure reproducibility and the previous code commented out.

    %tensorflow_version 2.x
    
    from sklearn.datasets import load_sample_image
    import matplotlib.pyplot as plt
    import tensorflow as tf
    from tensorflow import keras
    import numpy as np
    
    tf.random.set_seed(26)
    np.random.seed(26)
    tf.keras.backend.set_floatx('float64')
    
    
    # Get the feature map as a result of tf.nn.conv2d
    def featureMap1(batch):
    
        # Extract the channels
        batch_size, height, width, channels = batch.shape
    
        # Make a (7, 7, 3, 1) filter of all ones:
        # one 7x7 kernel per input channel, producing a single feature map.
        filters = np.ones(shape=(7, 7, channels, 1), dtype=np.float32)
    
        # Run the conv2d with a stride of 1 (i.e., in.shape == out.shape)
        # Generate one feature map for this conv layer
        fmaps = tf.nn.conv2d(batch, filters,
                             strides=1, padding='SAME',
                             data_format='NHWC')
    
        # Return the feature map
        return fmaps
    
    # Get the feature map as a result of keras.layers.Conv2D
    def featureMap2(batch):
    
        # Create the convLayer which should apply the filter of all ones
        convLayer = keras.layers.Conv2D(filters=1, kernel_size = 7,
                                        strides=1, padding='SAME',
                                        kernel_initializer='ones',
                                        data_format='channels_last',
                                        activation='linear')
    
        fmaps = convLayer(batch)
    
        # Create the output layer
        # outputLayer = convLayer(inputLayer)
    
        # # Set up the model
        # model = keras.Model(inputs=inputLayer,
        #                     outputs=outputLayer)
    
        # Perform a prediction, no model fitting or compiling
        # fmaps = model.predict(batch)
    
        return fmaps 
    
    def main():
    
        # Get the image and scale the RGB values to [0, 1]
        china = load_sample_image('china.jpg') / 255
    
        # Build a batch of just one image
        batch = np.array([china])
    
        # Get the feature maps and extract
        # the images within them
        img1 = featureMap1(batch)[0, :, :, 0]
        img2 = featureMap2(batch)[0, :, :, 0]
        # Calculate the difference in the images
        # Ideally, this should be all zeros...
        diffImage = np.abs(img1 - img2)
    
        # Add up all the pixels in the diffImage,
        # we expect a value of 0 if the images are
        # identical
        print('Differences value: ', diffImage.sum())
    
        # Plot the images as a set of 4
        figsize = 10
        f, axarr = plt.subplots(2, 2, figsize=(figsize,figsize))
    
        axarr[0,0].set_title('Original Image')
        axarr[0,0].imshow(batch[0], cmap='gray')
    
        axarr[1,0].set_title('Conv2D through tf.nn.conv2d')
        axarr[1,0].imshow(img1, cmap='gray')
    
        axarr[1,1].set_title('Conv2D through keras.layers.Conv2D')
        axarr[1,1].imshow(img2, cmap='gray')
    
        axarr[0,1].set_title('Diff')
        axarr[0,1].imshow(diffImage, cmap='gray')
    
        plt.show()
    
        return
    
    
    main()
    
    

    EDIT:

    The main culprit was the default casting behavior of TensorFlow 2.x, which produces this warning:

    WARNING:tensorflow:Layer conv2d is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2.  The layer has dtype float32 because it's dtype defaults to floatx.
    

    This reduces the accuracy of the computation due to the precision loss from float64 to float32. You can avoid this precision loss by setting the TensorFlow Keras backend's default floatx to float64:

    tf.keras.backend.set_floatx('float64')
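
    For context, the input ends up as float64 in the first place because dividing the uint8 image by 255 promotes it, and that float64 batch is what the Keras layer casts down. As an alternative workaround (a sketch, assuming the featureMap1/featureMap2 functions defined above), you can keep the whole pipeline in float32 by casting the batch up front, so both operations compute at the same precision:

    # Sketch: why the cast happens, plus a float32-only alternative to set_floatx('float64')
    china = load_sample_image('china.jpg')   # dtype uint8
    batch = np.array([china / 255])          # division by 255 promotes to float64
    print(batch.dtype)                       # float64

    # Cast to float32 so tf.nn.conv2d and keras.layers.Conv2D run at the same precision;
    # the summed difference should then drop to zero
    batch32 = batch.astype(np.float32)
    img1 = featureMap1(batch32)[0, :, :, 0]
    img2 = featureMap2(batch32)[0, :, :, 0]
    print('Differences value: ', np.abs(img1 - img2).sum())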