deep-learning computer-vision conv-neural-network artificial-intelligence

Input 0 of layer "sequential" is incompatible with the layer: expected shape=(None, 300, 300, 3), found shape=(300, 300, 3)


I'm new to machine learning and deep learning. I'm working on a project to create saliency maps for some test images. I have loaded some cat and dog images and preprocessed them, and I've created my model. After running this code snippet, I got this error: **Input 0 of layer "sequential" is incompatible with the layer: expected shape=(None, 300, 300, 3), found shape=(300, 300, 3)**

def do_salience(image, model, label, prefix):
  '''
  Generates the saliency map of a given image.

  Args:
    image (file) -- picture that the model will classify
    model (keras Model) -- your cats and dogs classifier
    label (int) -- ground truth label of the image
    prefix (string) -- prefix to add to the filename of the saliency map
  '''

  # Read the image and convert channel order from BGR to RGB
  # YOUR CODE HERE
  image = cv2.imread(image)
  image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) 
  # Resize the image to 300 x 300 and normalize pixel values to the range [0, 1]
  # YOUR CODE HERE
  image = cv2.resize(image,(300,300))/255.0

  # Add an additional dimension (for the batch), and save this in a new variable
  # YOUR CODE HERE
  tensor_image = tf.expand_dims(image, axis=0)

  # Declare the number of classes
  # YOUR CODE HERE
  num_classes= 2

  # Define the expected output array by one-hot encoding the label
  # The length of the array is equal to the number of classes
  # YOUR CODE HERE
  # Use the ground truth label so the result has shape (batch, num_classes)
  expected_output = tf.one_hot([label] * tensor_image.shape[0], num_classes)
  # Within the GradientTape block:
  # Cast the image as a tf.float32
  # Use the tape to watch the float32 image
  # Get the model's prediction by passing in the float32 image
  # Compute an appropriate loss
  # between the expected output and model predictions.
  # you may want to print the predictions to see if the probabilities add up to 1
  # YOUR CODE HERE
  with tf.GradientTape() as tape:
    inputs = tf.cast(image,tf.float32)
    tape.watch(inputs)
    predictions = model(inputs)
    loss = tf.keras.losses.categorical_crossentropy(expected_output, predictions)
    print(predictions)
  # get the gradients of the loss with respect to the model's input image
  # YOUR CODE HERE
  gradients = tape.gradient(loss, inputs)
  
  # generate the grayscale tensor
  # YOUR CODE HERE
  grayscale_tensor = tf.reduce_sum(tf.abs(gradients), axis=-1)

  # normalize the pixel values to be in the range [0, 255].
  # the max value in the grayscale tensor will be pushed to 255.
  # the min value will be pushed to 0.
  # Use the formula: 255 * (x - min) / (max - min)
  # Use tf.reduce_max, tf.reduce_min
  # Cast the tensor as a tf.uint8
  # YOUR CODE HERE
  normalized_tensor = tf.cast(
      255 * (grayscale_tensor - tf.reduce_min(grayscale_tensor))
      / (tf.reduce_max(grayscale_tensor) - tf.reduce_min(grayscale_tensor)),
      tf.uint8,
  )
    
  # Remove dimensions that are size 1
  # YOUR CODE HERE
  normalized_tensor = tf.squeeze(normalized_tensor)
    
  # plot the normalized tensor
  # Set the figure size to 8 by 8
  # do not display the axis
  # use the 'gray' colormap
  # This code is provided for you.
  plt.figure(figsize=(8, 8))
  plt.axis('off')
  plt.imshow(normalized_tensor, cmap='gray')
  plt.show()

  # optional: superimpose the saliency map with the original image, then display it.
  # we encourage you to do this to visualize your results better
  # YOUR CODE HERE
  gradient_color = cv2.applyColorMap(normalized_tensor.numpy(),cv2.COLORMAP_HOT)
  gradient_color = gradient_color/255.0
  super_imposed = cv2.addWeighted(image, 0.5, gradient_color, 0.5, 0.0)
  plt.figure(figsize=(8, 8))
  plt.imshow(super_imposed)
  plt.axis('off')
  plt.show()
  # save the normalized tensor image to a file. this is already provided for you.
  salient_image_name = prefix + image
  normalized_tensor = tf.expand_dims(normalized_tensor, -1)
  normalized_tensor = tf.io.encode_jpeg(normalized_tensor, quality=100, format='grayscale')
  writer = tf.io.write_file(salient_image_name, normalized_tensor)
# load initial weights
model.load_weights('0_epochs.h5')

# generate the saliency maps for the 5 test images
# YOUR CODE HERE
do_salience('cat1.jpg', model, 0, "salient")
do_salience('cat2.jpg', model, 0, "salient")
do_salience('catanddog.jpg', model, 0, "salient")
do_salience('dog1.jpg', model, 1, "salient")
do_salience('dog2.jpg', model, 1, "salient")

I want to see the saliency maps by running do_salience(), but my code doesn't work and I don't know what this error means. I would appreciate it if you could help me fix it.


Solution

  • In deep learning, inputs are normally batched so that several can be processed at once during training or inference. TensorFlow is built for this and leaves the first (batch) dimension of the input flexible, which it reports as None. That is why the model expects data of shape (None, 300, 300, 3), while each image you are using is an array of pixels with shape (300, 300, 3). You're doing inference on one image at a time, so you need to prepare the data as a batch with a single element, which you already do in the line

    tensor_image = tf.expand_dims(image, axis=0)

    However, later in your code you pass image instead of tensor_image, e.g.

    inputs = tf.cast(image,tf.float32)

    Use tensor_image there instead, that is

    inputs = tf.cast(tensor_image,tf.float32)
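
To see the effect of the batch dimension in isolation, here is a minimal sketch. It assumes a tiny stand-in model and a random array in place of a real image; both are hypothetical and are not the asker's classifier or data. It feeds the batched tensor to the model, computes the loss with the functional tf.keras.losses.categorical_crossentropy, and takes the gradient with tape.gradient.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the cats-vs-dogs classifier (assumption, not the real model).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(300, 300, 3)),
    tf.keras.layers.Conv2D(4, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),
])

# Random pixels stand in for a resized, normalized image of shape (300, 300, 3).
image = np.random.rand(300, 300, 3)

# Add the batch dimension: (300, 300, 3) becomes (1, 300, 300, 3).
tensor_image = tf.expand_dims(image, axis=0)

label = 0        # assumed ground truth class index
num_classes = 2
expected_output = tf.one_hot([label] * tensor_image.shape[0], num_classes)

with tf.GradientTape() as tape:
    # Cast the batched tensor, not the unbatched image.
    inputs = tf.cast(tensor_image, tf.float32)
    tape.watch(inputs)
    predictions = model(inputs)
    loss = tf.keras.losses.categorical_crossentropy(expected_output, predictions)

# Gradient of the loss with respect to the batched input image.
gradients = tape.gradient(loss, inputs)

print(inputs.shape, predictions.shape, gradients.shape)
```

The gradient keeps the leading batch dimension of the input, which is why the later tf.squeeze call in do_salience is needed before plotting.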