python tensorflow tensorflow2.0 object-detection tensor

Input tensor incompatible with Python signature in Tensorflow object detection

I recently trained an object detection model in TensorFlow, but for some of the images the input tensors are incompatible with the Python signature. This is the code I'm running in Google Colab for inference:

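import numpy as np
import tensorflow as tf
from PIL import Image
import matplotlib.pyplot as plt
import warnings
from object_detection.utils import visualization_utils as viz_utils

warnings.filterwarnings('ignore')   # Suppress Matplotlib warnings

# `img`, `detect_fn`, and `category_index` are assumed to be defined in earlier notebook cells.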

def load_image_into_numpy_array(path):
    """Load an image from file into a numpy array.

    Puts image into numpy array to feed into tensorflow graph.
    Note that by convention we put it into a numpy array with shape
    (height, width, channels), where channels=3 for RGB.

    Args:
      path: the file path to the image

    Returns:
      uint8 numpy array with shape (img_height, img_width, 3)
    """
    return np.array(Image.open(path))

for image_path in img:

    print('Running inference for {}... '.format(image_path), end='')
    image_np=load_image_into_numpy_array(image_path)


    # Things to try:
    # Flip horizontally
    # image_np = np.fliplr(image_np).copy()
    # Convert image to grayscale
    # image_np = np.tile(
    #     np.mean(image_np, 2, keepdims=True), (1, 1, 3)).astype(np.uint8)

    # The input needs to be a tensor, convert it using `tf.convert_to_tensor`.
    input_tensor=tf.convert_to_tensor(image_np)
    # The model expects a batch of images, so add an axis with `tf.newaxis`.
    input_tensor=input_tensor[tf.newaxis, ...]

    # input_tensor = np.expand_dims(image_np, 0)
    detections=detect_fn(input_tensor)

    # All outputs are batches tensors.
    # Convert to numpy arrays, and take index [0] to remove the batch dimension.
    # We're only interested in the first num_detections.
    num_detections=int(detections.pop('num_detections'))
    detections={key:value[0,:num_detections].numpy()
                   for key,value in detections.items()}
    detections['num_detections']=num_detections

    # detection_classes should be ints.
    detections['detection_classes']=detections['detection_classes'].astype(np.int64)

    image_np_with_detections=image_np.copy()

    viz_utils.visualize_boxes_and_labels_on_image_array(
          image_np_with_detections,
          detections['detection_boxes'],
          detections['detection_classes'],
          detections['detection_scores'],
          category_index,
          use_normalized_coordinates=True,
          max_boxes_to_draw=100,     #max number of bounding boxes in the image
          min_score_thresh=.25,      #min prediction threshold
          agnostic_mode=False)
    %matplotlib inline
    plt.figure()
    plt.imshow(image_np_with_detections)
    print('Done')
    plt.show()

And this is the error message I get when running inference:

    Running inference for /content/gdrive/MyDrive/TensorFlow/workspace/training_demo/images/test/image_part_002.png... 
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-23-5b465e5474df> in <module>()
         40 
         41     # input_tensor = np.expand_dims(image_np, 0)
    ---> 42     detections=detect_fn(input_tensor)
         43 
         44     # All outputs are batches tensors.
    
    6 frames
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py in _convert_inputs_to_signature(inputs, input_signature, flat_input_signature)
       2804       flatten_inputs)):
       2805     raise ValueError("Python inputs incompatible with input_signature:\n%s" %
    -> 2806                      format_error_message(inputs, input_signature))
       2807 
       2808   if need_packing:
    
    ValueError: Python inputs incompatible with input_signature:
      inputs: (
        tf.Tensor(
    [[[[  0   0   0 255]
       [  0   0   0 255]
       [  0   0   0 255]
       ...
       [  0   0   0 255]
       [  0   0   0 255]
       [  0   0   0 255]]
    
      [[  0   0   0 255]
       [  0   0   0 255]
       [  0   0   0 255]
       ...
       [  0   0   0 255]
       [  0   0   0 255]
       [  0   0   0 255]]
    
      [[  0   0   0 255]
       [  0   0   0 255]
       [  0   0   0 255]
       ...
       [  0   0   0 255]
       [  0   0   0 255]
       [  0   0   0 255]]
    
      ...
    
      [[ 34  32  34 255]
       [ 35  33  35 255]
       [ 35  33  35 255]
       ...
       [ 41  38  38 255]
       [ 40  37  37 255]
       [ 40  37  37 255]]
    
      [[ 36  34  36 255]
       [ 35  33  35 255]
       [ 36  34  36 255]
       ...
       [ 41  38  38 255]
       [ 41  38  38 255]
       [ 43  40  40 255]]
    
      [[ 36  34  36 255]
       [ 36  34  36 255]
       [ 37  35  37 255]
       ...
       [ 41  38  38 255]
       [ 40  37  37 255]
       [ 39  36  36 255]]]], shape=(1, 1219, 1920, 4), dtype=uint8))
      input_signature: (
        TensorSpec(shape=(1, None, None, 3), dtype=tf.uint8, name='input_tensor'))

Does anyone know how I can convert the input tensors of my images so that I can run inference on them? For example, one image where inference works has a resolution of 400x291, and the image where it fails has a resolution of 1920x1219. I used the SSD MobileNet V1 FPN 640x640 model for training.

Solution

The problem is the shape of your input tensor, (1, 1219, 1920, 4). More precisely, the trailing 4 is the channel count, and the model's input signature expects 3.

The first element, 1, stands for the batch size (added in input_tensor[tf.newaxis, ...]).

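That part is correct. The problem arises where you read the image: the resulting array has 4 channels (presumably RGBA, i.e. RGB plus an alpha channel) rather than 3 (typical RGB) or 1 (grayscale). The resolution is not the issue, because the signature leaves height and width as None.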
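To confirm which files carry the extra channel, you can inspect the mode Pillow reports and the resulting array shape before running inference. This is only a minimal sketch; `describe_image` is a hypothetical helper name, not part of the original code:

```python
import numpy as np
from PIL import Image

def describe_image(path):
    """Return (Pillow mode, numpy array shape) for the image at `path`."""
    with Image.open(path) as im:
        # Typical modes: 'RGB' (3 channels), 'RGBA' (4 channels), 'L' (grayscale).
        return im.mode, np.array(im).shape
```

Calling it on each path in your list should show which images come back with a mode other than 'RGB'.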

I recommend checking your images and forcing the conversion to RGB when you load them, i.e. Image.open(path).convert('RGB').
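Applied to the loader from the question, the change could look like the sketch below. It assumes the rest of the inference loop stays as it is. Note that converting RGBA to RGB in Pillow simply discards the alpha channel:

```python
import numpy as np
from PIL import Image

def load_image_into_numpy_array(path):
    """Load an image from file into a uint8 array of shape (height, width, 3)."""
    with Image.open(path) as im:
        # convert('RGB') drops an alpha channel and expands grayscale
        # or palette images to three channels.
        return np.array(im.convert('RGB'))
```

With this loader, the tensor built by `tf.convert_to_tensor(image_np)[tf.newaxis, ...]` should have shape (1, height, width, 3), which matches the (1, None, None, 3) signature.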