java, tensorflow, object-detection, tflite

TensorFlowLite tensor shapes make no sense (input: [1, 1, 1, 3])


I trained an object detection model using this step-by-step tutorial that detects one class: person. It works pretty well, too: it finds people of all shapes and sizes and no longer marks my easel as a person. It works great with a plain Python TF inference script I found somewhere.

Now I want to use it on my phone. So I followed the default model converter tutorial, only to learn that my model apparently contains a "StridedSlice" operator that can't be converted to tflite. I therefore used the Select TensorFlow operators tutorial to convert it with that one operator included and added the required library to the app, which is basically the default example app that ships with detect.tflite:

implementation 'org.tensorflow:tensorflow-lite-select-tf-ops:2.8.0'

I tried with detect.tflite to make sure this doesn't break anything and it still works like a charm.
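For reference, a conversion along the lines of the Select TensorFlow operators guide looks roughly like this (a minimal sketch; the paths are placeholders, not my actual directories):

import tensorflow as tf

# Sketch: convert a SavedModel to tflite while allowing select TF ops
# (e.g. StridedSlice) to fall back to the TensorFlow runtime.
converter = tf.lite.TFLiteConverter.from_saved_model("exported-models/my_model/saved_model")
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # standard TFLite ops
    tf.lite.OpsSet.SELECT_TF_OPS,    # selected TensorFlow ops
]

with open("model.tflite", "wb") as f:
    f.write(converter.convert())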

The problem is: in tflite's opinion, my model now takes a [1, 1, 1, 3] tensor as input, as confirmed by Netron (which I found thanks to Alex K.'s answer):

[Image: Netron analysis]

So it seems that for some reason my model indeed takes a [1, 1, 1, 3] input tensor and produces several output tensors with shapes I don't understand, sadly without any description of what the outputs mean. I also can't force it to accept anything else; instead it crashes with, as expected from a properly statically typed language:

Cannot copy to a TensorFlowLite tensor (serving_default_input_tensor:0) with 3 bytes from a Java Buffer with 1228800 bytes.

because of course my buffer's size is not (1 * 1 * 1 * 3); that wouldn't make any sense. Instead my buffer assumes an input size of 640. I'm not sure that's the correct size; I guessed, although the math at least works out (1228800 = 640 * 640 * 3, i.e. a 640x640 RGB image with one byte per channel) and my pipeline.config has this:

image_resizer {
  fixed_shape_resizer {
    height: 640
    width: 640
  }
}

For some reason though, when using the normal Python TF inference script, I can just throw any image in there (the same script works on both my laptop's integrated camera and a RealSense 3D cam; the laptop one of course has shitty resolution).

So where did I go wrong? The conversion to tflite, where it apparently lost information about the inputs? The usage in the app, where maybe I have to tell it to ignore the so-called [1, 1, 1, 3] input shape? Somewhere else entirely?


Solution

  • I was facing this issue too, since I also followed the step-by-step tutorial. After investigating, I managed to change the input to [1, 320, 320, 3].

    To solve this issue, in the exporting step of the step-by-step tutorial, you must change the command from:

    python .\exporter_main_v2.py --input_type image_tensor --pipeline_config_path .\models\my_ssd_resnet50_v1_fpn\pipeline.config --trained_checkpoint_dir .\models\my_ssd_resnet50_v1_fpn\ --output_directory .\exported-models\my_model
    

    to:

    python .\export_tflite_graph_tf2.py --pipeline_config_path .\models\my_ssd_resnet50_v1_fpn\pipeline.config --trained_checkpoint_dir .\models\my_ssd_resnet50_v1_fpn\ --output_directory .\exported-models\my_model
    

    Note: export_tflite_graph_tf2.py is located in tensorflow\models\research\object_detection

    After that, you can convert this SavedModel to a .tflite file.
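    The conversion itself would then look roughly like this (a sketch with placeholder paths; if the converter still complains about unsupported ops such as StridedSlice, keep the SELECT_TF_OPS fallback shown in the question):

    import tensorflow as tf

    # Sketch: convert the SavedModel produced by export_tflite_graph_tf2.py
    # into a .tflite file; adjust the paths to your own export directory.
    converter = tf.lite.TFLiteConverter.from_saved_model(
        "exported-models/my_model/saved_model")

    with open("exported-models/my_model/model.tflite", "wb") as f:
        f.write(converter.convert())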

    You can validate the converted model's input by running the script below in Python:

    import tensorflow as tf

    # Point model_path at the .tflite file you just converted.
    interpreter = tf.lite.Interpreter(model_path="path/to/your_model.tflite")
    print(interpreter.get_input_details())
    
    """expected result: [{'name': 'serving_default_input:0', 'index': 0, 'shape': array([  1, 320, 320,   3]), 'shape_signature': array([  1, 320, 320,   3]), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]"""