arrays tensorflow shapes google-coral yolov5

Output tensor from tflite interpreter is squeezed

I'm trying to get a YOLOv5s model to run on a Coral Edge TPU. I've followed the instructions in the YOLOv5 repository to convert the yolov5s.pt model to the yolov5s-int8_edgetpu.tflite model.

The pycoral repository provides a detect_image.py script. With their example model, the script runs with no errors.

If I run the same script with my yolov5s-int8_edgetpu.tflite model, I get this error:

  File "examples/detect_image.py", line 108, in <module>
    main()
  File "examples/detect_image.py", line 87, in main
    objs = detect.get_objects(interpreter, args.threshold, scale)
  File "/usr/lib/python3/dist-packages/pycoral/adapters/detect.py", line 214, in get_objects
    elif common.output_tensor(interpreter, 3).size == 1:
  File "/usr/lib/python3/dist-packages/pycoral/adapters/common.py", line 29, in output_tensor
    return interpreter.tensor(interpreter.get_output_details()[i]['index'])()
IndexError: list index out of range

Inference runs without any issues; the snag is in the post-processing. The traceback shows that detect.get_objects asks for output tensor index 3, so the pycoral script expects a model with four output tensors. My model has only one output tensor, of shape [1, 25200, 85], which is not compatible with the script.
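The mismatch can be checked without running the full script. Below is a minimal sketch, assuming the output details have the structure printed further down; the helper name output_layout and the plain dicts standing in for interpreter.get_output_details() are made up for illustration, so it runs without an Edge TPU attached.

```python
def output_layout(output_details):
    """Classify a model by how many output tensors it exposes.

    `output_details` stands in for interpreter.get_output_details():
    a list with one dict per output tensor, each carrying a 'shape' entry.
    """
    shapes = [tuple(int(n) for n in d["shape"]) for d in output_details]
    if len(shapes) == 4:
        # Layout detect_image.py can handle: boxes, class ids, scores, count.
        return "postprocessed", shapes
    if len(shapes) == 1 and len(shapes[0]) == 3:
        # One raw [batch, candidates, values] tensor, as in the YOLOv5s export.
        return "raw", shapes
    return "unknown", shapes


# Hypothetical stand-ins built from the shapes listed in this post.
yolo_details = [{"shape": (1, 25200, 85)}]
effdet_details = [
    {"shape": (1, 25, 4)},
    {"shape": (1, 25)},
    {"shape": (1, 25)},
    {"shape": (1,)},
]

print(output_layout(yolo_details))
print(output_layout(effdet_details))
```

With only one entry in the list, asking for index 3 is exactly what raises the IndexError in the traceback.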

Input details for yolov5s-int8_edgetpu.tflite:

[{'name': 'serving_default_input_1:0', 'index': 547, 'shape': array([  1, 640, 640,   3], dtype=int32), 'shape_signature': array([  1, 640, 640,   3], dtype=int32), 'dtype': <class 'numpy.uint8'>, 'quantization': (0.003921568859368563, 0), 'quantization_parameters': {'scales': array([0.00392157], dtype=float32), 'zero_points': array([0], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

Input details for an EfficientDet-Lite2 model (downloaded from TF Hub):

[{'name': 'serving_default_images:0', 'index': 0, 'shape': array([  1, 448, 448,   3], dtype=int32), 'shape_signature': array([  1, 448, 448,   3], dtype=int32), 'dtype': <class 'numpy.uint8'>, 'quantization': (0.0078125, 127), 'quantization_parameters': {'scales': array([0.0078125], dtype=float32), 'zero_points': array([127], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

Output details for yolov5s-int8_edgetpu.tflite:

[{'name': 'StatefulPartitionedCall:0', 'index': 548, 'shape': array([    1, 25200,    85], dtype=int32), 'shape_signature': array([    1, 25200,    85], dtype=int32), 'dtype': <class 'numpy.uint8'>, 'quantization': (0.004499140195548534, 1), 'quantization_parameters': {'scales': array([0.00449914], dtype=float32), 'zero_points': array([1], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

Output details for an EfficientDet-Lite2 model (downloaded from TF Hub):

[{'name': 'StatefulPartitionedCall:3', 'index': 782, 'shape': array([ 1, 25,  4], dtype=int32), 'shape_signature': array([ 1, 25,  4], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}},

{'name': 'StatefulPartitionedCall:2', 'index': 783, 'shape': array([ 1, 25], dtype=int32), 'shape_signature': array([ 1, 25], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}, 

{'name': 'StatefulPartitionedCall:1', 'index': 784, 'shape': array([ 1, 25], dtype=int32), 'shape_signature': array([ 1, 25], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}, 

{'name': 'StatefulPartitionedCall:0', 'index': 785, 'shape': array([1], dtype=int32), 'shape_signature': array([1], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

The EfficientDet model has 4 output tensors, which hold the bounding boxes, class IDs, scores, and detection count, respectively.

The YOLOv5s model seems to just squish all of these into a single tensor with no differentiation.

I thought the error might lie in the model conversion, but it might also just be that YOLOv5 models are meant to squish all of their outputs into a single tensor.

If anyone has experienced this or has suggestions on how to proceed, I'd appreciate it.

Solution

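YOLOv5s is a different architecture from EfficientDet, so its output tensor is laid out differently. The EfficientDet model returns already post-processed results (final boxes, class IDs, scores, and a count), while the YOLOv5s export returns raw predictions: 25200 candidate boxes with 85 values each (4 box coordinates, 1 objectness score, and 80 class scores for the stock COCO-trained model). The trick here is understanding how to process this output tensor.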
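To make the layout concrete, here is a minimal sketch of decoding the raw tensor. It assumes each of the 25200 rows holds [cx, cy, w, h, objectness, 80 class scores] and that the uint8 values are dequantized with the scale and zero point from the output details in the question. The function names and the 0.25 threshold are placeholders of mine, not part of pycoral or the YOLOv5 repo, and it is written in plain Python for readability; a real pipeline would do this with NumPy on the whole tensor at once.

```python
def dequantize(q, scale, zero_point):
    """Map a quantized integer back to a real value: scale * (q - zero_point)."""
    return scale * (q - zero_point)


def decode_rows(rows, scale, zero_point, conf_threshold=0.25):
    """Turn raw 85-value rows into (box, class_id, score) tuples.

    Assumed row layout: [cx, cy, w, h, objectness, class_0 ... class_79].
    Boxes are returned as (x_min, y_min, x_max, y_max) in the model's own units.
    """
    detections = []
    for row in rows:
        values = [dequantize(v, scale, zero_point) for v in row]
        cx, cy, w, h = values[:4]
        objectness = values[4]
        class_scores = values[5:]
        # Best class for this candidate, then combine with objectness.
        class_id = max(range(len(class_scores)), key=class_scores.__getitem__)
        score = objectness * class_scores[class_id]
        if score < conf_threshold:
            continue
        box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
        detections.append((box, class_id, score))
    return detections


# Quantization parameters copied from the YOLOv5s output details in the question.
SCALE, ZERO_POINT = 0.004499140195548534, 1

# One made-up candidate row: a centered box with high confidence for class 2.
row = [112, 112, 45, 90, 200] + [1] * 80
row[5 + 2] = 180
print(decode_rows([row], SCALE, ZERO_POINT))
```

The surviving candidates still overlap heavily, which is why a non-max-suppression step is needed afterwards.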

Fortunately, Ultralytics/YOLOv5 held an export competition where the goal was to execute YOLOv5 models on Edge TPU devices.

This guy Josh won the Coral Dev Board section. He wrote a Python library to process these wonky tensor outputs from YOLOv5s models. Here is the repo. The real processing of the output tensor is done in his non-max-suppression code.
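For anyone unfamiliar with the idea, here is a generic sketch of greedy, per-class non-max suppression. This is not Josh's code, just a minimal illustration of the technique; it assumes detections are (box, class_id, score) tuples with boxes as (x_min, y_min, x_max, y_max), and the 0.45 IoU threshold is an arbitrary placeholder.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.45):
    """Keep the highest-scoring box of each overlapping same-class cluster."""
    kept = []
    # Visit candidates from most to least confident.
    for det in sorted(detections, key=lambda d: d[2], reverse=True):
        box, class_id, _ = det
        overlaps_kept = any(
            class_id == k[1] and iou(box, k[0]) > iou_threshold for k in kept
        )
        if not overlaps_kept:
            kept.append(det)
    return kept


# Made-up detections: two near-duplicates of class 0, one of class 1, one far away.
candidates = [
    ((0.0, 0.0, 1.0, 1.0), 0, 0.9),
    ((0.05, 0.05, 1.0, 1.0), 0, 0.8),
    ((0.0, 0.0, 1.0, 1.0), 1, 0.7),
    ((2.0, 2.0, 3.0, 3.0), 0, 0.6),
]
for det in non_max_suppression(candidates):
    print(det)
```

The near-duplicate class 0 box is dropped, while the class 1 box at the same location survives because suppression is applied per class.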

I've forked his repo and added the ability to execute and process these YOLOv5s models on desktops.

Thanks so much Josh!