Tags: python, tensorflow, keras, tensorflow-lite

RuntimeError when calling `allocate_tensors()` on converted tflite model


I followed a great tutorial on deploying a TensorFlow model using TF-Lite and everything works. However, when I try to use my own model (converted from a saved Keras model), I get the following error when calling the allocate_tensors() method:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-73-6b4d64de8090> in <module>
      1 #interpreter = tflite.Interpreter(model_path='model.tflite')
      2 interpreter = tflite.Interpreter(model_path=lite_model_location)
----> 3 interpreter.allocate_tensors()

~/pyenv/srcnn/lib/python3.6/site-packages/tflite_runtime/interpreter.py in allocate_tensors(self)
    257   def allocate_tensors(self):
    258     self._ensure_safe()
--> 259     return self._interpreter.AllocateTensors()
    260 
    261   def _safe_to_run(self):

RuntimeError: external/org_tensorflow/tensorflow/lite/core/subgraph.cc BytesRequired number of elements overflowed.
Node number 0 (CONV_2D) failed to prepare.

I believe it has to do with the way I've converted my model, but none of the options described in the tf.lite.TFLiteConverter documentation have helped.
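For reference, the conversion was done roughly like this (a minimal sketch; the file paths are placeholders, and none of the extra converter options I tried are shown):

import tensorflow as tf

# Load the trained Keras model and convert it with default settings.
model = tf.keras.models.load_model('model.h5')  # placeholder path
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open('model.tflite', 'wb') as f:
    f.write(tflite_model)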

The tflite model I'm trying to load can be found here, which is a converted version of the saved Keras model found here.

The model from the tutorial works without issue. I've noticed differences in the input details (from interpreter.get_input_details()) between the tflite versions of these models. For example, the tutorial model (working):

{'name': 'input',
 'index': 88,
 'shape': array([  1, 224, 224,   3], dtype=int32),
 'shape_signature': array([  1, 224, 224,   3], dtype=int32),
 'dtype': <class 'numpy.uint8'>,
 'quantization': (0.0078125, 128),
 'quantization_parameters': {'scales': array([0.0078125], dtype=float32),
                             'zero_points': array([128], dtype=int32),
                             'quantized_dimension': 0},
 'sparsity_parameters': {}}

While the input details for my non-working tflite model are:

{'name': 'input_1',
 'index': 0,
 'shape': array([1, 1, 1, 3], dtype=int32),
 'shape_signature': array([-1, -1, -1,  3], dtype=int32),
 'dtype': <class 'numpy.float32'>,
 'quantization': (0.0, 0),
 'quantization_parameters': {'scales': array([], dtype=float32),
                             'zero_points': array([], dtype=int32),
                             'quantized_dimension': 0},
 'sparsity_parameters': {}}

Could it be something with the conversion? The model worked fine in development using Keras, and should be able to accept inputs of variable x- and y-dimensions (image sizes). I don't think dtypes are the issue here, since uint8 and float32 should both be supported according to the documentation.
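(For context, a dynamically-shaped input layer like the one below is what produces the -1 entries in the shape_signature; this is just an illustrative sketch, not my actual model:)

import tensorflow as tf

# Spatial dimensions left as None become -1 in the converted
# model's shape_signature.
inputs = tf.keras.Input(shape=(None, None, 3))
outputs = tf.keras.layers.Conv2D(3, 3, padding='same')(inputs)
model = tf.keras.Model(inputs, outputs)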


Solution

  • OK, pretty easy fix, it turns out. When using a CNN with unknown input dimensions (i.e. the -1 entries in the shape_signature above, which come from leaving those dimensions unspecified (None) in the Keras input layer), the unknown dimensions in the input tensor default to 1. To get the model to allocate properly with a model like this, you have to do two things:

    1. Manually set the shape of the input tensor to be the shape of the input data, e.g. interpreter.resize_tensor_input(0, [1, input_shape[0], input_shape[1], 3], strict=True).
    2. Manually set the dtype of the input data to match that of the model's input layer, seen in the 'dtype' entry in the input details.

    It seems this is done automatically in regular TensorFlow, but the model must be prepared explicitly like this in the Lite version; a sketch combining both steps is at the end of this answer.

    Edit: Regarding setting the dtype of the input data, this is done in the cast to numpy.array after the image is imported from an image format, before calling allocate_tensors(). You can see the difference between the TF implementation (line 332) and the TFLite implementation (line 77).
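    Putting both steps together, a minimal sketch (assumes tflite_runtime and Pillow are installed; the file paths are placeholders, and any normalization your model expects is omitted):

    import numpy as np
    from PIL import Image
    import tflite_runtime.interpreter as tflite

    # Load an image as a NumPy array (placeholder path).
    image = np.asarray(Image.open('input.png').convert('RGB'))

    interpreter = tflite.Interpreter(model_path='model.tflite')
    input_details = interpreter.get_input_details()

    # Step 1: resize the input tensor to the actual image dimensions
    # before allocating. Otherwise the -1 dimensions default to 1 and
    # CONV_2D fails to prepare.
    interpreter.resize_tensor_input(input_details[0]['index'],
                                    [1, image.shape[0], image.shape[1], 3],
                                    strict=True)
    interpreter.allocate_tensors()

    # Step 2: cast the input data to the dtype the model expects
    # (numpy.float32 for this model, per its input details).
    input_data = np.expand_dims(image, axis=0).astype(input_details[0]['dtype'])
    interpreter.set_tensor(input_details[0]['index'], input_data)

    interpreter.invoke()
    output = interpreter.get_tensor(interpreter.get_output_details()[0]['index'])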