Tags: machine-learning, tensorflow2.0, tensorflow-lite, tpu, google-coral

TensorFlow Lite model on Coral Dev Board not running on TPU


I have a TensorFlow Lite model and a Coral Dev Board, and I want to perform inference on the Dev Board's TPU.

When initialising the TensorFlow Lite interpreter in my Python inference script, I add "libedgetpu.so.1" as an experimental delegate, following the Google Coral TFLite Python example (linked from the getting-started guide for the Coral Dev Board). However, inference runs at exactly the same speed as when I don't specify the experimental delegate, so I assume it is still running on the Dev Board's CPU. Inference time on the Dev Board (with or without the experimental delegate) is 32 s. On my desktop PC, inference on the same test set takes 10 s if I run the TFLite model on the CPU, and 1.3 s if I run the same model in Keras before converting to TFLite (presumably faster than TFLite because Keras utilises multiple cores).

My question: How can I make inference run on the Dev Board's TPU instead of the CPU?

I wonder if this is something I need to specify while building the Keras model on my PC, before converting it to TFLite format (e.g. using a "with tf.device(...)" context manager, or some other setting that makes the resulting TFLite model use the TPU), but I can't see anything about this in the TensorFlow Lite Converter Python API documentation.

The Dev Board is running Mendel version 2.0, Python version 3.5.3, and tflite-runtime version 2.1.0.post1. (I know I should update the Mendel version, but I'm currently using a Windows PC, and it would be a pain to get access to a Linux machine, or to try to update the Dev Board from Windows using PuTTY, VirtualBox or WSL. If only Coral supported Windows, like Raspberry Pi does...)

Below is my inference script (I can also upload training script and model if necessary; dataset is MNIST, converted to NumPy float data as described in this Gist):

import numpy as np
from time import perf_counter
try:
    # Try importing the small tflite_runtime module (this runs on the Dev Board)
    print("Trying to import tensorflow lite runtime...")
    from tflite_runtime.interpreter import Interpreter, load_delegate
    # 'libedgetpu.so.1' matches the Coral examples ('libedgetpu.so.1.0' is
    # the same library on Mendel)
    experimental_delegates = [load_delegate('libedgetpu.so.1')]
except ImportError:  # ModuleNotFoundError doesn't exist before Python 3.6
    # Try importing the full tensorflow module (this runs on PC)
    try:
        print("TFLite runtime not found; trying to import full tensorflow...")
        import tensorflow as tf
        Interpreter = tf.lite.Interpreter
        experimental_delegates = None
    except ImportError:
        # Couldn't import either module
        raise RuntimeError("Could not import Tensorflow or Tensorflow Lite")

# Load data
mnist_file = np.load("data/mnist.npz")
x_test = mnist_file["x_test"]
y_test = mnist_file["y_test"]
x_test = x_test.astype(np.float32)

# Initialise the interpreter
tfl_filename = "lstm_mnist_model_b10000.tflite"
interpreter = Interpreter(model_path=tfl_filename,
    experimental_delegates=experimental_delegates)
interpreter.allocate_tensors()

# The input/output tensor indices don't change, so look them up once
input_index = interpreter.get_input_details()[0]['index']
output_index = interpreter.get_output_details()[0]['index']

print("Starting evaluation...")
for _ in range(3):
    # Perform inference
    t0 = perf_counter()
    interpreter.set_tensor(input_index, x_test)
    interpreter.invoke()
    result = interpreter.get_tensor(output_index)
    t1 = perf_counter()
    # Print accuracy and speed
    num_correct = (result.argmax(axis=1) == y_test).sum()
    print("Time taken (TFLite) = {:.4f} s".format(t1 - t0))
    print('TensorFlow Lite Evaluation accuracy = {} %'.format(
        100 * num_correct / len(x_test)))
    # Reset interpreter state (I don't know why this should be necessary, but
    # accuracy suffers without it)
    interpreter.reset_all_variables()
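
A quick way to rule out a silent fallback to the CPU is to make the delegate load mandatory rather than optional. The helper below is a minimal sketch, not part of the script above: the make_interpreter name and the injectable interpreter_cls / load_delegate_fn parameters are my own; on the Dev Board you would pass tflite_runtime's Interpreter and load_delegate.

```python
def make_interpreter(model_path, interpreter_cls, load_delegate_fn,
                     delegate_name="libedgetpu.so.1"):
    """Build an interpreter with the Edge TPU delegate, raising on failure
    instead of silently falling back to CPU inference."""
    try:
        # load_delegate raises ValueError (or OSError) if the shared
        # library can't be loaded or the delegate fails to initialise
        delegates = [load_delegate_fn(delegate_name)]
    except (OSError, ValueError) as e:
        raise RuntimeError("Edge TPU delegate failed to load; "
                           "inference would silently run on the CPU") from e
    return interpreter_cls(model_path=model_path,
                           experimental_delegates=delegates)
```

On the Dev Board this would be called as make_interpreter(tfl_filename, Interpreter, load_delegate); if it raises, the TPU was never in play in the first place.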

Solution

  • It looks like you've already asked this question on our GitHub page, and it was answered here. Just wanted to share the link so others can reference it.