Tags: python, c++, tensorflow, mbed

uTensor model output does not match the expected output


I'm currently working on a project where uTensor is required. uTensor seems to work correctly; however, I've run into an issue which I (apparently) cannot fix myself.

Problem definition

I've created a simple Python script which generates and saves a model to a file. This saved model can later be converted into C++ code using uTensor-cli. The generated C++ code will be run on an ARM dev board.

Everything runs fine with no errors. However, when I create a model like "xW+b", the output of the model on the dev board is always some static value, which does not equal the output of the model from the Python script.

The thing is, when a simple model like "W+b" is used (no input tensor involved), the output on the ARM dev board equals the output of the Python script, and everything works as expected.

My findings

When using an input tensor (nothing large, just a one-dimensional array like [1,0]), the ARM dev board always outputs some unexpected value compared to the output of the Python script. When not using an input tensor, everything works as expected.

Other info

Because no wiki on uTensor exists yet, I've used a tutorial to learn about it. The tutorial I've used can be found here: https://blog.hackster.io/simple-neural-network-on-mcus-a7cbd3dc108c. The code I've written is based on that tutorial; it does not include any cost/loss function and is not capable of 'learning' anything. The code is just for debugging.

The question

What is the reason an input tensor makes the application output unexpected values? How can I possibly fix this?

Code and output

Python script

import tensorflow as tf
import numpy as np
from tensorflow.python.framework import graph_util as gu

def weightVariable(shape, name):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial, name=name)

def createLayer(layerInput, inputSize, outputSize, layerNumber, dropout = -1):
    layerNumber = str(layerNumber)

    #Define weight and bias
    W_fc = weightVariable([inputSize, outputSize], 'W_fc' + layerNumber)

    #Formula y = xW (no bias term is added in this debug version)
    a_fc = tf.matmul(layerInput, W_fc, name='y_pred' + layerNumber)
    return a_fc

def saveGraph(saver, sess, y_pred):
    outNodes = [y_pred.op.name]
    subGraphDef = gu.remove_training_nodes(sess.graph_def)
    subGraphDef = gu.convert_variables_to_constants(sess, subGraphDef, outNodes)
    
    #Save the checkpoint
    ckptPath = saver.save(sess, "./chkps/model.ckpt")
    
    #Save the graph
    graphPath = tf.train.write_graph(subGraphDef, "./graph", "mlp.pb", as_text=False)
    
    #Print some usefull messages
    print("Saved checkpoint to: " + ckptPath)
    print("Saved graph to: " + graphPath)
    print("Output tensor: " + y_pred.op.name)
    
def restoreGraph(saver, sess):
    tf.reset_default_graph()
    saver.restore(sess, "./chkps/model.ckpt")

def main():
    data = [
        [0,0],
        [0,1],
        [1,0],
        [1,1]
    ]
    labels = [
        [0],
        [1],
        [1],
        [0]
    ]
    inputSize = 2
    outputSize = 1

    #Placeholders for the input and output
    x_input = tf.placeholder(tf.float32, [None, inputSize], name='x_input')
    y_output = tf.placeholder(tf.float32, [None, outputSize], name='y_output')
    
    #Layers with relu activation
    inputLayer = createLayer(x_input, inputSize, outputSize, 0)
    
    #Start a session
    sess = tf.Session()
    saver = tf.train.Saver()
    
    #Run the session
    sess.run(tf.global_variables_initializer())
    feed_dict = {x_input: data, y_output: labels}
    sess.run(inputLayer, feed_dict=feed_dict)
    
    #Save the graph
    saveGraph(saver, sess, inputLayer)
    
    #Test the algorithm
    for i in range(0,4):
        testInput = [data[i]]
        output = sess.run(inputLayer, feed_dict={x_input: testInput})[0][0]
        print("Test output " + str(i) + ": " + str(output))
    
    #End the session
    sess.close()

#Execute the main function
main()

Output

Saved checkpoint to: ./chkps/model.ckpt
Saved graph to: ./graph/mlp.pb
Output tensor: y_pred0
Test output 0: 0.0
Test output 1: 0.0034507334
Test output 2: 0.07698402
Test output 3: 0.080434754

Converting to C++

utensor-cli convert graph/mlp.pb --output-nodes=y_pred0

C++ code

#include "models/mlp.hpp"
#include "tensor.hpp"
#include "mbed.h"
#include <stdio.h>

const int testData[4][2] = {{0,0},{0,1},{1,0},{1,1}};

Serial uart(USBTX, USBRX, 115200);

int main(void){
    printf("Compiled at: ");
    printf(__TIME__);
    printf("\n");
    
    for(int i = 0; i < 4; i++){
        //Create the context class. 
        Context ctx;
        Tensor *input_x = new WrappedRamTensor<int>({1, 2}, (int*) testData[i]);

        get_mlp_ctx(ctx, input_x);          //Pass the tensor to the context
        S_TENSOR pred_tensor = ctx.get("y_pred0:0");    //Get the output tensor
        ctx.eval();                 //Trigger the inference

        float prediction = *(pred_tensor->read<float>(0,0));    //Get the result
        printf("Test output %d: %f \r\n", i,  prediction);  //Print the result
        
    }

    printf("\n");
    return 0;
}

Serial output

Compiled at: (Compile time)
Test output 0: 0.000000 
Test output 1: 0.000000 
Test output 2: 0.000000 
Test output 3: 0.000000 

Solution

  • Changing the datatype in the C++ code to float did the trick! The x_input placeholder is defined as tf.float32 in the Python script, so the test data and the wrapped input tensor on the device have to hold float values as well; feeding an int buffer is what produced the static, incorrect outputs. I'm still not quite sure how I did not think about this.
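
For completeness, a minimal sketch of what the corrected lines look like (only the declaration of testData and the wrapped input tensor change; the rest of the loop stays as posted above):

const float testData[4][2] = {{0,0},{0,1},{1,0},{1,1}};

//Inside the loop: wrap the input as float data so it matches the graph's float32 placeholder
Tensor *input_x = new WrappedRamTensor<float>({1, 2}, (float*) testData[i]);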