Tags: c#, android, tensorflow, ml-agent, tensorflowsharp

TensorFlowSharp results GetValue() is very slow


I am using TensorflowSharp to run evaluations using a neural network on an Android phone. I am building the project with Unity.

I am using the TensorFlowSharp Unity plugin listed under the requirements here: https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Using-TensorFlow-Sharp-in-Unity.md.

Everything works, but extracting the result is very slow.

The network I am running is an autoencoder, and the output is an image with dimensions of 128x128x16 (yes, there are a lot of output channels).

The evaluation takes ~0.2 seconds, which is acceptable. However, extracting the result data with results[0].GetValue() is VERY slow.

This is the code where I run the neural network:

var runner = session.GetRunner();
runner.AddInput(graph[INPUT_NAME][0], tensor).Fetch(graph[OUTPUT_NAME][0]);
var results = runner.Run();

float[,,,] heatmaps = results[0].GetValue() as float[,,,]; // <- this is SLOW

The problem: the last line, where I convert the result to floats, takes ~1.2 seconds.

Can it really be true that reading the result data into a float array takes more than 5 times as long as the actual evaluation of the network?

Is there another way to extract the result values?


Solution

  • So I have found a solution to this. I still do not know why the GetValue() call is so slow, but I found another way to retrieve the data.

    I chose to manually read the raw tensor data available at results[0].Data.

    I created a small function to handle this as a drop-in replacement for GetValue() (here with the dimensions I am expecting hardcoded):

        private float[,,,] TensorToFLoats(TFTensor tensor)
        {
            // Pointer to the raw native tensor buffer and its size in bytes.
            IntPtr resData = tensor.Data;
            UIntPtr dataSize = tensor.TensorByteSize;

            // Copy the native buffer into managed memory in a single call.
            byte[] s_ImageBuffer = new byte[(int)dataSize];
            System.Runtime.InteropServices.Marshal.Copy(resData, s_ImageBuffer, 0, (int)dataSize);

            // Reinterpret every 4 bytes as one 32-bit float.
            int floatsLength = s_ImageBuffer.Length / 4;
            float[] floats = new float[floatsLength];
            for (int n = 0; n < s_ImageBuffer.Length; n += 4)
            {
                floats[n / 4] = BitConverter.ToSingle(s_ImageBuffer, n);
            }

            // Reshape the flat data into [batch, y, x, channel] (dimensions hardcoded).
            float[,,,] result = new float[1, 128, 128, 16];
            int i = 0;
            for (int y = 0; y < 128; y++)
            {
                for (int x = 0; x < 128; x++)
                {
                    for (int p = 0; p < 16; p++)
                    {
                        result[0, y, x, p] = floats[i++];
                    }
                }
            }
            return result;
        }
    

    With this, I can replace the code in my question with the following:

    var runner = session.GetRunner();
    runner.AddInput(graph[INPUT_NAME][0], tensor).Fetch(graph[OUTPUT_NAME][0]);
    var results = runner.Run();
    
    float[,,,] heatmaps = TensorToFLoats(results[0]);
    

    This is dramatically faster. Where GetValue() took ~1 second, the TensorToFLoats function I created returns the same data in ~0.02 seconds.
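
The same idea can probably be shortened further. The following is only a sketch, not something measured on the setup above: it assumes the tensor holds 32-bit floats in [1, height, width, channels] order, and the class and method names (TensorCopy, ToFloats) are hypothetical. It relies on two standard .NET calls: the Marshal.Copy overload that writes directly into a float[], and Buffer.BlockCopy, which can fill a multidimensional array because its elements are stored contiguously in row-major order. For a 128x128x16 output that is 262,144 floats, or 1 MiB of data.

```csharp
using System;
using System.Runtime.InteropServices;

static class TensorCopy
{
    // Hypothetical helper: copies float32 data from a native buffer into a
    // new [1, height, width, channels] array without per-element conversion.
    public static float[,,,] ToFloats(IntPtr data, int byteSize, int height, int width, int channels)
    {
        int count = byteSize / sizeof(float);
        if (count != height * width * channels)
            throw new ArgumentException("Tensor size does not match the expected dimensions.");

        // Marshal.Copy has an overload that targets a float[] directly,
        // so no intermediate byte[] or BitConverter loop is needed.
        float[] flat = new float[count];
        Marshal.Copy(data, flat, 0, count);

        // A block copy fills result[0, y, x, p] in the same order as
        // nested y / x / p loops would.
        var result = new float[1, height, width, channels];
        Buffer.BlockCopy(flat, 0, result, 0, count * sizeof(float));
        return result;
    }
}
```

With the Data and TensorByteSize members used above, the call would look like TensorCopy.ToFloats(results[0].Data, (int)results[0].TensorByteSize, 128, 128, 16). Whether this beats the loop-based version on a given device would need to be timed.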