
How do you run a half float ONNX model using ONNXRuntime C API?


Since the C language doesn't have a native half-precision float type, how do you pass input data to the ONNXRuntime C API?


Solution

  • There may be an example you can follow linked from this issue comment: https://github.com/microsoft/onnxruntime/issues/1173#issuecomment-501088662

    You can create a tensor that owns its buffer using CreateTensorAsOrtValue (with element type ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT16), then get a pointer to that buffer with GetTensorMutableData and write the input data into it.

    ONNXRuntime uses Eigen to convert a float into its 16-bit half representation. You can use the same conversion to produce the uint16_t values to write into that buffer.

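    #include <cstdint>
    #include <Eigen/Core>
    // Returns the 16-bit half pattern for f, rounded to nearest even.
    uint16_t floatToHalf(float f) {
      return Eigen::half_impl::float_to_half_rtne(f).x;
    }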
    

    Alternatively, you could edit the model to add a Cast node from float32 to float16 at its input, so that the model takes float32 as input and no conversion is needed on the caller's side.