Tags: c++, deep-learning, tensorrt, inference-engine

How to correctly format input and resize output data while using a TensorRT engine?


I'm trying to deploy a deep learning model with the TensorRT runtime. The model conversion step went fine and I'm fairly confident it is correct.

There are two parts I'm currently struggling with: copying data from host to device (e.g. from an OpenCV image to a TensorRT input buffer), and getting the right output shape so that I can read the results correctly. So my questions are:

  • How does the shape of the input dims actually relate to the memory buffer? What is the difference between NCHW and NHWC model input dims? When I read an OpenCV image it is NHWC, and the model input is also NHWC, so do I have to rearrange the buffer data? If yes, what memory layout do I have to produce? Or, put simply, what format or sequence of data does the engine expect?

  • About the output (assuming the input is buffered correctly): how do I get the right result shape for each task (detection, classification, etc.)? For example, an array or something similar to what you get when working in Python.

I read the NVIDIA docs and they are not beginner-friendly at all.

// Let's say I have a model that has a dynamic-shape input in NHWC format.
auto input_dims = nvinfer1::Dims4{1, 386, 342, 3};  // Using fixed H, W for testing
context->setBindingDimensions(input_idx, input_dims);
auto input_size = getMemorySize(input_dims, sizeof(float));
// How do I format an OpenCV Mat to fit these dims, and how do I adapt if I encounter a different input dim format?

The expected output dims are something like (1, 32, 53, 8), for example. The output buffer is just a pointer, and I don't know the order of the data needed to reconstruct an array of the expected shape.

// Run TensorRT inference
void* bindings[] = {input_mem, output_mem};
bool status = context->enqueueV2(bindings, stream, nullptr);
if (!status)
{
    std::cout << "[ERROR] TensorRT inference failed" << std::endl;
    return false;
}

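// output_size is a byte count, so allocate output_size / sizeof(float) elements
// (assuming a float output binding); needs #include <memory>.
auto output_buffer = std::make_unique<float[]>(output_size / sizeof(float));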
if (cudaMemcpyAsync(output_buffer.get(), output_mem, output_size, cudaMemcpyDeviceToHost, stream) != cudaSuccess)
{
    std::cout << "ERROR: CUDA memory copy of output failed, size = " << output_size << " bytes" << std::endl;
    return false;
}
cudaStreamSynchronize(stream);

// How do I use this output_buffer to form the right output shape, (1, 32, 53, 8) in this case?

Solution

  • Could you please edit your question and tell us which model you're using? If it's a commonly known NN, perhaps it is one we can download and test locally.

    That said, here is the answer, since it doesn't depend on the model (even though knowing the model would help).

    How actually a shape of input dims relate with memory buffer

    If the input is NxCxHxW, you need to allocate N*C*H*W*sizeof(float) bytes for it (assuming float data) on both your CPU and GPU. To be more precise, you need to allocate space on the GPU for all the bindings, and on the CPU only for the input and output bindings.
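The size rule above can be sketched in a few lines. This is only an illustration: `elementCount` and `byteSize` are hypothetical helper names (they play the role of the `getMemorySize` call in the question, whose implementation is not shown), and the dims are passed as a plain vector rather than a TensorRT type so the snippet compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of elements in a tensor with the given dims, e.g. {1, 386, 342, 3}.
// Hypothetical helper, not part of TensorRT.
std::size_t elementCount(const std::vector<std::int64_t>& dims)
{
    std::size_t count = 1;
    for (std::int64_t d : dims)
    {
        // A dynamic dim is reported as -1 and must be resolved before sizing.
        assert(d > 0 && "all dims must be concrete and positive");
        count *= static_cast<std::size_t>(d);
    }
    return count;
}

// Bytes needed for one binding: product of the dims times the element size.
std::size_t byteSize(const std::vector<std::int64_t>& dims, std::size_t elemSize)
{
    return elementCount(dims) * elemSize;
}
```

With the dims from the question, `byteSize({1, 386, 342, 3}, sizeof(float))` gives the number of bytes to pass to `cudaMalloc` and `cudaMemcpyAsync` for the input binding. Note that the result does not depend on whether the layout is NHWC or NCHW; only the order of the elements inside the buffer does.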

    when i read a openCV image, it's NHWC and also the model input is NHWC, do i have to re-arange the buffer data

    No, you do not have to rearrange the buffer data. If you did have to convert between NHWC and NCHW, you can check this or google 'opencv NHWC to NCHW'.

    Full working code example here, especially this function.
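To make the two layouts concrete, here is a minimal sketch in plain C++ that does not depend on OpenCV. It assumes the source pixels are interleaved 8-bit HWC, which is how a continuous `CV_8UC3` `cv::Mat` stores its data, and that the engine wants floats. The function names and the `scale` parameter are my own, and whether you need scaling, mean subtraction, or a BGR to RGB swap depends on how the model was trained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Interleaved 8-bit HWC -> float HWC. The element order is unchanged,
// only the type (and optionally the value range) changes.
std::vector<float> hwcToFloatHwc(const std::uint8_t* src, int h, int w, int c, float scale)
{
    assert(src != nullptr && h > 0 && w > 0 && c > 0);
    const std::size_t n = static_cast<std::size_t>(h) * w * c;
    std::vector<float> dst(n);
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] * scale;
    return dst;
}

// Interleaved 8-bit HWC -> planar float CHW. Each channel becomes one
// contiguous H*W plane, which is what an NCHW input expects for N = 1.
std::vector<float> hwcToFloatChw(const std::uint8_t* src, int h, int w, int c, float scale)
{
    assert(src != nullptr && h > 0 && w > 0 && c > 0);
    const std::size_t H = static_cast<std::size_t>(h);
    const std::size_t W = static_cast<std::size_t>(w);
    const std::size_t C = static_cast<std::size_t>(c);
    std::vector<float> dst(H * W * C);
    for (std::size_t y = 0; y < H; ++y)
        for (std::size_t x = 0; x < W; ++x)
            for (std::size_t ch = 0; ch < C; ++ch)
                dst[(ch * H + y) * W + x] = src[(y * W + x) * C + ch] * scale;
    return dst;
}
```

For an NHWC model you would copy the result of `hwcToFloatHwc` to the device as is; for an NCHW model you would copy the result of `hwcToFloatChw`. With a real `cv::Mat` you would pass `mat.data` after checking `mat.isContinuous()`, and `1.0f / 255.0f` is a common but not universal choice for `scale`.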

    Or simply what does the format or sequence of data that the engine are expecting ?

    This depends on how the neural network was trained. In general you should know exactly which preprocessing and image data formats were used to train the NN. If possible, you should even use the same libraries to load and process the images. It's an open problem in ML: if you try to replicate the results of a paper using its model, but the authors haven't open-sourced the preprocessing, you might get worse results. In the "worst" case you can implement both NHWC and NCHW and test which of them works.

    About the output (assume the input are correctly buffered), how do i get the right result shape for each task (Detection, Classification, etc..).. Eg. an array or something look similar like when working with python .

    This question clearly requires me to understand which NNs you are referring to. But I myself do the following:

    Then I know the size of the input binding or bindings if there are many inputs, and the size of the output binding or bindings if there are many outputs.

    This way you know the right result shape for each task. I hope this answered your question. If not, please add detailed comments and edit your post to be more precise. Thank you.
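Once the output dims are known, turning the flat host buffer into something you can index like a Python array is mostly arithmetic. Below is a rough sketch, assuming a float output stored linearly in row-major order (last dim varies fastest), which is the usual case for a plain linear binding. `Tensor4View` and `makeView` are hypothetical names, not TensorRT types; with the enqueueV2-era API from the question, the dims would come from something like `context->getBindingDimensions(output_idx)`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Read-only 4-D view over a flat, row-major buffer. Hypothetical helper.
struct Tensor4View
{
    const float* data;
    std::array<std::size_t, 4> dims;

    // Flat offset of element (i, j, k, l); the last index varies fastest.
    std::size_t offset(std::size_t i, std::size_t j, std::size_t k, std::size_t l) const
    {
        assert(i < dims[0] && j < dims[1] && k < dims[2] && l < dims[3]);
        return ((i * dims[1] + j) * dims[2] + k) * dims[3] + l;
    }

    float at(std::size_t i, std::size_t j, std::size_t k, std::size_t l) const
    {
        return data[offset(i, j, k, l)];
    }
};

// Wrap a host buffer (already copied back from the device) in a view,
// checking that its length matches the product of the dims.
Tensor4View makeView(const std::vector<float>& host, const std::array<std::size_t, 4>& dims)
{
    assert(host.size() == dims[0] * dims[1] * dims[2] * dims[3]);
    return Tensor4View{host.data(), dims};
}
```

For the (1, 32, 53, 8) output in the question, `view.at(0, y, x, k)` would then read value `k` of cell `(y, x)`. What those values mean (box coordinates, class scores, and so on) still depends on the model.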

    I read Nvidia docs and it's not beginner-friendly at all.

    Yes, I agree. You're better off searching GitHub for TensorRT C++ (or Python) repositories and studying their code. Have you seen the TensorRT samples? It doesn't really take many lines of code to implement TensorRT inference.