Tags: neural-network, caffe, nvidia, tensorrt

How do I convert a cv::Mat to NCHW format?


The TensorRT User Guide says that TensorRT input/output needs to use NCHW format.
What is NCHW format?
How can I convert a cv::Mat to NCHW format?

I run inference with TensorRT as in the code below.
There are no errors, but the output results are wrong.

int batchSize = 1;
int size_of_single_input = 256 * 256 * 3 * sizeof(float);
int size_of_single_output = 100 * 1 * 1 * sizeof(float); 

IBuilder* builder = createInferBuilder(gLogger);

INetworkDefinition* network = builder->createNetwork();

CaffeParser parser;
auto blob_name_to_tensor = parser.parse("deploy.prototxt",
                                        "sample.caffemodel",
                                        *network,
                                        DataType::kFLOAT);

network->markOutput(*blob_name_to_tensor->find("prob"));

builder->setMaxBatchSize(1);
builder->setMaxWorkspaceSize(1 << 30); 
ICudaEngine* engine = builder->buildCudaEngine(*network);

IExecutionContext *context = engine->createExecutionContext();

int inputIndex = engine->getBindingIndex(INPUT_LAYER_NAME);
int outputIndex = engine->getBindingIndex(OUTPUT_LAYER_NAME);

cv::Mat input;
input = imread("./sample.jpg");
cvtColor(input, input, CV_BGR2RGB);
cv::resize(input, input, cv::Size(256, 256));

float output[OUTPUTSIZE];

void* buffers[2];
cudaMalloc(&buffers[inputIndex], batchSize * size_of_single_input);
cudaMalloc(&buffers[outputIndex], batchSize * size_of_single_output);

cudaStream_t stream;
cudaStreamCreate(&stream);

cudaMemcpyAsync(buffers[inputIndex], input.data,
                batchSize * size_of_single_input,
                cudaMemcpyHostToDevice, stream);

context->enqueue(batchSize, buffers, stream, nullptr);


cudaMemcpyAsync(output, buffers[outputIndex],
                batchSize * size_of_single_output,
                cudaMemcpyDeviceToHost, stream);

cudaStreamSynchronize(stream);

Solution

  • NCHW: For a 3 channel image, say BGR, pixels of the B channel are stored first, then the G channel and finally the R channel.

    NHWC: For each pixel, its 3 colors are stored together in BGR order.

    TensorRT requires your image data to be in NCHW order, but OpenCV stores it in NHWC order. You can write a simple function that reads the NHWC data and writes it into a buffer in NCHW order. Copy this buffer to device memory and pass it to TensorRT.

    You can find an example of this operation in the samples/sampleFasterRCNN/sampleFasterRCNN.cpp file in your TensorRT installation. It reads a PPM file (also stored in NHWC order), converts it to NCHW order, and subtracts the mean values, all in a single step. You can modify that to suit your purpose.
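    The repacking described above can be sketched as follows. This is a minimal illustration, not the sampleFasterRCNN code itself; `hwcToChw` is an illustrative helper name, and it assumes the cv::Mat has already been converted to a contiguous CV_32FC3 (float) image, e.g. via `input.convertTo(input, CV_32FC3)`:

    ```cpp
    #include <cstddef>
    #include <vector>

    // src: H*W*C interleaved floats (pixel-major, the layout of a
    // contiguous CV_32FC3 cv::Mat's data pointer).
    // Returns: C*H*W planar floats, ready to copy to the device.
    std::vector<float> hwcToChw(const std::vector<float>& src,
                                int h, int w, int c) {
        std::vector<float> dst(src.size());
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x)
                for (int k = 0; k < c; ++k)
                    // Planar index (k, y, x) <- interleaved index (y, x, k)
                    dst[k * h * w + y * w + x] = src[(y * w + x) * c + k];
        return dst;
    }
    ```

    Per-channel mean subtraction, as done in sampleFasterRCNN, can be fused into the same inner loop by writing `src[...] - mean[k]` instead of `src[...]`.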