Tags: c++, neural-network, pytorch, tensor, torchscript

How to efficiently (without looping) get data from a tensor predicted by a TorchScript module in C++?


I am calling a TorchScript module (a neural network serialized from Python) from a C++ program:

  // needs: #include <torch/script.h>, <iostream>, <vector>

  // define inputs
  constexpr int batch = 3; // batch size (compile-time constant so it can size the arrays)
  constexpr int n_inp = 2; // number of inputs
  double I[batch][n_inp] = {{1.0, 1.0}, {2.0, 3.0}, {4.0, 5.0}}; // some random input
  std::cout << "inputs" << "\n"; // print inputs
  for (int i = 0; i < batch; ++i)
  {    
    std::cout << "\n";
    for (int j = 0; j < n_inp; ++j)
    {
      std::cout << I[i][j] << "\n";
    }
  }
  
  // prepare inputs for feeding to the neural network
  // (torch::from_blob wraps the existing buffer without copying it)
  std::vector<torch::jit::IValue> inputs;
  inputs.push_back(torch::from_blob(I, {batch, n_inp}, at::kDouble));

  // deserialize and load scriptmodule
  torch::jit::script::Module module;
  module = torch::jit::load("Net-0.pt");

  // do forward pass
  auto outputs = module.forward(inputs).toTensor();
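
(Note: .toTensor() assumes the scripted module returns a single tensor. If forward returned a tuple instead, the element would first need to be extracted, e.g. via module.forward(inputs).toTuple()->elements()[0].toTensor(); a single-tensor output is assumed here.)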

Usually, to get data from the outputs, the following (element-wise) operation is performed:

  // get data from outputs
  std::cout << "outputs" << "\n";
  constexpr int n_out = 1; // number of outputs (compile-time constant to size the array)
  double outputs_data[batch][n_out];
  for (int i = 0; i < batch; i++) 
  {
    for (int j = 0; j < n_out; j++)
    {
      outputs_data[i][j] = outputs[i][j].item<double>();
      std::cout << outputs_data[i][j] << "\n";
    }
  }

However, element-wise looping with .item is highly inefficient (in the actual code I will have millions of points predicted at each time step). I want to get the data from outputs directly, without looping over elements. I tried:

  int n_out = 1;
  double outputs_data[batch][n_out];
  outputs_data = outputs.data_ptr<double>();

However, it is giving the error:

error: incompatible types in assignment of ‘double*’ to ‘double [batch][n_out]’
   outputs_data = outputs.data_ptr<double>();
                                           ^

Note that the type of outputs_data is fixed to double and cannot be changed.
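
The error occurs because a C-style array is not an assignable object: data_ptr<double>() returns a raw pointer into the tensor's storage, which can be bound to a pointer variable but never assigned to an array. A minimal illustration of the distinction:

  // A pointer can bind to the tensor's storage directly (a zero-copy view),
  // but an array cannot appear on the left-hand side of an assignment:
  const double* outputs_ptr = outputs.data_ptr<double>(); // OK: points at the tensor data
  // outputs_data = outputs_ptr;                          // error: arrays are not assignable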


Solution

  • It is necessary to make a deep copy, e.g. with std::memcpy (from <cstring>):

    double outputs_data[batch * n_out];
    std::memcpy(outputs_data, outputs.data_ptr<double>(), sizeof(double) * batch * n_out);
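
Putting it together, a minimal sketch of the read-out step (this assumes the module really returns a double tensor of shape {batch, n_out}; the .contiguous() call is a guard in case the output is not laid out contiguously in memory, since memcpy requires a gap-free buffer):

    // copy all batch * n_out predictions in a single call
    torch::Tensor flat = outputs.contiguous(); // ensure row-major, gap-free storage
    double outputs_data[batch][n_out];
    std::memcpy(outputs_data, flat.data_ptr<double>(), sizeof(double) * batch * n_out);

    // verify the copied values
    for (int i = 0; i < batch; ++i)
      for (int j = 0; j < n_out; ++j)
        std::cout << outputs_data[i][j] << "\n";

Since from_blob and data_ptr only expose views of existing memory, the memcpy is the one place where data is actually duplicated; everything else stays zero-copy.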