Tags: c++, pytorch, libtorch

How to translate or convert code from Python Pytorch to C++ Libtorch


I couldn't find the equivalent C++ calls in Libtorch (Pytorch C++ Frontend) for my Python Pytorch code.

According to my searches (on Pytorch Discuss), documentation for what I need does not exist yet. I wonder if someone can guide me with the pieces below.

I cut out the pieces where I have been getting the most crashes (from wrong usage of Libtorch C++).

import torch as th

th.set_grad_enabled(False)
...
X = th.zeros((nobs, 3+p), device=dev, dtype=th.float32)
y = th.tensor(indata, device=dev, dtype=th.float32)
diffilter = th.tensor([-1., 1.], device=dev, dtype=th.float32).view(1, 1, 2)
dy = th.conv1d(y.view(1, 1, -1), diffilter).view(-1)
z = dy[p:].clone()
...
# X matrix
X[:, 0] = 1 
X[:, 1] = th.arange(p+1, n) 
X[:, 2] = y[p:-1]
...
# master X 
Xm = th.zeros((nobsadf, 3+p), device=th.device('cpu'), dtype=th.float32)
...
# batch matrix, vector and observations
Xbt = th.zeros(batch_size, adfs_count, nobsadf, (3+p), device=th.device('cpu'), dtype=th.float32)
...
t = 0 # start line for master main X OLS matrix/ z vector
for i in range(nbatchs):
    for j in range(batch_size): # assemble batch_size matrices
        Xm[:] = X[t:t+nobsadf] 
        ...
        Xbt[j, :, :, :] = Xm.repeat(adfs_count, 1).view(adfs_count, nobsadf, (3+p))            
        for k in range(adfs_count): 
            Xbt[j, k, :k, :] = 0
            nobt[j, k] = float(nobsadf-k-(p+3))

Solution

  • After suffering a lot with this!

    I learned to make better use of the Pytorch Discuss forum for Pytorch and Libtorch information, for example by filtering with the C++ tag.

    Unfortunately, that is the official source of information (although it is quite messy), which is why I am sharing my answer here on SO.

    namespace th = torch;
    ...
    // th.set_grad_enabled(False)
    th::NoGradGuard guard; // or same as with torch.no_grad(): block 
    ...
    auto dtype_option = th::TensorOptions().dtype(th::kFloat32);
    //X = th.zeros((nobs, 3+p), device=dev, dtype=th.float32)
    //y = th.tensor(indata, device=dev, dtype=th.float32)
    //diffilter = th.tensor([-1., 1.], device=dev, dtype=th.float32).view(1, 1, 2)
    //dy = th.conv1d(y.view(1, 1, -1), diffilter).view(-1)
    //z = dy[p:].clone()
    auto X = th::zeros({nobs, 3+p}, dtype_option);
    auto y = th::from_blob(signal, {n}, dtype_option);
    auto diffilter = th::tensor({-1, 1}, dtype_option).view({ 1, 1, 2 }); // first difference filter
    auto dy = th::conv1d(y.view({ 1, 1, -1 }), diffilter).view({ -1 });
    auto z = dy.slice(0, p).clone();
    ...
    // X[:, 0] = 1 # drift
    // X[:, 1] = th.arange(p+1, n) 
    // X[:, 2] = y[p:-1]
    // create accessors to fill in the matrix
    auto ay = y.accessor<float, 1>(); // <1> dimension
    auto aX = X.accessor<float, 2>(); // <2> dimension
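    // (note: accessor<T, N> only works for CPU tensors, and the template arguments must
    //  match the tensor's dtype and number of dimensions: float, with 1 and 2 dims here)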
    for (auto i = 0; i < nobs; i++) {
        aX[i][0] = 1; 
        aX[i][1] = p + 1 + i; 
        aX[i][2] = ay[p+i];  
    }
    ...
    // Xm = th.zeros((nobsadf, 3+p), device=th.device('cpu'), dtype=th.float32)
    auto Xm = th::zeros({ nobsadf, 3 + p }, dtype_option.device(th::Device(th::kCPU)));
    // Xbt = th.zeros(batch_size, adfs_count, nobsadf, (3+p), device=th.device('cpu'), dtype=th.float32)
    auto Xbt = th::zeros({ batch_size, adfs_count, nobsadf, (3 + p) }, dtype_option.device(th::Device(th::kCPU)));
    ...
    // this accessor will be used in the inner for loop over k
    auto anobt = nobt.accessor<float, 2>();
    auto tline = 0; // start line for master main X OLS matrix/ z vector
    for (int i = 0; i < nbatchs; i++){
        for (int j = 0; j < batch_size; j++){ // assemble batch_size matrices
            // Xm[:] = X[t:t+nobsadf]
            Xm.copy_(X.narrow(0, tline, nobsadf)); 
            ... 
            // Xbt[j, :, :, :] = Xm.repeat(adfs_count, 1).view(adfs_count, nobsadf, (3+p))   
            auto Xbts = Xbt.select(0, j);
            Xbts.copy_(Xm.repeat({ adfs_count, 1 }).view({ adfs_count, nobsadf, (3 + p) }));
            for (int k = 0; k < adfs_count; k++) { 
                // Xbt[j, k, :k, :] = 0
                // nobt[j][k] = float(nobsadf - k - (p + 3));
                Xbts.select(0, k).narrow(0, 0, k).fill_(0);
                anobt[j][k] = float(nobsadf - k - (p + 3));
            }
            tline++;
        }
    }
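
    One detail that was not obvious to me at first: select and narrow return views that share storage with the original tensor, so copying into Xbts above writes directly into Xbt and no write-back step is needed. A minimal illustration with throwaway tensors:

    // T and row are throwaway names, just to show the view behaviour used above
    auto T = th::zeros({ 2, 3 }, dtype_option);
    auto row = T.select(0, 1); // view of the second row, shares storage with T
    row.fill_(7);              // writes through the view
    // T is now [[0, 0, 0], [7, 7, 7]] and no explicit copy back was required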
          
    

    Probably there is a better or faster way of coding this, but the code above fully works. Feel free to make suggestions to improve my code.
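
    For instance, I suspect the accessor loop that fills X could be replaced with column-wise tensor operations, something like this (untested sketch, assuming nobs == n - p - 1 as the loop above implies):

    // X[:, 0] = 1 ; X[:, 1] = th.arange(p+1, n) ; X[:, 2] = y[p:-1]
    X.select(1, 0).fill_(1);
    X.select(1, 1).copy_(th::arange(p + 1, n, dtype_option));
    X.select(1, 2).copy_(y.narrow(0, p, n - p - 1));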

    C++ signatures of the common functions used above (a quick mapping back to the Python indexing follows the list):

    Tensor Tensor::slice(int64_t dim, int64_t start, int64_t end, int64_t step)
    
    Tensor Tensor::narrow(int64_t dim, int64_t start, int64_t length) 
    
    Tensor Tensor::select(int64_t dim, int64_t index)
    
    Tensor & Tensor::copy_(const Tensor & src, bool non_blocking=false)
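
    For quick reference, this is how those calls correspond to the Python indexing used above (tail, window and batchj are just illustrative names; tline and j are the loop variables from the code):

    auto tail   = dy.slice(0, p);              // Python: dy[p:]
    auto window = X.narrow(0, tline, nobsadf); // Python: X[t:t+nobsadf]
    auto batchj = Xbt.select(0, j);            // Python: Xbt[j, :, :, :]
    Xm.copy_(window);                          // Python: Xm[:] = X[t:t+nobsadf]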
    

    Further notes:

    Almost all of the C++ functions have a Python Pytorch equivalent, so here is my golden tip:

    First rewrite your Python script using the C++-equivalent functions such as copy_, narrow and slice, testing it as you go (to make sure it still works), then port it to C++ by replicating everything.
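
    As a concrete illustration of that workflow: a line like Xbt[j, k, :k, :] = 0 can first be rewritten in Python as Xbt.select(0, j).select(0, k).narrow(0, 0, k).fill_(0); once that version is verified in the Python script, the C++ line is nearly identical:

    // Python (rewritten first with the C++-style calls and tested there):
    //   Xbt.select(0, j).select(0, k).narrow(0, 0, k).fill_(0)
    // C++ (Libtorch), the same chain of calls with C++ syntax:
    Xbt.select(0, j).select(0, k).narrow(0, 0, k).fill_(0);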