I'm training a model in PyTorch and I want to use a truncated SVD decomposition of the input. To compute the SVD, I transfer the input, which is a PyTorch CUDA tensor, to the CPU, perform the truncation with TruncatedSVD from scikit-learn, and then transfer the result back to the GPU. The following is the code for my model:
import torch
import torch.nn as nn
from sklearn.decomposition import TruncatedSVD


class ImgEmb(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(ImgEmb, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.drop = nn.Dropout(0.2)
        # n_components must be an integer, so use floor division
        self.mlp = nn.Linear(input_size // 2, hidden_size)
        self.relu = nn.Tanh()
        self.svd = TruncatedSVD(n_components=input_size // 2)

    def forward(self, input):
        # Move the CUDA tensor to CPU for scikit-learn, then back to the GPU
        svd = self.svd.fit_transform(input.cpu())
        svd_tensor = torch.from_numpy(svd).float()  # scikit-learn returns float64
        svd_tensor = svd_tensor.cuda()
        mlp = self.mlp(svd_tensor)
        res = self.relu(mlp)
        return res
I wonder if there is a way to implement truncated SVD without transferring back and forth between GPU and CPU. (It's very time consuming and not efficient at all.)
You could directly use PyTorch's SVD and truncate it manually, or you can use the truncated SVD from TensorLy, with the PyTorch backend:
import tensorly as tl
tl.set_backend('pytorch')
U, S, V = tl.truncated_svd(matrix, n_eigenvecs=10)
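For the first option, here is a minimal sketch of truncating PyTorch's SVD yourself so everything stays on the GPU (the matrix and the rank k below are just placeholders):

import torch

# Placeholder CUDA matrix and target rank
matrix = torch.randn(512, 256, device='cuda')
k = 10

# Thin SVD on the GPU, then keep only the top-k components
U, S, Vh = torch.linalg.svd(matrix, full_matrices=False)
U_k, S_k, Vh_k = U[:, :k], S[:k], Vh[:k]

# Same kind of projection TruncatedSVD.fit_transform returns (up to sign):
# the rows of `matrix` expressed in the top-k right singular directions
reduced = U_k * S_k   # shape (512, k), equivalent to matrix @ Vh_k.T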
However, the GPU SVD does not scale very well to large matrices. You can also use TensorLy's partial SVD, which will still copy your input to the CPU, but it will be much faster if you keep only a few eigenvalues, since it uses a sparse eigendecomposition. In scikit-learn's TruncatedSVD, you can also pass algorithm='arpack' to use SciPy's sparse SVD, which again might be faster if you only need a few components.
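For instance, a quick sketch of the scikit-learn route with the ARPACK solver (the data and n_components here are placeholders):

import numpy as np
from sklearn.decomposition import TruncatedSVD

# Placeholder standing in for the CPU copy of your input
features = np.random.randn(512, 256)

# algorithm='arpack' uses SciPy's sparse SVD solver instead of the default
# randomized solver; often faster when n_components is small
svd = TruncatedSVD(n_components=10, algorithm='arpack')
reduced = svd.fit_transform(features)   # shape (512, 10)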