Tags: python, machine-learning, deep-learning, pytorch

How do the PyTorch CUDA build and CUDA itself actually work?


Here is my conda environment's package list:

MY PACKAGE LIST

As you can see, I have not installed cudatoolkit and the nvcc command is not usable. But I do have the CUDA build of PyTorch installed.

However, when I import torch in Python and check torch.cuda.is_available(), it returns True.

I even ran this test script:

import torch
from torch import nn
from torch.nn import Module
from torch.optim.lr_scheduler import LambdaLR


class TestNet(Module):
    def __init__(self) -> None:
        super().__init__()
        self.linear = nn.Linear(10,10)

    def forward(self, x):
        return self.linear(x)
    

if __name__=="__main__":
    if torch.cuda.is_available():
        device = "cuda"
    else:
        device = "cpu"
    print(f"Using device {device}")
    test_samples = torch.rand([32, 10]).to(device)
    gt_matrix = torch.eye(10).to(device)  # identity matrix as the ground-truth mapping
    target = torch.matmul(test_samples, gt_matrix)

    model = TestNet().to(device)

    optimizer = torch.optim.SGD(model.parameters(), lr=1)
    criterion = nn.MSELoss()
    scheduler = LambdaLR(optimizer, lr_lambda=lambda x: min(x, 24) / 24)  # linear warmup over 24 steps

    for epoch in range(128):
        logits = model(test_samples)
        loss = criterion(logits, target)
        learning_rate = optimizer.param_groups[0]["lr"]

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()

        print(f"Epoch {epoch+1}/{24}, loss {loss.item()}, lr {learning_rate}")
    
    print("Learned matrix:")
    print(model.state_dict()["linear.weight"])

And it runs successfully.

So I am curious: how does the CUDA build of PyTorch actually work? Does it need a pre-installed CUDA toolkit or not? Also, what is the difference between installing CUDA with conda install cudatoolkit, with conda install cuda, and with the graphical installer?


Solution

  • According to this SO thread and this forum thread, you are not required to have nvcc installed on your local machine, since PyTorch ships with its own CUDA runtime libraries. The only requirement is that a CUDA driver is installed on your device and that it supports the CUDA version your PyTorch build was compiled against (see the first snippet below for a quick way to verify this).

    I would assume that conda install cudatoolkit installs a standalone CUDA toolkit, but it is independent of PyTorch. Following the installation page, you should instead use conda install pytorch::pytorch-cuda; that way you install PyTorch together with its CUDA support (the second sketch below shows where the bundled libraries end up).
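
    For illustration, here is a minimal check, assuming it is run inside the same conda environment as above. It shows that PyTorch reports the CUDA runtime it ships with even when nvcc is not on the PATH:

import shutil
import torch

# CUDA runtime version this PyTorch build was compiled with and ships with
print("Bundled CUDA runtime:", torch.version.cuda)

# True as long as the installed NVIDIA driver supports that runtime version
print("GPU available:", torch.cuda.is_available())

# Typically None in this setup: no local CUDA toolkit / nvcc is required
print("nvcc on PATH:", shutil.which("nvcc"))

if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))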
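
    As a rough sanity check of the "PyTorch ships its own CUDA libraries" claim, the sketch below lists the shared libraries bundled inside the installed torch package. The name fragments searched for (libcudart, cublas) are assumptions; the exact file names vary by platform, install method (pip wheel vs. conda), and PyTorch version:

import os
import torch

# The bundled CUDA runtime libraries usually live inside the installed torch
# package, typically under torch/lib, rather than in a system-wide toolkit.
torch_lib = os.path.join(os.path.dirname(torch.__file__), "lib")
for name in sorted(os.listdir(torch_lib)):
    if "cudart" in name or "cublas" in name:  # assumed name fragments; platform-dependent
        print(name)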