Tags: windows, pytorch, shared-gpu-memory

Can PyTorch Use Shared GPU Memory (from RAM, as Shown in Windows Task Manager)?


Someone claims that it can, having verified it by running the chatglm6b model on Windows with a GTX 1660S (6 GB). He also claimed it can't work under Linux, though he didn't test that. I don't use PyTorch, and my CUDA experience tells me that Nvidia doesn't use shared GPU memory. So either Windows is doing some magic, or both PyTorch and Task Manager are reporting wrong information. Can someone explain how this could work? Isn't the work demoted to the CPU for slower processing?

Sample Code:

import torch

# Linear layer 128 -> 256, moved to the GPU
a = torch.nn.Linear(128, 256).to('cuda')
# Large float32 input: 20,000,000 x 128 (~9.5 GiB), moved to the GPU
b = torch.rand((20_000_000, 128)).to('cuda')
# Output is 20,000,000 x 256 (~19 GiB); call the module directly rather than .forward()
c = a(b)

Dedicated GPU memory shows 11.7 GB / 12 GB on a 3080 Ti; shared GPU memory shows 18.3 GB / 32 GB.
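Back-of-the-envelope arithmetic (my own, not from the question) explains those numbers: a float32 element is 4 bytes, so the input and output tensors alone need roughly 30 GB, far more than the card's 12 GB of dedicated VRAM:

b_bytes = 20_000_000 * 128 * 4        # input b:  ~10.2 GB (9.5 GiB)
c_bytes = 20_000_000 * 256 * 4        # output c: ~20.5 GB (19.1 GiB)
print((b_bytes + c_bytes) / 1024**3)  # ~28.6 GiB total

That matches the roughly 30 GB combined usage Task Manager reports across dedicated and shared memory.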


Solution

  • On Windows 10, Windows 11, and newer operating systems, Microsoft provides GPU shared memory, which by default makes up to 50% of physical RAM addressable by the GPU.

    For CUDA, if you use Nvidia driver version 536 or newer under these operating systems, you can indeed use shared memory when dedicated VRAM runs low (this also applies to WSL). The work is not demoted to the CPU, but it can still slow things down significantly, because the actual accesses travel over the PCIe bus to DDR system memory. Here are my tests with Win11 23H2 and chatglm3 (PCIe 3.0); you can see the Copy engine activity and the separate dedicated/shared memory usage, and a sketch for detecting the spillover from inside a script follows the screenshot below.

    [Screenshot: Task Manager during the chatglm3 test, showing Copy engine activity and dedicated vs. shared GPU memory usage]
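    A minimal sketch (my addition, not from the original answer) of a heuristic check for this fallback: if the bytes PyTorch has allocated exceed the card's dedicated VRAM, part of the working set must be living in shared (system) memory.

    import torch

    def report_vram_pressure(device=0):
        props = torch.cuda.get_device_properties(device)
        dedicated = props.total_memory                   # dedicated VRAM, in bytes
        allocated = torch.cuda.memory_allocated(device)  # bytes currently held by tensors
        free, total = torch.cuda.mem_get_info(device)    # driver-reported free/total
        print(f"dedicated VRAM: {dedicated / 1024**3:.1f} GiB")
        print(f"allocated:      {allocated / 1024**3:.1f} GiB")
        print(f"driver free:    {free / 1024**3:.1f} GiB of {total / 1024**3:.1f} GiB")
        if allocated > dedicated:
            print("allocations exceed dedicated VRAM -> likely spilled to shared memory")

    Note that recent drivers also expose a "CUDA - Sysmem Fallback Policy" option in the NVIDIA Control Panel if you would rather get an out-of-memory error than spill into slow shared memory.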