There are clearly tensors allocated in my GPU memory. When I set the PYTORCH_NO_CUDA_MEMORY_CACHING environment variable back to 0, everything seems to work fine. Is this a bug?
I've read the PyTorch documentation on memory management, but I still don't understand.
Found an answer on the PyTorch forum:
Disabling the caching allocator is a debugging feature, and some utilities, such as CUDA Graphs, won't work with it. You could suggest a fix if you are interested in seeing the used memory stats.
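To make the forum answer concrete, here is a minimal sketch of what is (most likely) going on. The reasoning in the comments is an interpretation based on how PyTorch's caching allocator works: the stats APIs (`torch.cuda.memory_allocated()`, `torch.cuda.memory_reserved()`) are counters maintained by the caching allocator itself, so bypassing it with PYTORCH_NO_CUDA_MEMORY_CACHING=1 leaves them unpopulated even though tensors really are on the GPU. The snippet deliberately avoids importing torch so it runs anywhere:

```python
import os

# The allocator choice is read when torch initializes CUDA, so the variable
# must be set before torch touches the GPU (e.g. before the first CUDA op).
os.environ["PYTORCH_NO_CUDA_MEMORY_CACHING"] = "0"

# With the caching allocator active ("0"), PyTorch's stats APIs report:
#   torch.cuda.memory_allocated() -> bytes occupied by live tensors
#   torch.cuda.memory_reserved()  -> bytes the caching allocator holds from CUDA
# With PYTORCH_NO_CUDA_MEMORY_CACHING=1, allocations go straight to
# cudaMalloc/cudaFree, bypassing the bookkeeping that feeds those counters,
# so the stats can read zero even while tensors clearly occupy GPU memory
# (as tools like nvidia-smi will confirm).
caching_disabled = os.environ["PYTORCH_NO_CUDA_MEMORY_CACHING"] == "1"
print(caching_disabled)
```

So the behaviour described in the question is expected rather than a bug: the stats are a feature of the caching allocator, and disabling the allocator disables them too.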