I'm using PyTorch for the first time and I'm running into a problem I don't think I should be having. I have selected 2919 frames of a movie as JPGs, and I'm trying to turn all of those images into a single tensor. I'm using CLIP to encode each image into a tensor of size [1, 512], so in the end I expect a tensor of size [2919, 512], which should not use much memory. But my code never finishes running, and I can only assume I'm doing something terribly wrong.
First, I do my imports and load the model:
import torch
import clip
from glob import glob
from PIL import Image
# Run on the GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
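As a quick sanity check (using a hypothetical sample.jpg path), encoding a single preprocessed image gives a [1, 512] tensor:

emb = model.encode_image(preprocess(Image.open("sample.jpg")).unsqueeze(0).to(device))
print(emb.shape)  # torch.Size([1, 512]) for ViT-B/32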
Second, I read the paths of all the images and initialize the "film" tensor with random values that I will overwrite. I tried creating an empty tensor and concatenating onto it instead, but that also consumed too much memory:
files = glob(r"Films/**/*.jpg", recursive=True)
film = torch.rand((len(files), 512), dtype=torch.float32, device=device)
film_frame_count = 0
for file in files:
    print("Frame " + str(film_frame_count) + " out of " + str(len(files)))
    # encode_image returns a [1, 512] tensor; [0] drops the batch dimension
    film[film_frame_count] = model.encode_image(preprocess(Image.open(file)).unsqueeze(0).to(device))[0]
    film_frame_count += 1
torch.save(film, 'output_tensor/' + film_code[1])
If anyone could point out what I'm doing wrong, I would appreciate it.
The problem ended up being that PyTorch was storing the gradients (the autograd graph) for every encode_image call, so each stored frame kept its computation history alive in memory. I needed to indicate that I didn't want gradients to be tracked by wrapping my code like this:
with torch.no_grad():
    # my code
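For reference, here is a minimal sketch of the fixed loop. The output filename "output_tensor/film.pt" is a placeholder, since the real name comes from elsewhere in my script:

import torch
import clip
from glob import glob
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

files = glob(r"Films/**/*.jpg", recursive=True)
film = torch.rand((len(files), 512), dtype=torch.float32, device=device)

with torch.no_grad():  # don't build the autograd graph during inference
    for i, file in enumerate(files):
        print("Frame " + str(i) + " out of " + str(len(files)))
        film[i] = model.encode_image(preprocess(Image.open(file)).unsqueeze(0).to(device))[0]

torch.save(film, "output_tensor/film.pt")  # placeholder output name

With this change the memory usage stays flat across frames, since no computation graph accumulates between iterations.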