python pytorch google-colaboratory stable-diffusion

OutOfMemoryError: CUDA out of memory, both on a local machine and on Google Colab

I am trying to use the Stable Diffusion XL model to generate images. After installing everything and painfully matching the versions of Python, PyTorch, diffusers, and CUDA, I got this error:

OutOfMemoryError: CUDA out of memory. Tried to allocate 1024.00 MiB. GPU 0 has a total capacty of 14.75 GiB of which 857.06 MiB is free. Process 43684 has 13.91 GiB memory in use. Of the allocated memory 13.18 GiB is allocated by PyTorch, and 602.64 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.

Getting a GPU with more memory may seem like the obvious fix, but I have tried this both on my local computer with an NVIDIA GeForce RTX 3060 (6 GB) and in Google Colab with 15 GB of VRAM.

I have tried every solution I could find on Stack Overflow and GitHub and still can't fix this issue. What I have tried so far:

I am not training the model here, only running inference. During training, batch_size was 1.
Added these environment variables: PYTHONUNBUFFERED=1;PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:256
Resized the image to 512x512.
I read somewhere that I need to downgrade PyTorch to version 1.8 because of the RTX 3060 GPU and CUDA 11.3, but I can't install it: Could not find a version that satisfies the requirement torch==1.8.1
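For reference, here is a minimal sketch of setting the allocator option from inside a script or a Colab cell rather than from the shell. It assumes the variable has to be in place before PyTorch makes its first CUDA allocation, so the safest place is before `import torch`. The value is the one quoted above.

```python
import os

# Set the allocator option before torch is imported, so it is already in
# place when the CUDA caching allocator is initialised.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:256"

# import torch  # import torch only after the variable has been set

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

This setting only targets fragmentation of memory PyTorch has already reserved. It is not expected to help when the loaded models simply do not fit on the GPU.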

Here is my python code:


from diffusers import DiffusionPipeline, StableDiffusionXLImg2ImgPipeline
import torch
import gc

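# Free unreferenced Python objects, then release PyTorch's cached GPU memory.
# (The original "del variables" line was removed: "variables" is never defined,
# so it raises a NameError.)
gc.collect()
torch.cuda.empty_cache()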

model = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = DiffusionPipeline.from_pretrained(
    model,
    torch_dtype=torch.float16,
)
pipe.to("cuda")
pipe.load_lora_weights("model/", weight_name="pytorch_lora_weights.safetensors")

refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    torch_dtype=torch.float16,
)
refiner.to("cuda")


prompt = "a portrait of maha person 4k, uhd"

for seed in range(1):
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe(prompt=prompt, generator=generator, num_inference_steps=25)
    image = image.images[0]
    image.save(f"output_images/{seed}.png")
    image = refiner(prompt=prompt, generator=generator, image=image)
    image = image.images[0]
    image.save(f"images_refined/{seed}.png")

Solution

I commented out the refiner and reduced num_inference_steps to 10, and it worked. However, the image quality was low, so the code still needs tuning by experimenting with these parameters.
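A likely reason that dropping the refiner helped is that the original script keeps both the base and the refiner pipelines on the GPU at the same time. One way to keep the refiner is diffusers' `enable_model_cpu_offload()`, which keeps submodels in CPU RAM and moves each one to the GPU only while it runs. The sketch below is one possible arrangement, not a verified fix for this exact setup. It assumes the `accelerate` package is installed and that the `fp16` weight variants are used. The `generate` helper and its parameters are hypothetical names introduced here for illustration.

```python
import os


def generate(prompt, seed=0, steps=25, lora_dir="model/"):
    """Run SDXL base, then the refiner, with model CPU offload (illustrative helper)."""
    # Heavy imports are deferred so this file can be imported without a GPU stack.
    import torch
    from diffusers import DiffusionPipeline, StableDiffusionXLImg2ImgPipeline

    base = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
    )
    base.load_lora_weights(lora_dir, weight_name="pytorch_lora_weights.safetensors")
    # Used instead of base.to("cuda"): submodels are moved to the GPU on demand.
    base.enable_model_cpu_offload()

    refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
    )
    refiner.enable_model_cpu_offload()

    os.makedirs("output_images", exist_ok=True)
    os.makedirs("images_refined", exist_ok=True)

    # A CPU generator keeps seeding independent of where the submodels live.
    generator = torch.Generator("cpu").manual_seed(seed)

    image = base(prompt=prompt, generator=generator, num_inference_steps=steps).images[0]
    image.save(f"output_images/{seed}.png")

    refined = refiner(prompt=prompt, generator=generator, image=image).images[0]
    refined.save(f"images_refined/{seed}.png")
    return refined
```

Calling `generate("a portrait of maha person 4k, uhd")` would reproduce the loop body from the question for a single seed. On a 6 GB card this may still be too tight. `enable_sequential_cpu_offload()` is a slower alternative that lowers peak GPU memory further.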