python · nlp · huggingface-transformers

How to load a huggingface pretrained transformer model directly to GPU?


I want to load a Hugging Face pretrained transformer model directly to GPU (there is not enough CPU RAM to hold it first), e.g. loading BERT:

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("bert-base-uncased")

would load the model into CPU memory, where it stays until executing

model.to('cuda')

Only then is the model on the GPU.

I want to load the model directly onto the GPU when executing from_pretrained. Is this possible?


Solution

  • I'm answering my own question.

    Hugging Face's accelerate library (install it with pip install accelerate) can place the weights on the GPU as they are loaded, instead of materializing the full model in CPU memory first. It's useful when:

    GPU memory > model size > CPU memory

    With accelerate installed, also pass device_map="cuda" to from_pretrained:

    from transformers import AutoModelForCausalLM
    
    model = AutoModelForCausalLM.from_pretrained("bert-base-uncased", device_map="cuda")