Tags: python, nlp, huggingface-transformers

How to load a huggingface pretrained transformer model directly to GPU?


I want to load a Hugging Face pretrained transformer model directly onto the GPU (there is not enough CPU memory). For example, loading BERT with

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("bert-base-uncased")

puts the model on the CPU until you execute

model.to('cuda')

after which the model is on the GPU.

I want to load the model directly onto the GPU when executing from_pretrained. Is this possible?


Solution

  • I'm answering my own question

    Hugging Face Accelerate can place the weights on the GPU as they are loaded, instead of first materializing the whole model in CPU RAM. This worked for me when
    GPU memory > model size > CPU memory
    by passing device_map='cuda'.

    !pip install accelerate
    

    then use

    from transformers import AutoModelForCausalLM
    model = AutoModelForCausalLM.from_pretrained("bert-base-uncased", device_map="cuda")
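
    As a quick sanity check, here is a minimal sketch (assuming a single CUDA device; the example sentence is arbitrary) that confirms the weights landed on the GPU and runs a forward pass. Note that tokenizer outputs start on the CPU and must be moved to the model's device:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    import torch

    # weights are placed on the GPU as they load, so the full model
    # never has to fit in CPU RAM at once
    model = AutoModelForCausalLM.from_pretrained("bert-base-uncased", device_map="cuda")
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    print(model.device)  # e.g. device(type='cuda', index=0)

    # tokenizer outputs start on the CPU; move them to the model's device
    inputs = tokenizer("Hello world", return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model(**inputs)

    If the model does not fit on a single GPU, device_map="auto" lets Accelerate split the weights across the available GPUs (and spill to CPU or disk if necessary).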