I want to load a Hugging Face pretrained transformer model directly to the GPU (there is not enough CPU memory), e.g. loading BERT:
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("bert-base-uncased")
The model is loaded onto the CPU until you execute
model.to('cuda')
and only then is it moved to the GPU.
I want to load the model directly onto the GPU when executing from_pretrained. Is this possible?
I'm answering my own question.
Hugging Face accelerate (install it via pip install accelerate) can move the model onto the GPU before it is fully loaded on the CPU. It's useful when:

GPU memory > model size > CPU memory

To do this, also specify device_map="cuda":
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("bert-base-uncased", device_map="cuda")