I want to load a Hugging Face pretrained transformer model directly onto the GPU (there is not enough CPU RAM for it), e.g. loading BERT:
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("bert-base-uncased")
the model would be loaded into CPU memory and would stay there until executing
model.to('cuda')
only then is the model moved onto the GPU.
I want to load the model directly onto the GPU when executing from_pretrained. Is this possible?
I'm answering my own question:

Hugging Face Accelerate can move the weights onto the GPU as they are loaded, instead of materializing the full model in CPU memory first, so it worked in my case where

GPU memory > model size > CPU memory

The trick is passing device_map='cuda' to from_pretrained. First install Accelerate:
!pip install accelerate
then use:
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("bert-base-uncased", device_map='cuda')
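As a quick sanity check (a minimal sketch; bert-base-uncased is just the example model from above), you can verify after loading that every parameter lives on a CUDA device rather than on the CPU:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("bert-base-uncased", device_map='cuda')

# collect the set of devices holding the parameters;
# with device_map='cuda' this should contain only CUDA devices
print({p.device for p in model.parameters()})
# expected: {device(type='cuda', index=0)}

Note that passing device_map='auto' instead lets Accelerate spread the weights over whatever devices are available (GPU first, then CPU, then disk), which is the usual choice when the model does not fit on a single GPU.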