I want to load a Hugging Face pretrained transformer model directly to the GPU (there is not enough CPU memory), e.g. loading BERT:
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("bert-base-uncased")
The model is loaded onto the CPU until you execute
model.to('cuda')
and only then is it moved to the GPU.
I want to load the model directly onto the GPU when executing from_pretrained. Is this possible?
I'm answering my own question.
Hugging Face accelerate (install it via pip install accelerate) can move the model onto the GPU before it is fully loaded on the CPU. It's useful when:

GPU memory > model size > CPU memory

To do this, also specify device_map="cuda":
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("bert-base-uncased", device_map="cuda")