I created two Python notebooks to fine-tune BERT on a Yelp review dataset for sentiment analysis. The only difference between the two notebooks is that one runs on the CPU via .to("cpu") while the other uses the GPU via .to("cuda").
Despite this difference in hardware, the training times for both notebooks are nearly the same. I am new to using Hugging Face, so I'm wondering if there's anything I might be overlooking. Both notebooks are running on a machine with a single GPU.
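For context, the setup in both notebooks looks roughly like this (the checkpoint name and num_labels are illustrative, not necessarily the exact values I used):

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=5
)
model.to("cpu")  # the GPU notebook calls model.to("cuda") instead

Here are the outputs from the two runs.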
First notebook:
TrainOutput(global_step=100, training_loss=1.5707407319545745, metrics={'train_runtime': 116.5447, 'train_samples_per_second': 3.432, 'train_steps_per_second': 0.858, 'total_flos': 105247256985600.0, 'train_loss': 1.5707407319545745, 'epoch': 0.4})
{'eval_loss': 1.4039757251739502,
'eval_accuracy': 0.4,
'eval_runtime': 3.6833,
'eval_samples_per_second': 27.15,
'eval_steps_per_second': 3.529,
'epoch': 0.4}
# specifically concerned with 'train_samples_per_second': 3.432
Second notebook:
TrainOutput(global_step=100, training_loss=1.6277318179607392, metrics={'train_runtime': 115.46, 'train_samples_per_second': 3.464, 'train_steps_per_second': 0.866, 'total_flos': 105247256985600.0, 'train_loss': 1.6277318179607392, 'epoch': 0.4})
{'eval_loss': 1.525576114654541,
'eval_accuracy': 0.35,
'eval_runtime': 3.6518,
'eval_samples_per_second': 27.384,
'eval_steps_per_second': 3.56,
'epoch': 0.4}
# specifically concerned with 'train_samples_per_second': 3.464
I assume that the machine you were using had access to a GPU. The Hugging Face Trainer will automatically use the GPU if one is available. It is irrelevant that you moved the model to cpu or cuda yourself: the Trainer does not check where you placed the model and will move it to cuda whenever a GPU is available. You can turn off this device placement with the TrainingArguments setting no_cuda:
from transformers import TrainingArguments
training_args = TrainingArguments(
    output_dir="./some_local_dir",
    overwrite_output_dir=True,
    per_device_train_batch_size=4,
    dataloader_num_workers=2,
    max_steps=100,
    logging_steps=1,
    evaluation_strategy="steps",
    eval_steps=5,
    no_cuda=True,  # force training on the CPU even when a GPU is available
)
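If you want to confirm which device will actually be used, TrainingArguments exposes it as a property; a minimal check (output_dir is just a placeholder):

import torch
from transformers import TrainingArguments

print(torch.cuda.is_available())  # True on your single-GPU machine

args_cpu = TrainingArguments(output_dir="./some_local_dir", no_cuda=True)
print(args_cpu.device)  # cpu: no_cuda overrides the available GPU

args_gpu = TrainingArguments(output_dir="./some_local_dir")
print(args_gpu.device)  # cuda:0: where the Trainer will place the model

Note that newer transformers releases deprecate no_cuda in favor of use_cpu=True, so check which argument your installed version expects.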