python pytorch databricks huggingface peft

PyTorch: AttributeError: 'torch.dtype' object has no attribute 'itemsize'

I am trying to follow this article on Medium.

I had a few problems with it, so the remaining change I made was to the TrainingArguments object: I added gradient_checkpointing_kwargs={'use_reentrant':False}.
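As I understand it, transformers forwards gradient_checkpointing_kwargs to torch.utils.checkpoint.checkpoint, so use_reentrant=False selects PyTorch's non-reentrant checkpointing. The standalone sketch below calls that function directly on a toy nn.Sequential (the block and its sizes are arbitrary stand-ins, not part of the article's model) and checks that the gradients match an ordinary forward/backward pass:

```python
import torch
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)
# Toy block standing in for one transformer layer (sizes are arbitrary).
block = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU(), torch.nn.Linear(8, 8))

x = torch.randn(4, 8, requires_grad=True)
# Non-reentrant checkpointing: intermediate activations inside `block` are not
# kept after the forward pass; they are recomputed during backward.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()

# Reference: the same computation without checkpointing, on a copy of the input.
x_ref = x.detach().clone().requires_grad_(True)
block(x_ref).sum().backward()

print(torch.allclose(x.grad, x_ref.grad))  # True
```

The trade-off is memory for compute: activations are recomputed in the backward pass instead of being stored.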

So now I have the following objects:

import transformers
from transformers import TrainingArguments

peft_training_args = TrainingArguments(
    output_dir=output_dir,
    warmup_steps=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    max_steps=100, #1000
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    logging_steps=25,
    logging_dir="./logs",
    save_strategy="steps",
    save_steps=25,
    evaluation_strategy="steps",
    eval_steps=25,
    do_eval=True,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={'use_reentrant':False},
    report_to="none",
    overwrite_output_dir=True,
    group_by_length=True,
)

peft_model.config.use_cache = False

peft_trainer = transformers.Trainer(
    model=peft_model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    args=peft_training_args,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

When I call peft_trainer.train(), I get the following error:

AttributeError: 'torch.dtype' object has no attribute 'itemsize'

I'm using Databricks, and my PyTorch version is 2.0.1+cu118.
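For context, the message suggests that something in the training stack reads dtype.itemsize, an attribute that, as far as I know, torch.dtype only gained in PyTorch 2.1, so it is missing on 2.0.1. A quick check for a notebook cell is sketched below; dtype_itemsize is a hypothetical helper (not part of any library) showing that the same byte count is available on older builds through Tensor.element_size(). It does not patch the failing library call itself.

```python
import torch

def dtype_itemsize(dtype: torch.dtype) -> int:
    """Bytes per element of `dtype`, with or without `torch.dtype.itemsize`."""
    if hasattr(dtype, "itemsize"):  # assumed present on PyTorch 2.1+
        return dtype.itemsize
    # Older builds such as 2.0.1 lack the attribute; ask an empty tensor instead.
    return torch.empty((), dtype=dtype).element_size()

print("torch", torch.__version__)
print("torch.dtype has itemsize:", hasattr(torch.float32, "itemsize"))
print("float16 bytes:", dtype_itemsize(torch.float16))
```

If the second line prints False, any code that reads dtype.itemsize will raise the AttributeError above.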

Solution

I was able to recreate your problem on Databricks with the following cluster:

Runtime: 14.1 ML (includes Apache Spark 3.5.0, GPU, Scala 2.12)
Worker Type: Standard_NC16as_T4_v3 / Standard_NC6s_v3
Driver Type: Standard_NC16as_T4_v3 / Standard_NC6s_v3

Building on the other answers here, I was able to overcome the problem with the following steps:

1. Upgrade the transformers library via: !pip install --upgrade git+https://github.com/huggingface/transformers
2. Upgrade torch (and torchvision) via: !pip install --upgrade torch torchvision
3. Upgrade accelerate via: !pip install --upgrade accelerate
4. Pin the datasets library to a specific version via: !pip install datasets==2.16.0

I'm not sure whether it matters, but I ran the commands above in this order: 4 >> 1 >> 3 >> 2
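For reference, the same four installs can be run from Python in that order. This is a sketch, not the exact commands I ran: it calls pip via subprocess with sys.executable rather than !pip, and build_install_commands / run_installs are hypothetical helper names. On Databricks, as far as I know, %pip installs notebook-scoped libraries across the cluster while a shell-style pip only touches the driver, so %pip may be the safer choice on a multi-node cluster.

```python
import subprocess
import sys

# The four installs from the list above, keyed by their position in that list.
STEPS = {
    1: ["--upgrade", "git+https://github.com/huggingface/transformers"],
    2: ["--upgrade", "torch", "torchvision"],
    3: ["--upgrade", "accelerate"],
    4: ["datasets==2.16.0"],
}
ORDER = [4, 1, 3, 2]  # the order that was reported to work

def build_install_commands(order=ORDER):
    """Return one `python -m pip install ...` argument list per step, in order."""
    return [[sys.executable, "-m", "pip", "install", *STEPS[i]] for i in order]

def run_installs(order=ORDER):
    """Run each install; stops with CalledProcessError if pip fails."""
    for cmd in build_install_commands(order):
        print("Running:", " ".join(cmd))
        subprocess.check_call(cmd)
```

Calling run_installs() in a cell performs the installs; the Python process then needs a restart so the new torch is the one imported.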

This makes the error go away, and it works with both transformers.Trainer and the SFTTrainer that I saw imported (but never used) in your article.
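After installing, the Python process has to be restarted before the new versions are actually imported (on Databricks, dbutils.library.restartPython() should do this). The sketch below then reads the installed versions from package metadata using only the standard library; installed_versions, major_minor and torch_has_dtype_itemsize are hypothetical helpers, and the 2.1 threshold assumes torch.dtype.itemsize first appeared in PyTorch 2.1.

```python
import re
from importlib.metadata import PackageNotFoundError, version

def major_minor(ver: str) -> tuple:
    """(major, minor) from version strings like '2.0.1+cu118' or '4.37.0.dev0'."""
    m = re.match(r"(\d+)\.(\d+)", ver)
    if m is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return int(m.group(1)), int(m.group(2))

def installed_versions(names=("torch", "transformers", "accelerate", "datasets")):
    """Map each distribution name to its installed version, or None if missing."""
    out = {}
    for name in names:
        try:
            out[name] = version(name)
        except PackageNotFoundError:
            out[name] = None
    return out

def torch_has_dtype_itemsize(torch_version: str) -> bool:
    # Assumption: torch.dtype.itemsize first appeared in PyTorch 2.1.
    return major_minor(torch_version) >= (2, 1)

versions = installed_versions()
print(versions)
if versions["torch"] is not None:
    print("torch has dtype.itemsize:", torch_has_dtype_itemsize(versions["torch"]))
```

Reading metadata rather than torch.__version__ shows what is installed on disk, which is useful for spotting a process that still has the old torch imported.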