Tags: python, huggingface-transformers, wandb

Wandb website for Huggingface Trainer shows plots and logs only for the first model


I am finetuning multiple models in a for loop, as follows:

for file in os.listdir(args.data_dir):
    finetune(args, file)

But the wandb website shows plots and logs only for the first file, i.e., file1 in data_dir, even though it is training and saving models for the other files as well. This behavior seems very strange.

wandb: Synced bertweet-base-finetuned-file1: https://wandb.ai/***/huggingface/runs/***

This is a small snippet of the finetuning code with Hugging Face:

from transformers import Trainer, TrainingArguments

def finetune(args, file):
    training_args = TrainingArguments(
        output_dir=f'{model_name}-finetuned-{file}',
        overwrite_output_dir=True,
        evaluation_strategy='no',
        num_train_epochs=args.epochs,
        learning_rate=args.lr,
        weight_decay=args.decay,
        per_device_train_batch_size=args.batch_size,
        per_device_eval_batch_size=args.batch_size,
        fp16=True, # mixed-precision training to boost speed
        save_strategy='no',
        seed=args.seed,
        dataloader_num_workers=4,
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_dataset['train'],
        eval_dataset=None,
        data_collator=data_collator,
    )
    trainer.train()
    trainer.save_model()

Solution

  • Using wandb.init(reinit=True) together with run.finish() made wandb log each model as a separate run on the wandb website.

    The working code looks like this (the function is defined before the loop that calls it):

    
    import wandb
    
    def finetune(args, file):
        run = wandb.init(reinit=True)
        ...
        run.finish()
    
    for file in os.listdir(args.data_dir):
        finetune(args, file)
    

    Reference: https://docs.wandb.ai/guides/track/launch#how-do-i-launch-multiple-runs-from-one-script
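    As a fuller sketch of the same pattern, each run can also be given an explicit name so the dashboard entries match the saved model directories. This is only an assumption-laden illustration: `run_name_for` and `main` are hypothetical helpers, and the 'bertweet-base' / 'huggingface' names are taken from the output_dir pattern and sync URL in the question.

    ```python
    import os

    def run_name_for(model_name, file):
        # Hypothetical helper: mirror the output_dir pattern from the question
        # so the wandb run name matches the directory the model is saved to.
        return f'{model_name}-finetuned-{file}'

    def finetune(args, file):
        import wandb  # imported here so the name helper above has no wandb dependency

        # reinit=True allows a fresh run even though wandb was already
        # initialized earlier in the same process (e.g. by a previous file).
        run = wandb.init(
            project='huggingface',                      # project from the sync URL
            name=run_name_for('bertweet-base', file),   # one distinct run per file
            reinit=True,
        )
        try:
            ...  # build TrainingArguments / Trainer and call trainer.train() as above
        finally:
            run.finish()  # close the run so the next file starts its own

    def main(args):
        for file in os.listdir(args.data_dir):
            finetune(args, file)
    ```

    Calling run.finish() in a finally block makes sure the run is closed even if training for one file crashes, so the remaining files still get their own runs.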