Tags: python, dictionary, dataset, tokenize, huggingface

Add new column to a HuggingFace dataset inside a dictionary


I have a tokenized dataset named tokenized_datasets, which is a dataset dictionary.

I want to add a column named labels that is a copy of input_ids within the features. I'm aware of the following method from the post "Add new column to a HuggingFace dataset":

new_dataset = dataset.add_column("labels", tokenized_datasets['input_ids'].copy())

But I first need to access the Dataset Dictionary. This is what I have so far but it doesn't seem to do the trick:

def new_column(example):
    example["labels"] = example["input_ids"].copy()
    return example

dataset_new = tokenized_datasets.map(new_column)

KeyError: 'input_ids'

Solution

  • Try one of the two options below:

    # first option
    def new_column(example):
        return {"labels": example["input_ids"]}
    
    # second option
    def new_column(example):
        example["labels"] = example["input_ids"]
        return example
    
    dataset_new = tokenized_datasets.map(new_column)
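
If the KeyError persists, one possible cause (an assumption, since the dataset's structure is not shown here) is that at least one split in the dictionary has no input_ids column, because map runs the function on every split. The sketch below is plain Python with made-up values: a dict stands in for one tokenized row, and the columns dict stands in for what tokenized_datasets.column_names returns on a DatasetDict (each split name mapped to its list of columns). missing_columns is a hypothetical helper, not part of the datasets library.

```python
def new_column(example):
    # Return only the new field; map merges the returned dict into the example.
    # list(...) makes a copy so labels and input_ids are separate objects.
    return {"labels": list(example["input_ids"])}


def missing_columns(column_names_by_split, required="input_ids"):
    # Hypothetical helper: names of the splits that lack the required column.
    return [
        split
        for split, cols in column_names_by_split.items()
        if required not in cols
    ]


# A plain dict standing in for one tokenized row (values are made up).
row = {"input_ids": [101, 7592, 102], "attention_mask": [1, 1, 1]}

# Emulate what map does with the returned dict: merge it into the row.
merged = {**row, **new_column(row)}
print(merged["labels"])  # [101, 7592, 102]

# Stand-in for tokenized_datasets.column_names (split name -> column list).
columns = {"train": ["input_ids", "attention_mask"], "test": ["text"]}
print(missing_columns(columns))  # ['test']
```

With the real object, missing_columns(tokenized_datasets.column_names) should list any splits that need to be tokenized or skipped before calling map.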