I have the dataset below and would like to delete features A through F. The datasets are converted from pandas DataFrames:
from datasets import Dataset, DatasetDict

# Build one DatasetDict with a split per pandas DataFrame
dataset = DatasetDict({"train": Dataset.from_pandas(X_train),
                       "test": Dataset.from_pandas(X_test),
                       "val": Dataset.from_pandas(X_val),
                       })
The dataset output looks like this:
DatasetDict({
    train: Dataset({
        features: ['A', 'B', 'C', 'D', 'E', 'F', 'text', '__index_level_0__', 'label'],
        num_rows: 1173
    })
    test: Dataset({
        features: ['A', 'B', 'C', 'D', 'E', 'F', 'text', '__index_level_0__', 'label'],
        num_rows: 1369
    })
    val: Dataset({
        features: ['A', 'B', 'C', 'D', 'E', 'F', 'text', '__index_level_0__', 'label'],
        num_rows: 1369
    })
})
The result I want looks like this:
DatasetDict({
    train: Dataset({
        features: ['text', '__index_level_0__', 'label'],
        num_rows: 1173
    })
    test: Dataset({
        features: ['text', '__index_level_0__', 'label'],
        num_rows: 1369
    })
    val: Dataset({
        features: ['text', '__index_level_0__', 'label'],
        num_rows: 1369
    })
})
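What you need is the remove_columns() method from the datasets library. It works on any Dataset object, and also on a DatasetDict, where it removes the columns from every split. This is useful if you want to drop columns at this level rather than in pandas beforehand. It returns a new object, so assign the result back. For example: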
dataset = dataset.remove_columns("label")
For your case, it would be:
dataset = dataset.remove_columns(['A', 'B', 'C', 'D', 'E', 'F'])
You can have a look here: https://huggingface.co/docs/datasets/process