How do I write a HuggingFace dataset to disk?
I have made my own HuggingFace dataset using a JSONL file:
Dataset({ features: ['id', 'text'], num_rows: 18 })
I would like to persist the dataset to disk.
Is there a preferred way to do this, or is the only option a general-purpose library like joblib or pickle?
You can save a HuggingFace dataset to disk using its built-in save_to_disk() method, so there is no need for joblib or pickle.
For example:
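from datasets import load_dataset
# split="train" returns a single Dataset rather than a DatasetDict
test_dataset = load_dataset("json", data_files="test.json", split="train")
# Writes a directory named test.hf holding the Arrow data and metadata
test_dataset.save_to_disk("test.hf")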