
How do I save a HuggingFace dataset?


How do I write a HuggingFace dataset to disk?

I have made my own HuggingFace dataset using a JSONL file:

Dataset({ features: ['id', 'text'], num_rows: 18 })

I would like to persist the dataset to disk.

Is there a preferred way to do this, or is the only option a general-purpose library like joblib or pickle?


Solution

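  • You can save a HuggingFace dataset to disk with its save_to_disk() method, which writes the data (as Apache Arrow files) and its metadata into a directory. The saved dataset can later be reloaded with datasets.load_from_disk().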

    For example:

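    from datasets import load_dataset

    # The "json" loader also reads JSON Lines files; split="train" returns a Dataset, not a DatasetDict
    test_dataset = load_dataset("json", data_files="test.json", split="train")

    # Write the dataset (Arrow data plus metadata) into the directory "test.hf"
    test_dataset.save_to_disk("test.hf")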