python huggingface-datasets

List all available dataset names contained in a Hugging Face datasets dataset


I want to know which datasets are included in, for example, this collection of Hugging Face datasets: https://huggingface.co/datasets/autogluon/chronos_datasets

"m4_daily" and "weatherbench_daily" are mentioned explicitly, but there should be more.

I am not interested in a list of all such collections.

I can get the list from the error message that appears when I leave the name parameter unspecified:

import datasets

ds = datasets.load_dataset("autogluon/chronos_datasets", split="train")  # error with list
# ds = datasets.load_dataset("autogluon/chronos_datasets", "m4_daily", split="train")  # no error

How do I retrieve the list of names properly?


Solution

  • The following function should do the job.

    from datasets import get_dataset_config_names
    
    config_names = get_dataset_config_names("<org/dataset>")
    

    You can check examples at https://huggingface.co/docs/datasets/en/load_hub#configurations
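
    As a rough sketch of how this might be wrapped for the collection in the question: the helper name list_config_names and its fetch parameter are made up for illustration and are not part of the datasets library. Only get_dataset_config_names is the real call, and it needs network access to the Hub.

```python
def list_config_names(repo_id, fetch=None):
    """Return the config (subset) names of a Hub dataset repo, sorted.

    `fetch` is a hypothetical hook so the lookup can be swapped out
    (e.g. for offline testing); by default the real library call is used.
    """
    if fetch is None:
        # Imported lazily; this call queries the Hugging Face Hub.
        from datasets import get_dataset_config_names
        fetch = get_dataset_config_names
    return sorted(fetch(repo_id))
```

    Calling list_config_names("autogluon/chronos_datasets") should then return the names that otherwise only show up in the error message, and any one of them can be passed as the second argument to load_dataset, as in the commented-out line of the question.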