python voice-recognition huggingface-datasets

Loading Common Voice data with Hugging Face load_dataset gives an error


I am working on a voice dataset with the Hugging Face Transformers library, but I am unable to load the data from the Common Voice corpus:

from datasets import load_dataset, load_metric
common_voice_train = load_dataset("common_voice", "id", split="train+validation")
common_voice_test = load_dataset("common_voice", "id", split="test")

It gives the following error:

Couldn't find file locally at common_voice/common_voice.py, or remotely at https://raw.githubusercontent.com/huggingface/datasets/1.4.1/datasets/common_voice/common_voice.py.

The file was picked from the master branch on github instead at https://raw.githubusercontent.com/huggingface/datasets/master/datasets/common_voice/common_voice.py.

How can I fix this problem?


Solution

  • You are using the lightweight Hugging Face datasets library to load the Common Voice dataset. The id argument is the builder configuration name, which selects the language subset; set it to the configuration for the language you want. For instance, to load the English data from the Common Voice corpus, the builder configuration name is en.

    You can look up the configuration name in the Common Voice repository, where it appears as a prefix next to the version.
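As a rough sketch of the point above: the second argument to load_dataset is the configuration name, so switching languages only means changing that one string. The helper names common_voice_request and load_common_voice are made up for this illustration and are not part of the datasets library. The sketch also assumes your installed version of datasets still provides the common_voice loading script.

```python
def common_voice_request(language_code, split="train"):
    """Build keyword arguments for datasets.load_dataset (hypothetical helper).

    language_code is the builder configuration name, e.g. "en" for English.
    """
    code = language_code.strip()
    if not code:
        raise ValueError("language_code must be a non-empty configuration name, e.g. 'en'")
    # "path" is the dataset name, "name" is the builder configuration.
    return {"path": "common_voice", "name": code, "split": split}


def load_common_voice(language_code, split="train"):
    """Load one language subset of Common Voice (hypothetical helper)."""
    # Imported lazily so the helper above works without the library installed.
    from datasets import load_dataset

    return load_dataset(**common_voice_request(language_code, split))


# Example usage (downloads the data, so it is left commented out):
# common_voice_train = load_common_voice("en", split="train+validation")
# common_voice_test = load_common_voice("en", split="test")
```

Keeping the argument construction separate from the actual download makes it easy to check which configuration will be requested before any data is fetched.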