This is the strangest error I've encountered. I copied the following straight from the Hugging Face website to start learning about audio classifiers:
from datasets import load_dataset, Audio, Dataset
minds = load_dataset("PolyAI/minds14", name="en-US", split="train")
generates the following error:
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset
I've tried using Dataset.cleanup_cache_files
but that did not help. Why is this error so vague? Any ideas on how to resolve this?
In case it may help, here's the full traceback:
Generating train split: 0 examples [00:00, ? examples/s]
Traceback (most recent call last):
File "C:\Users\Brandon\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\features\audio.py", line 91, in encode_example
import soundfile as sf # soundfile is a dependency of librosa, needed to decode audio files.
ModuleNotFoundError: No module named 'soundfile'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\Brandon\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\builder.py", line 1693, in _prepare_split_single
example = self.info.features.encode_example(record) if self.info.features is not None else record
File "C:\Users\Brandon\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\features\features.py", line 1852, in encode_example
return encode_nested_example(self, example)
File "C:\Users\Brandon\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\features\features.py", line 1229, in encode_nested_example
{
File "C:\Users\Brandon\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\features\features.py", line 1230, in <dictcomp>
k: encode_nested_example(sub_schema, sub_obj, level=level + 1)
File "C:\Users\Brandon\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\features\features.py", line 1284, in encode_nested_example
return schema.encode_example(obj) if obj is not None else None
File "C:\Users\Brandon\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\features\audio.py", line 93, in encode_example
raise ImportError("To support encoding audio data, please install 'soundfile'.") from err
ImportError: To support encoding audio data, please install 'soundfile'.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\Brandon\Documents\00 School Files 00\University\LLM Research\UAC\uac.py", line 5, in <module>
minds = load_dataset("PolyAI/minds14", name="en-US", split="train")
File "C:\Users\Brandon\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\load.py", line 2153, in load_dataset
builder_instance.download_and_prepare(
File "C:\Users\Brandon\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\builder.py", line 954, in download_and_prepare
self._download_and_prepare(
File "C:\Users\Brandon\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\builder.py", line 1717, in _download_and_prepare
super()._download_and_prepare(
File "C:\Users\Brandon\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\builder.py", line 1049, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "C:\Users\Brandon\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\builder.py", line 1555, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "C:\Users\Brandon\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\builder.py", line 1712, in _prepare_split_single
raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset
TL;DR
Just install soundfile:
pip install soundfile
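If the error persists after installing, one possible reason is that pip installed the package into a different Python than the one running your script. Here is a minimal sketch for checking that from inside the script's own interpreter. The helper name is_installed is made up for illustration; importlib.util.find_spec and sys.executable are standard library.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if the current interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Shows which Python is actually running, e.g. the Python310 install from the traceback.
print("Interpreter:", sys.executable)
print("soundfile importable:", is_installed("soundfile"))
```

If this prints False, running pip through that same interpreter (python -m pip install soundfile) should put the package where the script can see it.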
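The underlying error is in the stack trace, though it's unfortunately a little difficult to read. Python prints chained exceptions oldest first, so the root cause is in the first traceback, not on the last line: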
Traceback (most recent call last):
File "C:\Users\Brandon\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\features\audio.py", line 91, in encode_example
import soundfile as sf # soundfile is a dependency of librosa, needed to decode audio files.
ModuleNotFoundError: No module named 'soundfile'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\Brandon\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\builder.py", line 1693, in _prepare_split_single
example = self.info.features.encode_example(record) if self.info.features is not None else record
File "C:\Users\Brandon\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\features\features.py", line 1852, in encode_example
return encode_nested_example(self, example)
File "C:\Users\Brandon\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\features\features.py", line 1229, in encode_nested_example
{
File "C:\Users\Brandon\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\features\features.py", line 1230, in <dictcomp>
k: encode_nested_example(sub_schema, sub_obj, level=level + 1)
File "C:\Users\Brandon\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\features\features.py", line 1284, in encode_nested_example
return schema.encode_example(obj) if obj is not None else None
File "C:\Users\Brandon\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\features\audio.py", line 93, in encode_example
raise ImportError("To support encoding audio data, please install 'soundfile'.") from err
ImportError: To support encoding audio data, please install 'soundfile'.
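It's complaining that the Python library soundfile is missing from your environment. The datasets library needs it to encode audio data, and the vague DatasetGenerationError is only the outermost wrapper around that ImportError.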
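If you'd rather not scroll through the whole chain, you can walk it programmatically. The sketch below follows the `__cause__` and `__context__` attributes that Python sets on chained exceptions. The names root_cause and demo are hypothetical, and demo only imitates the three-layer chain from the traceback with stand-in exceptions; it does not call the datasets library.

```python
def root_cause(exc: BaseException) -> BaseException:
    """Follow __cause__ (or __context__) links back to the first exception raised."""
    seen = {id(exc)}  # guard against cycles in the chain
    while True:
        nxt = exc.__cause__ if exc.__cause__ is not None else exc.__context__
        if nxt is None or id(nxt) in seen:
            return exc
        seen.add(id(nxt))
        exc = nxt


def demo() -> BaseException:
    """Imitate the chain above: ModuleNotFoundError -> ImportError -> outer wrapper."""
    try:
        try:
            try:
                raise ModuleNotFoundError("No module named 'soundfile'")
            except ModuleNotFoundError as err:
                raise ImportError("please install 'soundfile'") from err
        except ImportError as e:
            # RuntimeError stands in for DatasetGenerationError here.
            raise RuntimeError("An error occurred while generating the dataset") from e
    except RuntimeError as outer:
        return root_cause(outer)


print(repr(demo()))
```

In your own script, the same idea would be to wrap the load_dataset call in a try/except and print root_cause of the caught exception.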