Tags: python, scikit-learn, mnist

ValueError: Dataset with data_id 554 not found


I am building a classifier on the MNIST dataset. I load the dataset using sklearn.datasets:

from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1)
mnist.keys()

Executing this code produces a long traceback:

<ipython-input-2-00e245087535> in <module>
----> 1 mnist = fetch_openml('mnist_784', version=1)

/opt/conda/lib/python3.7/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     70                           FutureWarning)
     71         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72         return f(**kwargs)
     73     return inner_f
     74 

/opt/conda/lib/python3.7/site-packages/sklearn/datasets/_openml.py in fetch_openml(name, version, data_id, data_home, target_column, cache, return_X_y, as_frame)
    807         # The shape must include the ignored features to keep the right indexes
    808         # during the arff data conversion.
--> 809         data_qualities = _get_data_qualities(data_id, data_home)
    810         shape = _get_num_samples(data_qualities), len(features_list)
    811     else:

/opt/conda/lib/python3.7/site-packages/sklearn/datasets/_openml.py in _get_data_qualities(data_id, data_home)
    420     error_message = "Dataset with data_id {} not found.".format(data_id)
    421     json_data = _get_json_content_from_openml_api(url, error_message, True,
--> 422                                                   data_home)
    423     try:
    424         return json_data['data_qualities']['quality']

/opt/conda/lib/python3.7/site-packages/sklearn/datasets/_openml.py in _get_json_content_from_openml_api(url, error_message, raise_if_error, data_home)
    168     # 412 error, not in except for nicer traceback
    169     if raise_if_error:
--> 170         raise ValueError(error_message)
    171     return None
    172 

ValueError: Dataset with data_id 554 not found.

How can I get the data?


Solution

  • Code snippet 1

    from sklearn.datasets import fetch_openml
    X, y = fetch_openml('mnist_784', version=1, return_X_y=True)
    

    Code snippet 2

    data = fetch_openml('mnist_784')
    

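    Both snippets should work. Snippet 1 pins the dataset version and, because of return_X_y=True, returns the features and labels directly as X and y. Snippet 2 returns a Bunch object instead, with the features in data.data and the labels in data.target.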

    The code in your question is the same call as snippet 1 without return_X_y, so the call itself is valid. It fails because of a cache error associated with fetch_openml, not because the dataset is missing.
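If the error persists, one thing worth trying is to remove the cached OpenML files and fetch again with caching disabled. The sketch below is only an illustration: the helper names clear_openml_cache and fetch_mnist_fresh are made up for this answer, and it assumes scikit-learn keeps its OpenML cache in an "openml" subfolder of the data home directory (by default ~/scikit_learn_data). The cache and data_home parameters are the ones visible in the fetch_openml signature in the traceback above.

```python
import shutil
from pathlib import Path


def clear_openml_cache(data_home):
    """Delete the 'openml' subfolder of a scikit-learn data directory.

    Hypothetical helper. Assumes the OpenML cache lives in
    <data_home>/openml. Returns True if a cache folder was removed.
    """
    cache_dir = Path(data_home).expanduser() / "openml"
    existed = cache_dir.is_dir()
    if existed:
        shutil.rmtree(cache_dir)
    return existed


def fetch_mnist_fresh(data_home=None):
    """Clear the cache, then download MNIST again without caching."""
    # Imported here so the cache helper also works without scikit-learn.
    from sklearn.datasets import fetch_openml, get_data_home

    clear_openml_cache(get_data_home(data_home))
    return fetch_openml('mnist_784', version=1, return_X_y=True,
                        cache=False, data_home=data_home)
```

Calling X, y = fetch_mnist_fresh() then forces a fresh download. If that still raises the same ValueError, the problem is probably on the network or OpenML server side rather than in the local cache.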