bigdata, data-science, knn

A large data set for KNN


I want to apply a modified KNN that is implemented for large data sets. I am trying to find a large data set (more than 20,000 rows) that works well with KNN, so that I can compare classic KNN with my own version. Any examples?


Solution

  • Many suitable data sets are available online. The MNIST handwritten digit data set is a good place to start: it has 70,000 labelled examples, and a carefully tuned KNN performs well on it.

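    It can be downloaded through scikit-learn. Note that fetch_mldata was removed in scikit-learn 0.22; fetch_openml is its replacement.

    >>> from sklearn.datasets import fetch_openml
    >>> # custom_data_home is a cache directory of your choice
    >>> mnist = fetch_openml('mnist_784', version=1, data_home=custom_data_home)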
    

    For more details, see the scikit-learn 0.19 documentation: https://scikit-learn.org/0.19/datasets/mldata.html.
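
To compare classic KNN with a modified version, it may help to measure both under the same train/test split. The following is only a minimal sketch, not a definitive benchmark: the helper name benchmark_knn is hypothetical, the data is a small synthetic stand-in from make_classification so that the snippet runs offline, and n_neighbors=5 with brute-force search is an assumed baseline setting rather than a tuned one.

```python
import time

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def benchmark_knn(model, X, y, test_size=0.2, seed=0):
    """Fit `model` on a train split; return (test accuracy, seconds spent).

    Hypothetical helper: any estimator with fit/score can be passed in,
    so a custom KNN variant can be timed against the classic one.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    start = time.perf_counter()
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)  # KNN does its real work at query time
    elapsed = time.perf_counter() - start
    return accuracy, elapsed


# Synthetic stand-in data (an assumption, so the sketch runs without a download).
# Replace X, y with the MNIST features and labels once they are available.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=10, n_classes=3, random_state=0
)

# Classic KNN baseline: exhaustive (brute-force) neighbour search.
baseline = KNeighborsClassifier(n_neighbors=5, algorithm="brute")
accuracy, elapsed = benchmark_knn(baseline, X, y)
print(f"classic KNN: accuracy={accuracy:.3f}, time={elapsed:.2f}s")
```

Passing your own estimator to the same helper, with the same seed, gives numbers that are directly comparable with the baseline. On the full MNIST data the brute-force baseline will take noticeably longer than on this toy data.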