I want to apply a modified KNN that is designed for large data sets. I am trying to find a large data set (more than 20000 rows) that works well with KNN so that I can compare classic KNN against my own version. Any examples?
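There should be many if you search for them. The MNIST handwritten digit dataset is a good place to start: it has 70000 labelled examples (60000 for training and 10000 for testing), each a 28x28 grayscale image flattened into 784 pixel features. A carefully tuned KNN (for example, a well-chosen k and neighbor weighting) works well on this data, and the 784-dimensional distance computations make it a reasonable test for a faster KNN variant.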
It can be downloaded through scikit-learn:
>>> from sklearn.datasets import fetch_openml
>>> mnist = fetch_openml('mnist_784', version=1, as_frame=False)
>>> X, y = mnist.data, mnist.target
>>> X.shape
(70000, 784)
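Since the goal is to compare classic KNN with your own version, a small timing harness may help. This is a minimal sketch assuming your implementation exposes scikit-learn style fit/predict methods; load_mnist, benchmark and compare are hypothetical helper names, and scikit-learn's KNeighborsClassifier with algorithm="brute" stands in for "classic" KNN. It also assumes the OpenML copy keeps the original MNIST ordering (first 60000 rows for training, last 10000 for testing).

```python
import time

import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier


def load_mnist(data_home=None):
    """Download (or read from cache) MNIST and return the standard 60k/10k split."""
    X, y = fetch_openml("mnist_784", version=1, return_X_y=True,
                        as_frame=False, data_home=data_home)
    X = X.astype(np.float32) / 255.0  # scale pixel values to [0, 1]
    # Assumption: rows keep the original order (training images first)
    return X[:60000], y[:60000], X[60000:], y[60000:]


def benchmark(clf, X_train, y_train, X_test, y_test):
    """Fit and evaluate one classifier, returning accuracy and wall-clock times."""
    start = time.perf_counter()
    clf.fit(X_train, y_train)
    fit_seconds = time.perf_counter() - start

    start = time.perf_counter()
    y_pred = clf.predict(X_test)
    predict_seconds = time.perf_counter() - start

    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "fit_seconds": fit_seconds,
        "predict_seconds": predict_seconds,
    }


def compare(my_knn, data=None, n_neighbors=3):
    """Run brute-force KNN and a custom (hypothetical) KNN on the same split."""
    X_train, y_train, X_test, y_test = data if data is not None else load_mnist()
    classic = KNeighborsClassifier(n_neighbors=n_neighbors, algorithm="brute")
    return {
        "classic": benchmark(classic, X_train, y_train, X_test, y_test),
        "custom": benchmark(my_knn, X_train, y_train, X_test, y_test),
    }
```

For example, compare(MyKNN()) would run both models on the standard MNIST split, where MyKNN is a placeholder for your class. The brute-force baseline computes a distance from every test image to every training image, so prediction on the full test set can take a while; that cost is what a large-data KNN variant should improve on.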
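Note that fetch_mldata depended on mldata.org, which has gone offline, and it was removed in scikit-learn 0.22 (the old documentation is at https://scikit-learn.org/0.19/datasets/mldata.html). Current versions load the same data with fetch_openml('mnist_784').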
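On the "carefully tuned" part: one common approach is to pick k, and whether neighbors vote uniformly or by inverse distance, with cross-validation. Below is a minimal sketch using scikit-learn's GridSearchCV; tune_knn is a hypothetical helper, and the grid of k values is an illustrative assumption rather than a recommended setting.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier


def tune_knn(X, y, k_values=(1, 3, 5, 7), cv=3):
    """Choose n_neighbors and vote weighting by cross-validated accuracy."""
    search = GridSearchCV(
        KNeighborsClassifier(),
        param_grid={
            "n_neighbors": list(k_values),       # candidate neighborhood sizes
            "weights": ["uniform", "distance"],  # equal votes vs. 1/distance votes
        },
        cv=cv,
        scoring="accuracy",
    )
    search.fit(X, y)
    return search.best_params_, search.best_score_
```

Running the search on a random subsample (say 10000 training rows) keeps it fast; the selected parameters can then be used for the classic KNN you compare your version against.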