I'm making an image classifier in Python that will tell whether an image is a car or not.
Here is my plan: I want to compute the k-means centroids only once and then save them to a file for reuse.
My problem is the following:
I have 50 precalculated centroids and a new image with SIFT descriptors. I want to find the nearest centroid for each descriptor.
For example: centroid 1 is nearest to 5 descriptors, centroid 2 is nearest to 12 descriptors, and so on. I will then feed this data to an SVM.
It is like kmeans.predict(), but I don't want to recompute the k-means clustering every time I add a new image.
So, is there a function in Python to which I can give 50 points (the centroids) in a hyperspace and N points in the same space, and which returns the distribution of those N points over their nearest centroids?
Thanks
Have a look at the article about model persistence in the scikit-learn documentation: http://scikit-learn.org/stable/modules/model_persistence.html
Save your model using pickle:
import pickle

# pickle needs the file opened in binary mode, hence 'wb'
with open('kmeans.dat', 'wb') as f:
    pickle.dump(kmeans, f)
Later you can load it again by using:
with open('kmeans.dat', 'rb') as f:
    kmeans = pickle.load(f)
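With the loaded estimator you can assign the descriptors of a new image to the stored centroids without refitting anything. A minimal sketch, assuming descriptors is an (N, 128) NumPy array of SIFT descriptors and the model was fit with 50 clusters:

import numpy as np

# index of the nearest centroid for each descriptor
labels = kmeans.predict(descriptors)

# histogram over the centroids: entry i counts how many descriptors
# have centroid i as their nearest centroid; this is the feature
# vector you can feed to the SVM
histogram = np.bincount(labels, minlength=kmeans.n_clusters)

If you saved only the raw centroid array instead of the fitted estimator, sklearn.metrics.pairwise_distances_argmin(descriptors, centroids) gives you the same nearest-centroid indices.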
Note that pickled models are not guaranteed to load across different Python or scikit-learn versions, so use the same versions for saving and loading.
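The page linked above also describes joblib, which is more efficient than plain pickle for estimators holding large NumPy arrays such as the centroid matrix. A sketch, assuming the joblib package is available (it is installed as a scikit-learn dependency):

import joblib

joblib.dump(kmeans, 'kmeans.joblib')   # save the fitted estimator
kmeans = joblib.load('kmeans.joblib')  # restore it later

The same version caveat applies to files written by joblib.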