python scikit-learn k-means tf-idf convergence

How to figure out when k-means converges on tf-idf data?


I am fairly new to working with text data.

I have a data frame of about 300,000 unique product names, and I am trying to use k-means to cluster similar names together. I used sklearn's TfidfVectorizer to convert the names into a tf-idf matrix.

After transforming the names into a sparse matrix, I fit k-means with 5-10 clusters, but I do not know whether the algorithm is converging.

How can I figure this out?


Solution

  • According to the source, the attribute n_iter_ holds the number of k-means iterations that were run. If n_iter_ < max_iter, then the algorithm converged within the given tolerance (tol) before reaching the iteration limit.

    If what you are actually trying to do is determine the optimal number of clusters, you can use the elbow method with the inertia_ attribute: fit k-means for several values of k and look for the point where inertia_ stops dropping sharply.
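
Both checks above can be sketched roughly as follows. This is only a minimal illustration, not the asker's actual pipeline: the `names` list is a made-up stand-in for the product names column, and the values chosen for `n_clusters`, `n_init`, `random_state`, and the range of k are assumptions picked to keep the example small.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for the ~300,000 product names in the question
names = [
    "red cotton t-shirt", "blue cotton t-shirt", "green cotton t-shirt",
    "steel water bottle", "glass water bottle", "plastic water bottle",
    "wireless optical mouse", "wired optical mouse", "wireless gaming mouse",
]

# Sparse tf-idf matrix, as described in the question
X = TfidfVectorizer().fit_transform(names)

# Convergence check: compare the iterations actually run with the limit
km = KMeans(n_clusters=3, max_iter=300, tol=1e-4, n_init=10, random_state=0)
km.fit(X)
converged = km.n_iter_ < km.max_iter
print(f"iterations: {km.n_iter_} of {km.max_iter}, converged: {converged}")

# Elbow method: record inertia_ for several candidate values of k
inertias = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_
    print(f"k={k}: inertia={model.inertia_:.3f}")
```

If `converged` comes out False on the real data, raising `max_iter` (or loosening `tol`) and refitting is the usual next step. For the elbow method, plotting the values in `inertias` against k makes the bend easier to spot than reading the printed numbers.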