I use TF-IDF and k-means for text clustering on my MacBook Pro. My data has 1400 observations and I want 140 clusters. The TF-IDF matrix has 101611692 elements (780.9 Mb). It has already taken 2 days, but the k-means clustering step still hasn't finished. Is this too computationally expensive for a laptop, and is there a faster method? Thank you.
K-means is probably the simplest of all clustering algorithms. Its cost per iteration goes up linearly with the number of data points, the number of dimensions, and the number of clusters, so it becomes very slow in high-dimensional spaces with many data points and many clusters. Remove the stop words and try it on a much smaller sample, like 10% of what you are doing now. Make sure it runs and does what you want before scaling up, or you will burn through another 2 days and end up where you are now, wondering what happened while nothing gets done.
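Here is a rough sketch of that dry run. The question doesn't say which tools are in use, so Python and scikit-learn are assumptions on my part, and `cluster_sample` is just a hypothetical helper name. It draws a 10% sample, drops English stop words while building the TF-IDF matrix, and keeps the same clusters-to-documents ratio as the full job (140 / 1400).

```python
import random

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_sample(docs, sample_frac=0.1, cluster_ratio=0.1, seed=0):
    """Dry run: TF-IDF + k-means on a random sample of docs (needs >= 2 docs)."""
    rng = random.Random(seed)
    n_sample = min(len(docs), max(2, round(len(docs) * sample_frac)))
    sample = rng.sample(docs, n_sample)

    # stop_words="english" removes common English words from the vocabulary.
    # fit_transform returns a SciPy sparse matrix, which KMeans accepts
    # directly, so the matrix is not expanded into a dense array here.
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(sample)

    # Same clusters-per-document ratio as the full job (assumed: 140 / 1400).
    n_clusters = min(len(sample), max(2, round(len(sample) * cluster_ratio)))
    km = KMeans(n_clusters=n_clusters, n_init=3, random_state=seed)
    labels = km.fit_predict(tfidf)
    return sample, tfidf, labels
```

Call it as `sample, tfidf, labels = cluster_sample(docs)` with `docs` being your list of strings, then check `tfidf.shape` to see how many dimensions are left after stop-word removal. If the sample finishes in reasonable time and the clusters look sensible, scale `sample_frac` up step by step rather than jumping straight to the full data.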