Text clustering takes too much time

I use TF-IDF and k-means for text clustering on my MacBook Pro. My data has 1400 observations and I want 140 clusters. The TF-IDF matrix has 101611692 elements (780.9 Mb). It has already taken 2 days, and the k-means clustering step still hasn't finished. Is this too computationally expensive for a laptop, and is there a faster method? Thank you.

Solution

K-means is probably the simplest of all clustering algorithms. Its complexity and processing time go up linearly with the number of data points and with the number of dimensions, so it becomes virtually infeasible to run in high-dimensional spaces with many data points. Remove the stop words and try it on a much smaller sample, such as 10% of what you are using now. Make sure it runs and does what you want; otherwise you will burn through another 2 days and end up where you are now, wondering what happened, with nothing done.
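As a rough illustration of that advice, here is a minimal sketch in plain Python (the question does not say which language or library is in use, so this is only an assumption). The stop-word list, the helper names, and the synthetic documents are all hypothetical. The cost function uses the usual estimate that one k-means assignment step compares every point with every centroid across every dimension, so the number of clusters matters as well; scaling the cluster count down to 14 for the sample is likewise just an assumption to keep the ratio of documents to clusters the same.

```python
import random

# Hypothetical, tiny stop-word list for illustration; a real one is much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "it"}

def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]

def sample_documents(docs, fraction=0.10, seed=0):
    """Return a reproducible random sample of roughly `fraction` of the documents."""
    rng = random.Random(seed)
    size = max(1, round(len(docs) * fraction))
    return rng.sample(docs, size)

def kmeans_cost_per_iteration(n_points, n_clusters, n_dims):
    """Rough count of coordinate operations in one k-means assignment step."""
    return n_points * n_clusters * n_dims

# Synthetic stand-in for the 1400 real documents.
docs = [f"the document number {i} is about topic {i % 7}" for i in range(1400)]

subset = sample_documents(docs)                  # about 10% of the data
tokens = [remove_stop_words(d) for d in subset]  # fewer terms means fewer dimensions
vocab = {tok for doc in tokens for tok in doc}

print(len(subset), "documents,", len(vocab), "distinct terms")
print("approx. operations per iteration:",
      kmeans_cost_per_iteration(len(subset), 14, len(vocab)))
```

If the small run finishes quickly and the clusters look sensible, the same estimate gives a feel for how much longer the full data set will take before you commit to it.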
