Following up on my previous question, I have implemented a clustering algorithm for a very large number of strings using Python and Levenshtein distance, but it takes a very long time to finish. Any suggestions, please?
    iterate through the list in a for loop
        for each item in the list
            run through the list again to find the similarity percentage
            if similarity > threshold, move to cluster
    end for loop
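For reference, a minimal Python sketch of the loop described above. It is an illustration under stated assumptions, not the asker's actual code: the pure-Python `levenshtein` helper, the length-normalized `similarity` function, the `cluster` function name, and the default `threshold` of 0.8 are all hypothetical. It also assumes that "move to cluster" means a string is not compared again once it has been assigned.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping only two rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed similarity measure: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def cluster(strings, threshold=0.8):
    """Naive O(n^2) clustering: each unassigned string seeds a new cluster."""
    clusters = []
    assigned = [False] * len(strings)
    for i, seed in enumerate(strings):
        if assigned[i]:
            continue  # already moved into an earlier cluster
        assigned[i] = True
        group = [seed]
        for j in range(i + 1, len(strings)):
            if not assigned[j] and similarity(seed, strings[j]) > threshold:
                assigned[j] = True
                group.append(strings[j])
        clusters.append(group)
    return clusters
```

With n strings this performs on the order of n² distance computations, each of which costs time proportional to the product of the two string lengths, which is why the run time grows so quickly on a large input.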
First, use a profiler to see where most of the time is spent. I suspect it's in the actual Levenshtein calculation, but it's good to be sure. If it is: