I have an application that shows ~100 tweets about a trending topic. The problem is that many of them are very similar (e.g., the same tweet with a different URL), which is why I'd like to ignore near-duplicate tweets.
I'm trying to find an efficient way to do this in Python. I'm thinking about using http://code.google.com/p/pylevenshtein/ for this, but I'd have to compare a lot of tweets with each other, so maybe there's a simpler way.
Try `difflib.get_close_matches` from the standard library to compare each tweet with the rest.
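A minimal sketch of that idea follows. It keeps a tweet only if `get_close_matches` finds no sufficiently similar tweet among the ones already kept. The helper names `normalize` and `dedupe_tweets`, the URL-stripping step, and the `0.8` cutoff are my own assumptions rather than anything `difflib` prescribes; tune the cutoff against your real data.

```python
import difflib
import re

# Assumption: URLs are the main source of noise between near-duplicates,
# so they are removed before comparing.
URL_RE = re.compile(r"https?://\S+")


def normalize(tweet):
    """Drop URLs, lowercase, and collapse runs of whitespace."""
    return " ".join(URL_RE.sub("", tweet).lower().split())


def dedupe_tweets(tweets, cutoff=0.8):
    """Return tweets in order, skipping any that closely match an earlier kept one.

    cutoff is the minimum SequenceMatcher ratio (0..1) for two tweets to be
    treated as duplicates; 0.8 is an arbitrary starting point, not a tuned value.
    """
    kept = []       # original tweets that survived
    kept_norm = []  # their normalized forms, used for the comparison
    for tweet in tweets:
        norm = normalize(tweet)
        # n=1: we only need to know whether at least one close match exists.
        if not difflib.get_close_matches(norm, kept_norm, n=1, cutoff=cutoff):
            kept.append(tweet)
            kept_norm.append(norm)
    return kept


if __name__ == "__main__":
    tweets = [
        "Big news on the launch today http://t.co/abc123",
        "Big news on the launch today http://t.co/xyz789",
        "Completely different tweet about something else",
    ]
    for t in dedupe_tweets(tweets):
        print(t)
```

With these inputs the second tweet is dropped because it is identical to the first once the URL is removed. This is still a pairwise comparison, but for ~100 tweets that is at most 4,950 comparisons, and `get_close_matches` applies cheap upper-bound checks before computing the full ratio, so it should be fast enough without an extra dependency.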