Can I reverse or offset the TF-IDF score such that MORE COMMON terms will contribute more to the final score?
I would like to find the set of words that is most common across the corpus, i.e. words that are not unique to any small subset of documents.
I know this is a very old post, but none of the suggestions in the comment section works well.
The "1/TF-IDF" suggestion only returns words that are rare across documents.
Remember that TF-IDF down-weights not only prevalent words (through a low IDF) but also rare words (through a low term frequency).
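To illustrate why inverting TF-IDF surfaces rare words, here is a small sketch on a made-up four-document corpus. It assumes whitespace tokenization, uses each term's count over the whole corpus as tf, and uses the smoothed IDF form that scikit-learn applies by default. `corpus_stats` is a hypothetical helper, not part of any library.

```python
import math
from collections import Counter


def corpus_stats(docs):
    """Return corpus-wide term counts (tf), document frequencies (df) and smoothed IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    tf = Counter()
    df = Counter()
    for tokens in tokenized:
        tf.update(tokens)        # every occurrence counts toward tf
        df.update(set(tokens))   # each document counts at most once toward df
    # Smoothed IDF: ln((1 + N) / (1 + df)) + 1, always >= 1
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in tf}
    return tf, df, idf


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
    "a zebra appeared",
]

tf, df, idf = corpus_stats(docs)
# The "1/TF-IDF" idea: invert the product of tf and idf for each term
inverse = {t: 1.0 / (tf[t] * idf[t]) for t in tf}
ranked = sorted(inverse, key=inverse.get, reverse=True)
print(ranked[:3])
print(ranked[-1])
```

Terms that occur once in a single document receive the largest 1/TF-IDF values, while "the", the most prevalent word in the corpus, ends up last.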
I recently achieved your goal by using the "tf" and "idf" statistics, with the following steps:
In my own data, the words with the highest Rev_tf_idf were those that are prevalent throughout all documents.
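The post does not spell out how Rev_tf_idf is computed from tf and idf. One plausible reversal, assumed here and not necessarily the author's formula, divides each term's corpus-wide count by its smoothed IDF, so that terms which are both frequent and spread over many documents score highest. `rev_tf_idf` is a hypothetical helper, and the corpus is made up.

```python
import math
from collections import Counter


def rev_tf_idf(docs):
    """Hypothetical 'reversed' score: corpus-wide term count divided by smoothed IDF.

    A term scores high when it occurs often (large tf) AND in many documents (small idf).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    tf = Counter()
    df = Counter()
    for tokens in tokenized:
        tf.update(tokens)
        df.update(set(tokens))
    scores = {}
    for term, count in tf.items():
        # Smoothed IDF is >= 1 because df <= N, so the division is always safe
        idf = math.log((1 + n_docs) / (1 + df[term])) + 1
        scores[term] = count / idf
    return scores


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
    "a zebra appeared",
]

scores = rev_tf_idf(docs)
for term in sorted(scores, key=scores.get, reverse=True)[:3]:
    print(term, round(scores[term], 3))
```

Because tf here is a corpus-wide count, a term repeated many times inside a single document can still score fairly high; dividing by IDF only partly offsets that, so it may help to also require a minimum document frequency.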
I hope this works for anyone with the same question.