Reversed TF-IDF in Python

Can I reverse or offset the TF-IDF score such that MORE COMMON terms will contribute more to the final score?

I would like to find the words that are most common across the whole corpus, i.e. words that are not unique to any small subset of documents.

Solution

I know this is a very old post, but none of the suggestions in the comment section works well.

The "1/TF-IDF" one only gives you words that are rare throughout documents.

Remember that TF-IDF down-weights not only prevalent words (through a low idf) but also rare words (through a low tf).

I recently achieved your goal by using the "tf" and "idf" statistics with the following steps:

Within each document, create a new statistic (Rev_tf_idf) by dividing tf by idf. (The original TF-IDF is tf*idf.)
Group by word and sum Rev_tf_idf within each group.
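
A minimal sketch of these two steps in plain Python might look like the following. The function name reversed_tf_idf, the toy corpus, the choice of tf as a relative frequency, and the smoothed idf formula are assumptions for illustration, not necessarily what was used on the original data. The smoothing keeps idf at 1 or above, so dividing by it is safe even for a word that appears in every document (with the plain log(N/df) definition, such a word would have idf = 0).

```python
import math
from collections import Counter, defaultdict


def reversed_tf_idf(docs):
    """Sum tf / idf per word over all documents (each doc is a list of tokens)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    # Smoothed idf (an assumption): always >= 1, so the division below is safe.
    idf = {w: math.log((1 + n_docs) / (1 + d)) + 1 for w, d in df.items()}

    scores = defaultdict(float)
    for doc in docs:
        if not doc:
            continue  # skip empty documents to avoid dividing by zero length
        for word, count in Counter(doc).items():
            tf = count / len(doc)            # relative frequency in this document
            scores[word] += tf / idf[word]   # Rev_tf_idf, summed per word
    return dict(scores)


# Hypothetical toy corpus, already tokenized.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the bird flew over the house".split(),
]
scores = reversed_tf_idf(docs)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[:5])
```

In this toy corpus, "the" (frequent in every document) ranks first, words shared by two documents come next, and words confined to a single document score lowest.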

I found that words with higher Rev_tf_idf are those that are prevalent throughout all documents in my own data.

I hope this works for anyone with the same question.