Remove synonyms from TF-IDF results in Python

I am currently working on a project where I get the top 10 most relevant words of a set of documents using TF-IDF in Python. However, in some results I get the same word along with its plural, its adverb form, or some other variant. To work around this, I decided to use stemming, but that leads to other problems: words and their antonyms can share the same root, and once a word is reduced to its root there is no way to go back and find that specific word in the document if a user searches for it. Is there an NLP technique that might work better in this context than stemming? Any hint or link would be useful. I am working on something very similar to YouTube.

Solution

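First, you'd want to choose between stems and lemmas (neither are roots, mind you). Roughly, a stem is what is left after stripping affixes and need not be a real word, while a lemma is the dictionary form of the word. Google the difference for more on that.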

You mention antonyms, but many are formed with a prefix (e.g. important vs. unimportant). Since stemmers mainly strip suffixes, the stemmer should keep most such antonyms distinct.

As for synonyms, let's assume you're thinking only about words with the exact same stem. If you want to relate synonyms with completely unrelated roots, you'd be thinking about semantics and something like WordNet, but that would likely complicate your problem beyond what is reasonable.

From your question, you already have a stemmer working in Python. The simplest solution would be to use two dictionaries: one mapping stems/lemmas to the set or list of inflected/derived complete words (and/or their frequencies), and a second mapping those complete words to their positions in the documents you are indexing.

That way you can stem the user's input word, check for it in the top-k tf-idf stem dictionary, recover the complete words that share that stem from the first dictionary, and then map each complete word to its occurrences in the document set with the second dictionary.
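A minimal sketch of that two-dictionary idea, under some loud assumptions: `toy_stem` is a hypothetical stand-in for whatever stemmer you already use (it only strips a few English suffixes), `build_indexes` and `lookup` are illustrative names, and `top_stems` is a hand-written placeholder for the output of your tf-idf step.

```python
import re
from collections import defaultdict


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ingly", "edly", "ing", "ly", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_indexes(documents):
    """Build the two dictionaries described above.

    stem_to_words:  stem -> set of complete words seen with that stem
    word_positions: complete word -> list of (doc_id, token_position)
    """
    stem_to_words = defaultdict(set)
    word_positions = defaultdict(list)
    for doc_id, text in enumerate(documents):
        tokens = re.findall(r"[a-z]+", text.lower())
        for position, word in enumerate(tokens):
            stem_to_words[toy_stem(word)].add(word)
            word_positions[word].append((doc_id, position))
    return stem_to_words, word_positions


def lookup(query, top_stems, stem_to_words, word_positions):
    """Stem the query, check it against the top stems, return word occurrences."""
    stem = toy_stem(query.lower())
    if stem not in top_stems:
        return {}
    return {word: word_positions[word] for word in sorted(stem_to_words[stem])}


documents = [
    "Cats chase mice quickly",
    "The quick cat sleeps",
]
stem_to_words, word_positions = build_indexes(documents)
top_stems = {"cat", "quick"}  # assumed: stems ranked highest by your tf-idf step
hits = lookup("cats", top_stems, stem_to_words, word_positions)
print(hits)
```

Searching for "cats" here returns the positions of both "cat" and "cats", so the results can show one entry per stem while the user still lands on the exact word forms that occur in each document.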

(It's hard to elaborate further given your question.)