I am trying to find a method that uses TF-IDF to measure how 'new' a predicted sentence is compared to the list of sentences it was generated from.
So for example:
New sent. = "Hello world"
Then I have a list of sentences, and I want to find, for example, the top 5 sentences that are most similar to the new sentence.
I know I need to vectorize the sentences, but how do I then get a score for each sentence in the list and return the 5 most similar ones?
The introductory 'Core Concepts' section of the documentation for Gensim (a popular Python library for modeling text) shows TF-IDF vectorization, then builds a helper similarity index that lets you check one vector against many and list the top results.
See: https://radimrehurek.com/gensim/auto_examples/core/run_core_concepts.html#core-concepts
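To make the idea concrete without depending on any library, here is a minimal standard-library sketch of the same approach: build TF-IDF vectors for the list, vectorize the new sentence with the same weights, and rank by cosine similarity. This is not Gensim's API. The function names (`tokenize`, `build_idf`, `tfidf_vector`, `top_similar`), the whitespace tokenizer, and the smoothed IDF formula are all assumptions made for illustration, so the exact scores will differ from what Gensim produces.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase/whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


def build_idf(tokenized_docs):
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    # Sparse, L2-normalized TF-IDF vector; words unseen in the list are dropped.
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def top_similar(new_sentence, sentences, k=5):
    # Returns up to k (score, sentence) pairs, highest cosine similarity first.
    tokenized = [tokenize(s) for s in sentences]
    idf = build_idf(tokenized)
    doc_vecs = [tfidf_vector(tokens, idf) for tokens in tokenized]
    query = tfidf_vector(tokenize(new_sentence), idf)
    # For unit-length vectors, cosine similarity is just the dot product.
    scores = [sum(w * dv.get(t, 0.0) for t, w in query.items()) for dv in doc_vecs]
    ranked = sorted(zip(scores, sentences), key=lambda pair: pair[0], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    sentences = ["hello world", "goodbye world", "something else entirely"]
    for score, sentence in top_similar("Hello world", sentences, k=5):
        print(f"{score:.3f}  {sentence}")
```

One way to read the result as a 'newness' measure, again only as a suggestion: if even the best score is low, the new sentence shares little weighted vocabulary with anything in the list, so something like 1 minus the top score could serve as a rough novelty value.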