Compute sentence similarity between a predicted sentence and a list of sentences (using TF-IDF)

I am trying to find a method that uses TF-IDF to see how 'new' a predicted sentence is compared to the list of sentences it was generated from.

So for example:

New sent. = "Hello world"

Then I have a list of sentences, and I want to find, for example, the top 5 sentences that are most similar to the new sentence.

I know I need to vectorize the sentences, but how do I then get a score for each sentence in the list and return the top 5 most similar ones?

Solution

One of the introductory 'Core Concepts' sections of the documentation for Gensim (a popular Python library for modeling text) shows TF-IDF vectorization, followed by the creation of a helper similarity index. The index lets you compare one query vector against a whole collection of vectors and list the top-scoring results.

See: https://radimrehurek.com/gensim/auto_examples/core/run_core_concepts.html#core-concepts
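
To make the idea concrete without depending on any library, here is a minimal standard-library sketch of the same steps: TF-IDF vectorization, a cosine similarity score per sentence, and a top-n selection. This is not Gensim's implementation. The function names (`tokenize`, `build_idf`, `tfidf_vector`, `cosine`, `top_similar`), the whitespace tokenizer, the smoothed IDF formula, and the example sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase/whitespace tokenizer (an assumption; real pipelines
    # usually also strip punctuation and stop words).
    return text.lower().split()


def build_idf(tokenized_docs):
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    # This particular smoothing is one common choice, assumed here.
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))
    return {
        term: math.log((1 + n_docs) / (1 + count)) + 1
        for term, count in doc_freq.items()
    }


def tfidf_vector(tokens, idf):
    # Sparse vector stored as a dict; terms never seen in the list are dropped.
    counts = Counter(tokens)
    return {term: count * idf[term] for term, count in counts.items() if term in idf}


def cosine(a, b):
    # Cosine similarity between two sparse dict vectors.
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_similar(new_sentence, sentences, n=5):
    # Score every sentence against the new one and return the n best
    # as (score, sentence) pairs, highest score first.
    tokenized = [tokenize(s) for s in sentences]
    idf = build_idf(tokenized)
    query = tfidf_vector(tokenize(new_sentence), idf)
    scored = [
        (cosine(query, tfidf_vector(tokens, idf)), sentence)
        for tokens, sentence in zip(tokenized, sentences)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]


# Hypothetical example data.
sentences = [
    "Hello world",
    "Hello there",
    "Goodbye world",
    "Something else entirely",
]
results = top_similar("Hello world", sentences, n=3)
for score, sentence in results:
    print(f"{score:.3f}  {sentence}")
```

If you want a single 'newness' number, one option is to take 1 minus the highest similarity score, so a sentence that already appears in the list scores close to 0. For larger lists, a library index such as the one in the Gensim tutorial linked above will scale better than this loop.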
