I have implemented an algorithm in Elm that compares a sentence (the user input) against several other sentences (the data). The algorithm splits the user input and each data sentence into words and then compares them word by word. The data sentence that shares the most words with the user input is marked as the best match.
On the first pass, the first sentence in the data is taken as the best match. The algorithm then moves on to the second sentence and counts its matches. If that count is greater than the previous one, the second sentence becomes the best match; otherwise the previous one is kept.
If two sentences have the same number of matches, I currently compare their sizes and select the smaller one as the best match.
No semantic meaning is involved. Is picking the smaller sentence the best way to break the tie in this case, or are there better options? I have looked for scientific references but couldn't find any.
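A minimal sketch of the procedure described above, written in Python rather than Elm for brevity. It is not the actual implementation: the names `words`, `match_count`, and `best_match` are hypothetical, and it assumes that words are lowercased and split on whitespace, that each distinct word counts once, and that "size" means the number of words.

```python
def words(sentence):
    # Assumption: lowercase and split on whitespace; punctuation stays attached.
    return sentence.lower().split()


def match_count(user_input, candidate):
    # Number of distinct candidate words that also occur in the user input.
    return len(set(words(candidate)) & set(words(user_input)))


def best_match(user_input, data):
    best = None
    best_count = -1  # guarantees the first sentence becomes the initial best match
    for sentence in data:
        count = match_count(user_input, sentence)
        if count > best_count:
            # Strictly more matches: this sentence is the new best match.
            best, best_count = sentence, count
        elif count == best_count and len(words(sentence)) < len(words(best)):
            # Tie on matches: prefer the sentence with fewer words.
            best = sentence
    return best


data = ["the quick brown fox jumps", "the quick fox", "a slow turtle"]
print(best_match("the quick fox runs", data))  # prints "the quick fox"
```

The first two data sentences both share three words with the input, so the shorter one wins the tie.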
Edit:
To summarize: suppose you compare one sentence to two other sentences based on word occurrences. If both sentences contain the same number of words that also occur in the sentence you are comparing against, which one should be marked as the most similar? Which methods are used to determine this similarity?
Some factors you can add in to improve the comparison:
Finally, there will always be some sentences that differ from each other yet score the same against your input. That's OK, as long as they really are about equally different from your input sentence.
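As a small illustration of such an unavoidable tie, here is a Python sketch with made-up sentences. The helper `shared_words` is hypothetical and assumes lowercased, whitespace-split, distinct words.

```python
def shared_words(user_input, candidate):
    # Assumption: lowercase, whitespace-split, each distinct word counted once.
    return set(user_input.lower().split()) & set(candidate.lower().split())


user_input = "the cat sat on the mat"
first = "the cat ran"
second = "the mat fell"

first_shared = shared_words(user_input, first)    # {"the", "cat"}
second_shared = shared_words(user_input, second)  # {"the", "mat"}

# Same number of shared words and same length in words: a pure word-count
# comparison has no basis for preferring one sentence over the other.
tied = (len(first_shared) == len(second_shared)
        and len(first.split()) == len(second.split()))
print(tied)  # prints True
```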