python, nlp, full-text-search, string-matching, fuzzy-search

Is there a way to do fuzzy string matching for words in a string?


I want to do fuzzy matching of several words against a string.

The target string could be something like "Hello, I am going to watch a film today."
and the words I want to search for are
"flim toda".

Ideally, this should return "film today" as the search result.

I have used the method below, but it seems to work with only one word.

import difflib

def matches(large_string, query_string, threshold):
    """Return the matched part of each word that is close enough to query_string."""
    words = large_string.split()
    matched_words = []
    for word in words:
        s = difflib.SequenceMatcher(None, word, query_string)
        # Concatenate the blocks of `word` that also occur in `query_string`.
        match = ''.join(word[i:i+n] for i, j, n in s.get_matching_blocks() if n)
        # Keep the word when the matched part covers enough of the query.
        if len(match) / float(len(query_string)) >= threshold:
            matched_words.append(match)
    return matched_words


large_string = "Hello, I am going to watch a film today"
query_string = "film"
print(matches(large_string, query_string, 0.8))

This only works with a single-word query, and it only returns a match when there is little noise.

Is there any way to do this kind of fuzzy matching with several words?


Solution

  • The feature you are thinking of is called "query suggestion". It does rely on spell checking, but it also relies on Markov chains built from search engine query logs.

    That being said, you can use an approach similar to the one described in this answer: https://stackoverflow.com/a/58166648/140837
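For the concrete "flim toda" case from the question, one simple option is to slide a window of as many words as the query has over the target string and score each window with difflib. This is only a minimal sketch, not the method from the linked answer and not a query-suggestion system: the function name fuzzy_find_phrase, the punctuation stripping, and the default threshold of 0.8 are assumptions made for illustration.

```python
import difflib
import string


def fuzzy_find_phrase(large_string, query_string, threshold=0.8):
    """Return (score, phrase) for the best-matching run of words, or None.

    Hypothetical helper: compares the query against every window of
    consecutive words that has the same word count as the query.
    """
    # Strip surrounding punctuation so "today." is compared as "today".
    words = [w.strip(string.punctuation) for w in large_string.split()]
    words = [w for w in words if w]
    window = len(query_string.split())
    query = query_string.lower()

    best = None
    for start in range(len(words) - window + 1):
        candidate = " ".join(words[start:start + window])
        # ratio() is a similarity score in [0, 1]; 1.0 means identical.
        score = difflib.SequenceMatcher(None, candidate.lower(), query).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, candidate)
    return best


result = fuzzy_find_phrase("Hello, I am going to watch a film today.", "flim toda")
print(result)  # the phrase part of the result is "film today"
```

The window size is fixed to the number of query words, so this will miss matches where the target phrase has extra or missing words. For long texts, comparing every window is slow, and an indexed approach such as the one in the linked answer is likely a better fit.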