python python-3.x fuzzy-search

How can I fuzzy search with a keyword and return the matched substring?


I'd like to be able to find and replace in a fuzzy way. That means doing a fuzzy search of a text for a keyword and getting back the substring of the text that matched, but I'm struggling to find an implementation for this. For example, I would like to do something like this:

text = 'The sunset is a lovely colour this evening'
keyword = 'Color'
desired_result = 'colour'  # the fuzzy match I want the search to return
text = text.replace(desired_result, keyword)
print(text)
The sunset is a lovely Color this evening

To complicate matters, the phrases that need to be replaced could be more than one word long, so splitting the text into words won't work.

I've tried FuzzyWuzzy's process function, but it only returns the keyword, not the part of the text that matched. For example:

process.extractOne("This sunset is a lovely colour this evening", ["Color"])
("Color", 90)

I need the matched substring from the text so that I can replace it.

The third-party regex module can do fuzzy matching, but performance is a concern and it doesn't seem to work for me with a full phrase.

text = 'The sunset is a lovely colour this evening'
term = 'Color'
r = regex.compile('('+text +'){e<=5}')
print(r.match(term ))
None

Solution

  • If you use the fuzzysearch library, find_near_matches returns each match with its start and end indices in the searched string. Slicing the original string with those indices (for example in a list comprehension) gives the actual substrings that matched.

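    from fuzzysearch import find_near_matches

    my_string = 'aaaPATERNaaa'
    # Allow at most one edit (insertion, deletion or substitution)
    matches = find_near_matches('PATTERN', my_string, max_l_dist=1)

    # Each match carries start/end offsets into my_string
    print([my_string[m.start:m.end] for m in matches])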
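
    To go from the matched indices to the replacement the question asks for, something like the following might work. It is only a sketch: replace_spans and fuzzy_replace are hypothetical helper names, not part of fuzzysearch, and lower-casing both strings before searching is an assumption made here so that 'Color' and 'colour' are one edit apart instead of two.

```python
def replace_spans(text, spans, replacement):
    """Replace each (start, end) span of `text` with `replacement`."""
    # Work from right to left so earlier offsets stay valid after each substitution.
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + replacement + text[end:]
    return text


def fuzzy_replace(text, keyword, max_l_dist=1):
    """Replace every near match of `keyword` in `text` with `keyword` itself."""
    # Imported here so replace_spans can be used without the third-party package.
    from fuzzysearch import find_near_matches

    # Assumption: case should not count as an edit, so search lower-cased copies.
    # The offsets still apply to the original text as long as lower() keeps the
    # length unchanged, which holds for ASCII input.
    matches = find_near_matches(keyword.lower(), text.lower(), max_l_dist=max_l_dist)
    return replace_spans(text, [(m.start, m.end) for m in matches], keyword)
```

    With the sentence from the question you would call fuzzy_replace(text, 'Color'). It is worth printing the matched substrings first, since a near match can be shorter or longer than the word you expect, and the keyword can itself be a multi-word phrase because nothing here splits the text.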