
Using difflib.get_close_matches to replace word in string - Python


difflib.get_close_matches can return a single close match when I supply a sample string and a list of candidates. How can I use that close match to replace the matching token in the string?

# difflibQuestion.py

import difflib

word = ['Summerdalerise', 'Winterstreamrise']
line = 'I went up to Winterstreamrose.'

result = difflib.get_close_matches(line,word,n=1)
print(result)

Output:

['Winterstreamrise']

I want to produce the line:

I went up to Winterstreamrise.

I need to do this for many lines and many words.

I have checked the docs

  • I can't find any reference to the string index of the match found by difflib.get_close_matches
  • the other module classes & functions return lists

I Googled "python replace word in line using difflib" etc. I can't find any reference to anyone else asking/writing about it. It would seem a common scenario to me.

This example is, of course, a simplified version of my real-world scenario. Describing that scenario may help: I am dealing with table data rather than free-text lines, with columns such as:

Surname, First names, Street Address, Town, Job Description

My 'words' are a large list of street base names, e.g. MAIN, EVERY, EASY, LOVERS (without the Road, Street, or Lane suffix). So difflib.get_close_matches could be used to replace the string in column x (the 'line') with its closest match (the 'word').

However, I would appreciate suggestions for an approach to either of these examples.


Solution

  • You could try something like this:

    import difflib
    
    possibilities = ['Summerdalerise', 'Winterstreamrise']
    line = 'I went up to Winterstreamrose.'
    
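    newWords = []
    for word in line.split():
        # Look up each whitespace-separated token on its own;
        # keep the original token when nothing is close enough.
        matches = difflib.get_close_matches(word, possibilities, n=1)
        newWords.append(matches[0] if matches else word)
    result = ' '.join(newWords)
    print(result)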
    

    Output:

    I went up to Winterstreamrise
    

    Explanation:

    • The docs show a first argument named word, and there is no suggestion that get_close_matches() has any awareness of sub-words within this argument; rather, it reports on the closeness of a match between this word atomically and the list of possibilities supplied as the second argument.
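    • We can add awareness of the words within line by splitting it into a list of words and iterating over it, calling get_close_matches() for each word separately and replacing the word in our result only if there is a match.
    • Note that split() leaves punctuation attached to its token, so 'Winterstreamrose.' is compared and replaced as a whole. That is why the trailing period is missing from the output.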
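
The split-based loop above compares each whitespace-separated token together with any attached punctuation, so the final period of the sample line does not survive into the output. One possible variation, offered only as a sketch, is to let re.sub pick out the word characters and pass just those to get_close_matches(), leaving punctuation and spacing as they were. The names replace_close_words and swap are made up for this illustration; cutoff=0.6 simply repeats difflib's documented default.

```python
import difflib
import re


def replace_close_words(line, possibilities, cutoff=0.6):
    """Replace each word in line with its closest match, if any.

    Hypothetical helper for illustration: punctuation and spacing
    are left untouched because only \\w+ runs are looked up.
    """
    def swap(match):
        word = match.group(0)
        close = difflib.get_close_matches(word, possibilities, n=1, cutoff=cutoff)
        # Keep the original word when nothing is close enough.
        return close[0] if close else word

    # \w+ matches runs of letters, digits and underscores,
    # so the trailing '.' is never handed to difflib.
    return re.sub(r"\w+", swap, line)


possibilities = ['Summerdalerise', 'Winterstreamrise']
print(replace_close_words('I went up to Winterstreamrose.', possibilities))
# prints: I went up to Winterstreamrise.
```

For the table scenario in the question, the same function could presumably be applied to the value of a single column rather than to a whole line. Raising cutoff makes the replacement stricter, which may matter with a large list of short street names.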