Tags: python, search, matching, difflib

How do I use difflib to return a list by searching for an element in the list?


I have a list of lists that looks something like this:

list123 = [["Title a1","100 Price","Company xx aa"], ["Title b1","200 Price","Company yy bb"], ["Title c1","300 Price","Company zz cc"]]

How do I use difflib.get_close_matches (or something else) to return the whole inner list by searching for a specific innermost element that matches a search parameter?

How I thought it would work:

print(difflib.get_close_matches('Company xx a', list123))

Expected output (the output I'd like):

 ["Title a1","100 Price","Company xx aa"]

Actual output:

 []

I'm aware of using something like:

for item in list123:
    if "Company xx aa" in item:
        print(item)

But I'd like to use the difflib library (or something else) to allow more "human" searches where small spelling mistakes are tolerated.

If I misunderstood the purpose of the function, is there another one that can achieve what I'd like?


Solution

  • The problem is that the second parameter of get_close_matches should be a flat list of strings, not a list of lists. From the documentation:

    possibilities is a list of sequences against which to match word (typically a list of strings).

    To fix your issue, do the following:

    import difflib
    
    
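    def key(choices, keyword='Company xx a'):
        # choices is one inner list, i.e. a flat list of strings
        matches = difflib.get_close_matches(keyword, choices)
        if matches:
            # get_close_matches returns the most similar string first
            best_match, *_ = matches
            return difflib.SequenceMatcher(None, keyword, best_match).ratio()
        # no string in this inner list is close enough to the keyword
        return 0.0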
    
    
    list123 = [["Title a1", "100 Price", "Company xx aa"],
               ["Title b1", "200 Price", "Company yy bb"],
               ["Title c1", "300 Price", "Company zz cc"]]
    
    res = max(list123, key=key)
    
    print(res)
    

    Output

    ['Title a1', '100 Price', 'Company xx aa']
    

    The idea is that the key function returns the similarity of the best match within each inner list; max then uses that score to pick the inner list whose best match is most similar to the keyword.
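
One caveat with the max-based approach: max always returns some inner list, even when every score is 0.0 and nothing actually matched. The following is a minimal sketch of a variant that returns only the inner lists containing a close match, best first, and an empty list otherwise. The function name find_close_rows and the cutoff of 0.8 are assumptions chosen for illustration; they are not part of the original answer.

```python
import difflib


def find_close_rows(rows, query, cutoff=0.6):
    """Return the inner lists that contain a close match to query, best first."""
    scored = []
    for row in rows:
        # get_close_matches needs a flat list of strings, so search one row at a time.
        matches = difflib.get_close_matches(query, row, n=1, cutoff=cutoff)
        if matches:
            score = difflib.SequenceMatcher(None, query, matches[0]).ratio()
            scored.append((score, row))
    # Sort on the score only, highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored]


list123 = [["Title a1", "100 Price", "Company xx aa"],
           ["Title b1", "200 Price", "Company yy bb"],
           ["Title c1", "300 Price", "Company zz cc"]]

print(find_close_rows(list123, "Company xx a", cutoff=0.8))
print(find_close_rows(list123, "no such thing", cutoff=0.8))
```

The cutoff is raised above the default of 0.6 here because the three company names share the long prefix "Company ", so a loose cutoff would treat all of them as matches. A suitable value depends on how similar the real data is.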