Python compare list to dictionary with multiple values and return matches


If I have this list:

list1 = ['a long way home she said', 'amazing grace for me', 'shes super cool']

I want to compare each item in list1 to the values in dict1 below:

dict1 = {
'Formula1': ('a long way home', 'nothing to see here'),
'Formula 2': ('nasty girl', 'nope'),
'Formula 3': ('disguisting', 'nope'),
'Formula 4': ('amazing grace', 'hello world')
}

How can I get the output to return the key from dict1 with the entire matching phrase in list1?

The desired output would be:

['Formula1': ('a long way home she said'), 'Formula 4': ('amazing grace for me')] or

{'Formula1': ('a long way home she said'), 'Formula 4': ('amazing grace for me')}

I was trying to do this like so:

import itertools

[x if not x in itertools.chain(dict1.values()) else [key for key in dict1 if value in dict[key]][0] for x in list1]

But I think this will just return everything in the list after iterating through the dictionary values. I have millions of records to go through, so a list comprehension is preferable to a loop.

I also tried:

[name for name, title in dict1.items() if all(x in title for x in list1)]

This just returned an empty list.


Solution

  • For each key in the dictionary, build a new tuple from list1: keep an element of list1 if at least one string in that key's original tuple is a substring of the element. Store the result of this computation (in the code snippet below, it is stored in the phrases variable).

    Then, we filter out all of the key-value pairs that have empty tuples for values using a dictionary comprehension. You could condense this into one line, but I think it becomes pretty unreadable at that point:

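    # For each key, keep the list1 entries that contain any of the key's phrases.
    phrases = {
        key: tuple(
            filter(lambda x: any(substr in x for substr in value), list1)
        )
        for key, value in dict1.items()
    }
    # Drop the keys that have no matching entries.
    result = {key: value for key, value in phrases.items() if value != ()}

    print(result)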
    

    This outputs:

    {
        'Formula1': ('a long way home she said',),
        'Formula 4': ('amazing grace for me',)
    }
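
The two steps above can also be folded into a single pass. The following is only a sketch built on the question's sample data; the function name match_phrases and its parameter names are hypothetical and were chosen for illustration. It skips a key as soon as its tuple of matches turns out to be empty, so no intermediate dictionary is built.

```python
def match_phrases(lookup, sentences):
    """Map each key in `lookup` to the sentences that contain any of its phrases.

    `match_phrases`, `lookup` and `sentences` are hypothetical names used
    only for this sketch.
    """
    result = {}
    for key, phrases in lookup.items():
        # Keep every sentence that contains at least one phrase as a substring.
        matches = tuple(s for s in sentences if any(p in s for p in phrases))
        if matches:  # skip keys with no matching sentence
            result[key] = matches
    return result


# Sample data copied from the question.
list1 = ['a long way home she said', 'amazing grace for me', 'shes super cool']
dict1 = {
    'Formula1': ('a long way home', 'nothing to see here'),
    'Formula 2': ('nasty girl', 'nope'),
    'Formula 3': ('disguisting', 'nope'),
    'Formula 4': ('amazing grace', 'hello world'),
}

result = match_phrases(dict1, list1)
print(result)
```

A comprehension is still a loop under the hood, so either form does roughly one substring check per (list element, phrase) pair. With millions of records, that total number of checks is likely to matter more than whether the code is written as a loop or a comprehension.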