Tags: python, list, dictionary, defaultdict

How to extract top item from a defaultdict(list)?


I am new to working with defaultdicts. I have a matching script that uses a unique identifier as the key and stores a list of potential matches for that identifier in a defaultdict(list). Each match holds company names, addresses, and matching scores (produced by matching algorithms). Sometimes it is a 1-1 match, meaning the key has exactly one match, but sometimes the algorithms also catch close matches, so a key can have several. For those, I'd like to select the highest-scored match.

Goal: Extract data from the defaultdict(list) for each unique identifier. If a unique identifier has more than one value, extract the entry with the highest Lev score, Fuzzy score and Jaro score.

Here's a preview of the data:

#imports
from collections import defaultdict
test_dic_stack = defaultdict(list)

#testing data (unique1 has a 1-1 match &  unique2 has a 1-5 match)
test_dic_stack['unique1'].append({'Account Name': 'company1', 'Matching Account': 'company1', 'Account_Address': '123 Road', 'Address_match': '123 Road',  'Lev_score': 98.0, 'Fuzzy_score': 100, 'Jaro_Score': 99.0})
test_dic_stack['unique2'].append({'Account Name': 'company1', 'Matching Account': 'company1', 'Account_Address': '1 awesome street', 'Address_match': '1 awesome street',  'Lev_score': 91.0, 'Fuzzy_score': 89, 'Jaro_Score': 99.0})
test_dic_stack['unique2'].append({'Account Name': 'company2', 'Matching Account': 'company2', 'Account_Address': '1 awesome street', 'Address_match': '1 awesome st',  'Lev_score': 71.0, 'Fuzzy_score': 82, 'Jaro_Score': 84.0})
test_dic_stack['unique2'].append({'Account Name': 'company3', 'Matching Account': 'company3', 'Account_Address': '1 awesome street', 'Address_match': '1 awesome street suite 1',  'Lev_score': 88.0, 'Fuzzy_score': 89, 'Jaro_Score': 90.0})
test_dic_stack['unique2'].append({'Account Name': 'company4', 'Matching Account': 'company4', 'Account_Address': '1 awesome street', 'Address_match': '1 awe street',  'Lev_score': 81.0, 'Fuzzy_score': 90, 'Jaro_Score': 86.0})
test_dic_stack['unique2'].append({'Account Name': 'company5', 'Matching Account': 'company5', 'Account_Address': '1 awesome street', 'Address_match': '1 awe st',  'Lev_score': 70.0, 'Fuzzy_score': 86, 'Jaro_Score': 89.0})

#defaultdict preview
defaultdict(list,
            {'unique1': [{'Account Name': 'company1',
               'Matching Account': 'company1',
               'Account_Address': '123 Road',
               'Address_match': '123 Road',
               'Lev_score': 98.0,
               'Fuzzy_score': 100,
               'Jaro_Score': 99.0}],
             'unique2': [{'Account Name': 'company1',
               'Matching Account': 'company1',
               'Account_Address': '1 awesome street',
               'Address_match': '1 awesome street',
               'Lev_score': 91.0,
               'Fuzzy_score': 89,
               'Jaro_Score': 99.0},
              {'Account Name': 'company2',
               'Matching Account': 'company2',
               'Account_Address': '1 awesome street',
               'Address_match': '1 awesome st',
               'Lev_score': 71.0,
               'Fuzzy_score': 82,
               'Jaro_Score': 84.0},
              {'Account Name': 'company3',
               'Matching Account': 'company3',
               'Account_Address': '1 awesome street',
               'Address_match': '1 awesome street suite 1',
               'Lev_score': 88.0,
               'Fuzzy_score': 89,
               'Jaro_Score': 90.0},
              {'Account Name': 'company4',
               'Matching Account': 'company4',
               'Account_Address': '1 awesome street',
               'Address_match': '1 awe street',
               'Lev_score': 81.0,
               'Fuzzy_score': 90,
               'Jaro_Score': 86.0},
              {'Account Name': 'company5',
               'Matching Account': 'company5',
               'Account_Address': '1 awesome street',
               'Address_match': '1 awe st',
               'Lev_score': 70.0,
               'Fuzzy_score': 86,
               'Jaro_Score': 89.0}]})

Here's my requested result:
Extract the unique1 data and the "best matched" unique2 data. Note that the best match isn't always first.

results = [{'unique1': {'Account Name': 'company1',
               'Matching Account': 'company1',
               'Account_Address': '123 Road',
               'Address_match': '123 Road',
               'Lev_score': 98.0,
               'Fuzzy_score': 100,
               'Jaro_Score': 99.0},

          'unique2': {'Account Name': 'company1',
               'Matching Account': 'company1',
               'Account_Address': '1 awesome street',
               'Address_match': '1 awesome street',
               'Lev_score': 91.0,
               'Fuzzy_score': 89,
               'Jaro_Score': 99.0}}]



Solution

  • You could use a dictionary comprehension with max, using the sum of the three scores as the key.

    Assuming d is the input dictionary:

    out = {k:max(v, key=lambda x: sum((x['Fuzzy_score'], x['Lev_score'], x['Jaro_Score'])))
           for k,v in d.items()}
    

    Output:

    {'unique1': {'Account Name': 'company1',
      'Matching Account': 'company1',
      'Account_Address': '123 Road',
      'Address_match': '123 Road',
      'Lev_score': 98.0,
      'Fuzzy_score': 100,
      'Jaro_Score': 99.0},
     'unique2': {'Account Name': 'company1',
      'Matching Account': 'company1',
      'Account_Address': '1 awesome street',
      'Address_match': '1 awesome street',
      'Lev_score': 91.0,
      'Fuzzy_score': 89,
      'Jaro_Score': 99.0}}
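
For a version that can be run end to end, here is a minimal sketch of the same idea wrapped in a helper. The names best_matches and SCORE_KEYS are hypothetical (they are not in the question or the answer), the sample records are trimmed to the score fields, and the empty unique3 entry is added only to show a possible edge case: with a defaultdict, merely reading a missing key creates an empty list, and max() raises ValueError on an empty sequence.

```python
from collections import defaultdict

# Score fields taken from the question's data; the helper name is hypothetical.
SCORE_KEYS = ('Lev_score', 'Fuzzy_score', 'Jaro_Score')


def best_matches(matches_by_id, score_keys=SCORE_KEYS):
    """Return {identifier: highest-scoring match}, skipping empty lists."""
    def total(match):
        # Rank a candidate by the sum of its scores, as in the answer above.
        return sum(match[key] for key in score_keys)

    return {
        identifier: max(candidates, key=total)
        for identifier, candidates in matches_by_id.items()
        if candidates  # max() raises ValueError on an empty sequence
    }


# Trimmed sample data: the best match for unique2 is deliberately not first.
d = defaultdict(list)
d['unique1'].append({'Account Name': 'company1', 'Lev_score': 98.0, 'Fuzzy_score': 100, 'Jaro_Score': 99.0})
d['unique2'].append({'Account Name': 'company2', 'Lev_score': 71.0, 'Fuzzy_score': 82, 'Jaro_Score': 84.0})
d['unique2'].append({'Account Name': 'company1', 'Lev_score': 91.0, 'Fuzzy_score': 89, 'Jaro_Score': 99.0})
d['unique3']  # merely reading a missing key creates an empty list

out = best_matches(d)
print({k: v['Account Name'] for k, v in out.items()})
# {'unique1': 'company1', 'unique2': 'company1'}
```

If two candidates have the same total, max returns the first one it encounters. Should one score matter more than the others, the key could return a tuple such as (x['Lev_score'], x['Fuzzy_score'], x['Jaro_Score']) instead of the sum, which compares the scores in that order of priority.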