Tags: python, list, dictionary, defaultdict

In a list of dicts, flag a dict if combination of key/value pairs is identical in another dict


I have a list of dictionaries with the keys street, number and some_flag.

My goal is to find dicts that share the same values for both street and number. If two or more dicts have identical values for these two keys, I want to assign the value 1 to their some_flag key.

Please see the reproducible example below.

Starting list of dictionaries:

a = [
    {'street': 'ocean drive', 'number': '1', 'some_flag': 0},
    {'street': 'ocean drive', 'number': '3', 'some_flag': 0},
    {'street': 'ocean drive', 'number': '4', 'some_flag': 0}, # duplicate street / number keys
    {'street': 'ocean drive', 'number': '4', 'some_flag': 0}, # duplicate street / number keys
    {'street': 'apple tree rd.', 'number': '3', 'some_flag': 0},
]

Expected output:

a_checked = [
    {'street': 'ocean drive', 'number': '1', 'some_flag': 0},
    {'street': 'ocean drive', 'number': '3', 'some_flag': 0},
    {'street': 'ocean drive', 'number': '4', 'some_flag': 1}, # duplicate street / number keys
    {'street': 'ocean drive', 'number': '4', 'some_flag': 1}, # duplicate street / number keys
    {'street': 'apple tree rd.', 'number': '3', 'some_flag': 0},
]

My best effort:

The code I have so far is derived from Aaron's answer (here) and the community wiki's answer (here):

from collections import defaultdict, Counter

items = defaultdict(list)  # maps each street to the list of its numbers

for row in a:
    items[row['street']].append(row['number'])  # collect the 'number' values for each 'street' key


for key in items.keys():
    if checkIfDuplicates(items[key]):  # True if any 'number' occurs more than once for this street --> function definition see below
        duplicate_dict = {}
        duplicate_dict['numbers'] = [item for item, count in Counter(items[key]).items() if count > 1]  # storing duplicate numbers in dict
        duplicate_dict['street'] = key  # storing street name in same dict

Function to check if given list contains any duplicates (from here):

def checkIfDuplicates(listOfElems): 
    if len(listOfElems) == len(set(listOfElems)):
        return False
    else:
        return True
        

Current output:

print(duplicate_dict)
{'numbers': ['4'], 'street': 'ocean drive'}

With my approach, I would now have to match the duplicate_dict with the original list a, which doesn't seem very efficient.

Are there more direct ways to solve this problem?


Solution

  • You could use dict.setdefault to group the dicts into lists keyed by the tuple (street, number), then iterate over the groups and set "some_flag" to 1 for every dict in a group that has more than one member. The dicts are modified in place, so the original list a is updated too; out collects the same dicts group by group:

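    tmp = {}
    for d in a:
        # group the dicts by their (street, number) pair
        tmp.setdefault((d['street'], d['number']), []).append(d)
    out = []
    for v in tmp.values():
        if len(v) > 1:
            # more than one dict shares this pair: flag all of them
            for d in v:
                d['some_flag'] = 1
        out.extend(v)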

    Output:

    [{'street': 'ocean drive', 'number': '1', 'some_flag': 0},
     {'street': 'ocean drive', 'number': '3', 'some_flag': 0},
     {'street': 'ocean drive', 'number': '4', 'some_flag': 1},
     {'street': 'ocean drive', 'number': '4', 'some_flag': 1},
     {'street': 'apple tree rd.', 'number': '3', 'some_flag': 0}]
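
A possible variation, sketched here as an alternative rather than as part of the answer above: count each (street, number) pair with collections.Counter, then build a new list in a second pass. This keeps the original order even when duplicates are not adjacent and leaves a untouched. The names counts and a_checked are illustrative (a_checked is borrowed from the question), and the sketch assumes that dicts which are not duplicates should keep their existing some_flag value.

```python
from collections import Counter

a = [
    {'street': 'ocean drive', 'number': '1', 'some_flag': 0},
    {'street': 'ocean drive', 'number': '3', 'some_flag': 0},
    {'street': 'ocean drive', 'number': '4', 'some_flag': 0},
    {'street': 'ocean drive', 'number': '4', 'some_flag': 0},
    {'street': 'apple tree rd.', 'number': '3', 'some_flag': 0},
]

# First pass: count how often each (street, number) pair occurs.
counts = Counter((d['street'], d['number']) for d in a)

# Second pass: copy each dict in the original order and set some_flag to 1
# when its pair occurs more than once; otherwise keep the existing value
# (an assumption about the desired behaviour).
a_checked = [
    {**d, 'some_flag': 1 if counts[(d['street'], d['number'])] > 1 else d['some_flag']}
    for d in a
]

print(a_checked)
```

Both passes are linear in the length of the list. If modifying a in place is acceptable, the second pass could instead assign d['some_flag'] = 1 inside a plain for loop.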