I have a list of dictionaries with the keys street, number and some_flag.
My goal is to search the dicts for duplicates in the keys street and number. If two or more dicts have identical values for both of these keys, I want to set their some_flag key to 1.
Please see the reproducible example below.
a = [
{'street': 'ocean drive', 'number': '1', 'some_flag': 0},
{'street': 'ocean drive', 'number': '3', 'some_flag': 0},
{'street': 'ocean drive', 'number': '4', 'some_flag': 0}, # duplicate street / number keys
{'street': 'ocean drive', 'number': '4', 'some_flag': 0}, # duplicate street / number keys
{'street': 'apple tree rd.', 'number': '3', 'some_flag': 0},
]
a_checked = [
{'street': 'ocean drive', 'number': '1', 'some_flag': 0},
{'street': 'ocean drive', 'number': '3', 'some_flag': 0},
{'street': 'ocean drive', 'number': '4', 'some_flag': 1}, # duplicate street / number keys
{'street': 'ocean drive', 'number': '4', 'some_flag': 1}, # duplicate street / number keys
{'street': 'apple tree rd.', 'number': '3', 'some_flag': 0},
]
The code I've got so far is derived from Aaron's answer (here) and the community wiki's answer (here):
from collections import defaultdict, Counter
items = defaultdict(list) # create defaultdict
for row in a:
    items[row['street']].append(row['number']) # make a list of 'number' values for each 'street' key
for key in items.keys():
    if checkIfDuplicates(items[key]): # if any 'number' occurs more than once for this street --> function definition see below
        duplicate_dict = {}
        duplicate_dict['numbers'] = [item for item, count in Counter(items[key]).items() if count > 1] # storing duplicate numbers in dict
        duplicate_dict['street'] = key # storing street name in same dict
Function to check if a given list contains any duplicates (from here):
def checkIfDuplicates(listOfElems):
    if len(listOfElems) == len(set(listOfElems)):
        return False
    else:
        return True
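Since the function only compares the length of the list with the length of its set, the same check could be condensed into a single expression. This is a minimal sketch that keeps the question's name so it can be dropped in; like the original, it assumes the elements are hashable.

```python
def checkIfDuplicates(listOfElems):
    # A set keeps one copy of each element, so a shorter set means
    # at least one element occurs more than once.
    return len(listOfElems) != len(set(listOfElems))

print(checkIfDuplicates(['1', '3', '4', '4']))  # True
print(checkIfDuplicates(['3']))  # False
```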
Current output:
print(duplicate_dict)
{'numbers': ['4'], 'street': 'ocean drive'}
With my approach, I would now have to match the duplicate_dict with the original list a, which doesn't seem very efficient.
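For comparison, the matching step described here could be sketched as one more pass over a. This assumes a single duplicate_dict with the value printed above; note that the loop in the question reassigns duplicate_dict for every street that has duplicates, so with several such streets only the last one would survive and the others would go unflagged.

```python
a = [
    {'street': 'ocean drive', 'number': '1', 'some_flag': 0},
    {'street': 'ocean drive', 'number': '3', 'some_flag': 0},
    {'street': 'ocean drive', 'number': '4', 'some_flag': 0},
    {'street': 'ocean drive', 'number': '4', 'some_flag': 0},
    {'street': 'apple tree rd.', 'number': '3', 'some_flag': 0},
]
# Assumed input: the value printed by the question's code.
duplicate_dict = {'numbers': ['4'], 'street': 'ocean drive'}

# Flag every row on the duplicated street whose number is one of the
# duplicated numbers (mutates the dicts in `a` in place).
for row in a:
    if row['street'] == duplicate_dict['street'] and row['number'] in duplicate_dict['numbers']:
        row['some_flag'] = 1

print([row['some_flag'] for row in a])  # [0, 0, 1, 1, 0]
```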
Are there more direct ways to solve this problem?
You could use dict.setdefault to first build a dict of lists keyed by ("street", "number") tuples, and then iterate over the values of this dictionary: any list holding more than one dict contains duplicates, so set "some_flag" to 1 for each dict in it:
tmp = {}
for d in a:
    tmp.setdefault((d['street'], d['number']), []).append(d)
out = []
for v in tmp.values():
    if len(v) > 1:
        for d in v:
            d['some_flag'] = 1
    out.extend(v)
Output of print(out):
[{'street': 'ocean drive', 'number': '1', 'some_flag': 0},
{'street': 'ocean drive', 'number': '3', 'some_flag': 0},
{'street': 'ocean drive', 'number': '4', 'some_flag': 1},
{'street': 'ocean drive', 'number': '4', 'some_flag': 1},
{'street': 'apple tree rd.', 'number': '3', 'some_flag': 0}]
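A variant of the same idea, swapping in collections.Counter instead of dict.setdefault, counts each (street, number) pair first and then flags the dicts in a second pass. This is a sketch, not the answer's method: because it never builds a new list like out, the dicts stay in their original order even when duplicates are not adjacent in a (the setdefault version groups equal keys together in out). Both versions modify the dicts in a in place.

```python
from collections import Counter

a = [
    {'street': 'ocean drive', 'number': '1', 'some_flag': 0},
    {'street': 'ocean drive', 'number': '3', 'some_flag': 0},
    {'street': 'ocean drive', 'number': '4', 'some_flag': 0},
    {'street': 'ocean drive', 'number': '4', 'some_flag': 0},
    {'street': 'apple tree rd.', 'number': '3', 'some_flag': 0},
]

# First pass: count how many dicts share each (street, number) pair.
counts = Counter((d['street'], d['number']) for d in a)

# Second pass: flag every dict whose pair occurs more than once.
for d in a:
    if counts[(d['street'], d['number'])] > 1:
        d['some_flag'] = 1

print([d['some_flag'] for d in a])  # [0, 0, 1, 1, 0]
```

The tuple is used as the key because tuples are hashable, whereas a list such as [street, number] could not be a dict or Counter key.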