I'm trying to merge a list of lists by the URL in the first field: where several entries share the same URL, they should be merged into a single entry whose second field is the sum of the second fields of the duplicates.
I've tried using 'setdefault', but it doesn't work as expected: I still get duplicate results after running the loop.
Here is the list I'm trying to condense, summing the second field wherever identical URLs exist:
[
['https://www.website.com/directory/link-1',
21,
'Long Text Field 1',
'String 1',
{'url': 'https://www.website.com/images/image-1.jpg'},
255],
['https://www.website.com/directory/link-1',
185,
'Long Text Field 1',
'String 1',
{'url': 'https://www.website.com/images/image-1.jpg'},
255],
['https://www.website.com/directory/link-2',
296,
'Long Text Field 2',
'String 2',
{'url': 'https://www.website.com/images/image-2.jpg'},
303],
['https://www.website.com/directory/link-3',
354,
'Long Text Field 3',
'String 3',
{'url': 'https://www.website.com/images/image-3.jpg'},
388],
['https://www.website.com/directory/link-4',
606,
'Long Text Field 4',
'String 4',
{'url': 'https://www.website.com/images/image-4.jpg'},
624]
]
This is the result I'm trying to get:
[
['https://www.website.com/directory/link-1',
206,
'Long Text Field 1',
'String 1',
{'url': 'https://www.website.com/images/image-1.jpg'},
255],
['https://www.website.com/directory/link-2',
296,
'Long Text Field 2',
'String 2',
{'url': 'https://www.website.com/images/image-2.jpg'},
303],
['https://www.website.com/directory/link-3',
354,
'Long Text Field 3',
'String 3',
{'url': 'https://www.website.com/images/image-3.jpg'},
388],
['https://www.website.com/directory/link-4',
606,
'Long Text Field 4',
'String 4',
{'url': 'https://www.website.com/images/image-4.jpg'},
624]
]
This is what I'm trying:
for url, long_text, number_to_count, another_field, ..., ... in list:
    d = {}
    d.setdefault(url, {}).setdefault("long text", []).append(long_text)
    d[url].setdefault("number_to_count",[]).append(number_to_count)
    d[url].setdefault("another_field",[]).append(another_field)
Here is something you can try. It groups the sublists of lst by their first element (the URL) into a defaultdict of lists, then builds a new result in which only the second item is summed; the remaining fields are taken from the first sublist in each group.
from collections import defaultdict
from pprint import pprint
lst = ...
d = defaultdict(list)
for item in lst:
    d[item[0]].append(item)
result = [[v[0][0]] + [sum(x[1] for x in v)] + v[0][2:] for v in d.values()]
pprint(result)
Output:
[['https://www.website.com/directory/link-1',
206,
'Long Text Field 1',
'String 1',
{'url': 'https://www.website.com/images/image-1.jpg'},
255],
['https://www.website.com/directory/link-2',
296,
'Long Text Field 2',
'String 2',
{'url': 'https://www.website.com/images/image-2.jpg'},
303],
['https://www.website.com/directory/link-3',
354,
'Long Text Field 3',
'String 3',
{'url': 'https://www.website.com/images/image-3.jpg'},
388],
['https://www.website.com/directory/link-4',
606,
'Long Text Field 4',
'String 4',
{'url': 'https://www.website.com/images/image-4.jpg'},
624]]
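If you would rather sum in a single pass without first collecting the groups, a plain dict keyed by URL also works. The following is only a sketch of that variant: the function name merge_by_url and the shortened three-field sample rows are made up for illustration, and it assumes the URL is always at index 0 and the number to sum at index 1.

```python
def merge_by_url(rows):
    """Merge rows that share the same first element, summing the second."""
    merged = {}  # url -> merged row; dicts keep insertion order (Python 3.7+)
    for row in rows:
        url = row[0]
        if url in merged:
            # URL already seen: only add the count to the stored row
            merged[url][1] += row[1]
        else:
            # First occurrence: store a shallow copy so the input is not mutated
            merged[url] = list(row)
    return list(merged.values())


# Shortened, hypothetical sample rows (the real rows have more fields)
rows = [
    ['https://www.website.com/directory/link-1', 21, 'Long Text Field 1'],
    ['https://www.website.com/directory/link-1', 185, 'Long Text Field 1'],
    ['https://www.website.com/directory/link-2', 296, 'Long Text Field 2'],
]
result = merge_by_url(rows)
print(result)
```

The dict has to be created once, before the loop. If it is created inside the loop body, it is reset on every iteration and nothing accumulates.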