This question is an extension of a previous question: rebuild python array based on common elements - but different enough to warrant a new question:
I've been struggling with this for a bit now. My data is an array of dictionaries from an sql query. Each element in the array represents a shipment, and there are common values based on the keys.
data = [
{"CustName":"customer1", "PartNum":"part1", "delKey":"0001", "qty":"10", "memo":"blah1"},
{"CustName":"customer1", "PartNum":"part1", "delKey":"0002", "qty":"10", "memo":"blah2"},
{"CustName":"customer1", "PartNum":"part1", "delKey":"0003", "qty":"10", "memo":"blah3"},
{"CustName":"customer2", "PartNum":"part3", "delKey":"0004", "qty":"20", "memo":"blah4"},
{"CustName":"customer2", "PartNum":"part3", "delKey":"0005", "qty":"20", "memo":"blah5"},
{"CustName":"customer3", "PartNum":"partXYZ", "delKey":"0006", "qty":"50", "memo":"blah6"},
{"CustName":"customer3", "PartNum":"partABC", "delKey":"0007", "qty":"100", "memo":"blah7"}]
The output I want is grouped according to specific keys
dataOut = [
{"CustName":"customer1", "Parts":[
{"PartNum":"part1", "deliveries":[
{"delKey":"0001", "qty":"10", "memo":"blah1"},
{"delKey":"0002", "qty":"10", "memo":"blah2"},
{"delKey":"0003", "qty":"10", "memo":"blah3"}]}]},
{"CustName":"customer2", "Parts":[
{"PartNum":"part3", "deliveries":[
{"delKey":"0004", "qty":"20", "memo":"blah4"},
{"delKey":"0005", "qty":"20", "memo":"blah5"}]}]},
{"CustName":"customer3", "Parts":[
{"PartNum":"partXYZ", "deliveries":[
{"delKey":"0006", "qty":"50", "memo":"blah6"}]},
{"PartNum":"partABC", "deliveries":[
{"delKey":"0007", "qty":"100", "memo":"blah7"}]}]}]
I can get the grouping with a single level using defaultdict and list comprehension as provided by the previous question and modified slightly
d = defaultdict(list)
for item in data:
d[item['CustName']].append(item)
print([{'CustName': key, 'parts': value} for key, value in d.items()])
But I can't seem to get the second level in the output array - the grouping b the PartNum
key. Through some research, I think what I need to do is use defaultdict
as the type of the outer `defaultdict' like so:
d = defaultdict(defaultdict(list))
which throws errors because defaultdict returns a function, so I need to use lambda
(yes?)
d = defaultdict(lambda:defaultdict(list))
for item in data:
d[item['CustName']].append(item) <----this?
My question is how to "access" the second level array in the loop and tell the "inner" defaultdict what to group on (PartNum)? The data comes to me from the database programmer and the project keeps evolving to add more and more data (keys), so I'd like this solution to be as general as possible in case more data gets thrown my way. I was hoping to be able to "chain" the defaultdicts depending on how many levels I need to go. I'm learning as I'm going, so I'm struggling trying to understand the lambda
and the basics of the defaultdict
type and where to go from here.
Using groupby
as suggested by @Pynchia and using sorted
for unordered data as suggested by @hege_hegedus:
from itertools import groupby
dataOut = []
dataSorted = sorted(data, key=lambda x: (x["CustName"], x["PartNum"]))
for cust_name, cust_group in groupby(dataSorted, lambda x: x["CustName"]):
dataOut.append({
"CustName": cust_name,
"Parts": [],
})
for part_num, part_group in groupby(cust_group, lambda x: x["PartNum"]):
dataOut[-1]["Parts"].append({
"PartNum": part_num,
"deliveries": [{
"delKey": delivery["delKey"],
"memo": delivery["memo"],
"qty": delivery["qty"],
} for delivery in part_group]
})
If you look at the second for
loop, this will hopefully answer your question about accessing the second level array in the loop.