I have a dictionary of dictionaries, and I'm trying to output the information within them in a certain way so that it will be usable for downstream analysis. Note: all the keys in dict are also in list.
    for item in list:
        for key, value in dict[item].items():
            print item, key, value
This is the closest I've gotten to what I want, but it's still a long way off. Ideally what I want is:
            item1   item2   item3   item4
    key1    value   value   value   value
    key2    value   value   value   value
    key3    value   value   value   value
Is this even possible?
First, if I understand your structure, the list is just a way of ordering the keys of the outer dictionary, and a lot of your complexity comes from using the two together to simulate an ordered dictionary. If so, there's a much easier way to do that: use collections.OrderedDict. I'll come back to that at the end.
To begin, you need to get all of the keys of your sub-dictionaries, because those are the rows of your output.
From the comments, it sounds like all of the sub-dictionaries in dct have the same keys, so you can just pull the keys out of an arbitrary one of them:
    keys = dct.values()[0].keys()
If each sub-dictionary can have a different subset of keys, you'll instead need to do a first pass over dct to get all the keys:
    keys = reduce(set.union, map(set, dct.values()))
Some people find reduce hard to understand, even when you're really just using it as "sum with a different operator". For them, here's how to do the same thing explicitly:
    keys = set()
    for subdct in dct.values():
        keys |= set(subdct)
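To make the key-collection step concrete, here is a minimal sketch in Python 3 (the snippets above are Python 2, where reduce is a builtin; in Python 3 it has to be imported from functools). The sample dct is made up for illustration. Because a set has no defined order, the sketch also sorts the result so the rows come out in a predictable order; that sorting is my own addition, not something the approach requires.

```python
from functools import reduce  # a builtin in Python 2; lives in functools in Python 3

# Hypothetical sample data: sub-dictionaries with different subsets of keys.
dct = {
    'item1': {'key1': 1, 'key2': 2},
    'item2': {'key2': 3, 'key3': 4},
}

# reduce version: union of the key sets of every sub-dictionary.
keys_reduce = reduce(set.union, map(set, dct.values()))

# Explicit-loop version of the same thing.
keys_loop = set()
for subdct in dct.values():
    keys_loop |= set(subdct)

# Sets are unordered, so sort to get a stable row order (an added assumption).
keys = sorted(keys_loop)
print(keys)
```

Both versions build the same set; which one to use is purely a matter of readability.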
Now, for each key's row, we need to get a column for each sub-dictionary (that is, each value in the outer dictionary), in the order specified by using the elements of the list as keys into the outer dictionary.
So, for each column item, we want to get the outer-dictionary value corresponding to the key in item, and then in the resulting sub-dictionary, get the value corresponding to the row's key. That's hard to say in English, but in Python, it's just:
    dct[item][key]
If you don't actually have all the same keys in all of the sub-dictionaries, it's only slightly more complicated:
    dct[item].get(key, '')
So, if you didn't want any headers, it would look like this:
    import csv

    with open('output.csv', 'wb') as f:
        w = csv.writer(f, delimiter='\t')
        for key in keys:
            # one row per key, one column per item in lst
            w.writerow(dct[item].get(key, '') for item in lst)
To add a header column, just prepend the header (in this case, key) to each of those rows:
    with open('output.csv', 'wb') as f:
        w = csv.writer(f, delimiter='\t')
        for key in keys:
            w.writerow([key] + [dct[item].get(key, '') for item in lst])
Notice that I turned the genexp into a list comprehension so I could use list concatenation to prepend the key. It's conceptually cleaner to leave it as an iterator and prepend with itertools.chain, but in trivial cases like this with tiny iterables, I think that just makes the code harder to read:
    from itertools import chain

    with open('output.csv', 'wb') as f:
        w = csv.writer(f, delimiter='\t')
        for key in keys:
            w.writerow(chain([key], (dct[item].get(key, '') for item in lst)))
You also want a header row. That's even easier; it's just the items in the list, with a blank column prepended for the header column:
    with open('output.csv', 'wb') as f:
        w = csv.writer(f, delimiter='\t')
        w.writerow([''] + lst)
        for key in keys:
            w.writerow([key] + [dct[item].get(key, '') for item in lst])
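For anyone who wants to try the header-row-plus-header-column version without touching the filesystem, here is a small self-contained sketch. It is written for Python 3, and the sample data, the write_table helper name, and the lineterminator='\n' setting (used only to keep the output easy to read) are all my assumptions rather than part of the question.

```python
import csv
import io

# Hypothetical sample data; lst fixes the column order.
lst = ['item1', 'item2', 'item3']
dct = {
    'item1': {'key1': 'a', 'key2': 'b'},
    'item2': {'key1': 'c'},              # no 'key2' here, so that cell is blank
    'item3': {'key1': 'd', 'key2': 'e'},
}
keys = ['key1', 'key2']  # row order, chosen by hand for this example


def write_table(f, dct, lst, keys):
    """Write a header row, then one tab-separated row per key."""
    w = csv.writer(f, delimiter='\t', lineterminator='\n')
    w.writerow([''] + lst)  # blank cell above the header column
    for key in keys:
        w.writerow([key] + [dct[item].get(key, '') for item in lst])


buf = io.StringIO()
write_table(buf, dct, lst, keys)
output = buf.getvalue()
print(output)
```

To write a real file in Python 3, open it in text mode with open('output.csv', 'w', newline='') instead of 'wb'; the binary mode shown above is the Python 2 convention.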
However, there are two ways to make things even simpler.
First, you can use an OrderedDict, so you don't need the separate key list. If you're stuck with the separate list and dict, you can still build an OrderedDict on the fly to make your code easier to read. For example:
    import collections

    od = collections.OrderedDict((item, dct[item]) for item in lst)
And now:
    with open('output.csv', 'wb') as f:
        w = csv.writer(f, delimiter='\t')
        w.writerow([''] + od.keys())
        for key in keys:
            w.writerow([key] + [subdct.get(key, '') for subdct in od.values()])
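One caveat if you are on Python 3: od.keys() returns a view there rather than a list, so it can't be concatenated to a list directly. The sketch below, with made-up sample data, shows one way to build the header and a single row in that case.

```python
from collections import OrderedDict

# Hypothetical sample data; lst deliberately puts item2 first.
lst = ['item2', 'item1']
dct = {'item1': {'key1': 1}, 'item2': {'key1': 2}}

# Ordered mapping whose iteration order follows lst.
od = OrderedDict((item, dct[item]) for item in lst)

# In Python 3, convert the keys to a list before concatenating.
header = [''] + list(od)
row = ['key1'] + [subdct.get('key1', '') for subdct in od.values()]
print(header)
print(row)
```

Since Python 3.7, a plain dict also preserves insertion order, so dict((item, dct[item]) for item in lst) would behave the same way for this purpose.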
Second, you could just build the transposed structure:
    transposed = {key_b: {key_a: dct[key_a].get(key_b, '') for key_a in dct}
                  for key_b in keys}
And then iterate over it in the obvious order (or use a DictWriter to handle the ordering of the columns for you, and use its writerows method to deal with the rows, so the whole thing becomes a one-liner).
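Here is a rough sketch of what the DictWriter route could look like, in Python 3 and writing to an in-memory buffer. The sample data is invented, and using the empty string as the field name for the label column is just my choice so that the header cell above the keys stays blank.

```python
import csv
import io

# Hypothetical sample data; lst fixes the column order.
lst = ['item1', 'item2']
dct = {
    'item1': {'key1': 'a', 'key2': 'b'},
    'item2': {'key1': 'c'},
}
keys = ['key1', 'key2']

# Transposed rows: one dict per key, mapping column name -> cell value.
rows = []
for key in keys:
    row = {'': key}  # '' is the field name of the label column (my choice)
    row.update((item, dct[item].get(key, '')) for item in lst)
    rows.append(row)

buf = io.StringIO()
w = csv.DictWriter(buf, fieldnames=[''] + lst,
                   delimiter='\t', lineterminator='\n')
w.writeheader()    # header row comes from fieldnames
w.writerows(rows)  # all data rows in one call
output = buf.getvalue()
print(output)
```

The fieldnames list is what controls the column order here, so the row dictionaries themselves don't need to be ordered.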