How do I write the contents of nested dictionaries to a file in a certain format?

I have a dictionary of dictionaries, and I'm trying to output the information within them in a certain way so that it will be usable for downstream analysis. Note: all the keys in dict are also in list.

for item in list:
    for key, value in dict[item].items():
        print item, key, value

This is the closest I've gotten to what I want, but it's still a long way off. Ideally what I want is:

     item1  item2  item3  item4
key1 value  value  value  value
key2 value  value  value  value
key3 value  value  value  value

Is this even possible?

Solution

If I understand your structure, the list is just a way of ordering the keys for the outer dictionary, and much of the complexity comes from using the two together to simulate an ordered dictionary. If so, there's a much easier way to do that: use collections.OrderedDict. I'll come back to that at the end.

First, you need to get all of the keys of your sub-dictionaries, because those are the rows of your output.

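From the comments, it sounds like all of the sub-dictionaries in dct have the same keys (I'm calling your dictionary dct and your list lst here, so they don't shadow the built-in names), so you can just pull the keys out of any arbitrary one of them: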

keys = dct.values()[0].keys()

If each sub-dictionary can have a different subset of keys, you'll instead need to do a first pass over dct to collect all the keys:

keys = reduce(set.union, map(set, dct.values()))

Some people find reduce hard to understand, even when you're really just using it as "sum with a different operator". For them, here's how to do the same thing explicitly:

keys = set()
for subdct in dct.values():
    keys |= set(subdct)
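
For anyone on Python 3, where reduce has moved to functools and dct.values() is a view that can't be indexed, the two approaches above might be sketched as follows. The sample dct is made up for illustration, and sorting the result is an extra step of mine (a set has no defined order, so without it the rows come out in arbitrary order).

```python
# Hypothetical sample data: two sub-dictionaries with overlapping keys.
dct = {
    'item1': {'key1': 1, 'key2': 2},
    'item2': {'key2': 20, 'key3': 30},
}

# Same keys everywhere: take them from any one sub-dictionary.
# Dict views cannot be indexed in Python 3, so use next(iter(...)).
first_keys = list(next(iter(dct.values())))

# Different keys per sub-dictionary: union them all.
# sorted() is an added step, only to make the row order deterministic.
keys = sorted(set().union(*dct.values()))

print(first_keys)  # ['key1', 'key2']
print(keys)        # ['key1', 'key2', 'key3']
```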

Now, for each key's row, we need to get a column for each sub-dictionary (that is, each value in the outer dictionary), in the order specified by using the elements of the list as keys into the outer dictionary.

So, for each column item, we want to get the outer-dictionary value corresponding to the key in item, and then in the resulting sub-dictionary, get the value corresponding to the row's key. That's hard to say in English, but in Python, it's just:

dct[item][key]

If you don't actually have all the same keys in all of the sub-dictionaries, it's only slightly more complicated:

dct[item].get(key, '')
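
To make the difference between the two lookups concrete, here is a small sketch with made-up data in which one sub-dictionary lacks a key. Plain indexing would raise KeyError for the missing entry, while get falls back to the empty string, which ends up as an empty cell in the output.

```python
# Hypothetical data: 'item2' has no 'key2'.
dct = {
    'item1': {'key1': 'a', 'key2': 'b'},
    'item2': {'key1': 'c'},
}

present = dct['item1']['key2']          # key exists, plain indexing works
missing = dct['item2'].get('key2', '')  # key absent, default is returned

try:
    dct['item2']['key2']
    raised = False
except KeyError:
    raised = True                       # plain indexing fails here

print(repr(present))  # 'b'
print(repr(missing))  # ''
print(raised)         # True
```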

So, if you didn't want any headers, it would look like this:

import csv

with open('output.csv', 'wb') as f:
    w = csv.writer(f, delimiter='\t')
    for key in keys:
        w.writerow(dct[item].get(key, '') for item in lst)

To add a header column, just prepend the header (in this case, key) to each of those rows:

with open('output.csv', 'wb') as f:
    w = csv.writer(f, delimiter='\t')
    for key in keys:
        w.writerow([key] + [dct[item].get(key, '') for item in lst])

Notice that I turned the genexp into a list comprehension so I could use list concatenation to prepend the key. It's conceptually cleaner to leave it as an iterator, and prepend with itertools.chain, but in trivial cases like this with tiny iterables, I think that's just making the code harder to read:

from itertools import chain

with open('output.csv', 'wb') as f:
    w = csv.writer(f, delimiter='\t')
    for key in keys:
        w.writerow(chain([key], (dct[item].get(key, '') for item in lst)))
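
As a quick check that the list-concatenation and chain versions produce the same row, here is a sketch for Python 3.5 or later, where writerow accepts any iterable rather than only a sequence. The data is made up, and it writes to an in-memory io.StringIO buffer instead of a file so the result can be inspected directly.

```python
import csv
import io
from itertools import chain

# Hypothetical sample data.
dct = {'item1': {'key1': 1}, 'item2': {'key1': 2}}
lst = ['item1', 'item2']
key = 'key1'

# List concatenation: builds the whole row as a list up front.
row_list = [key] + [dct[item].get(key, '') for item in lst]

# chain: lazily yields the header cell, then each column value.
row_iter = chain([key], (dct[item].get(key, '') for item in lst))

buf = io.StringIO()
w = csv.writer(buf, delimiter='\t', lineterminator='\n')
w.writerow(row_list)
w.writerow(row_iter)

lines = buf.getvalue().splitlines()
print(lines[0] == lines[1])  # True
```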

You also want a header row. That's even easier; it's just the items in the list, with a blank column prepended for the header column:

with open('output.csv', 'wb') as f:
    w = csv.writer(f, delimiter='\t')
    w.writerow([''] + lst)
    for key in keys:
        w.writerow([key] + [dct[item].get(key, '') for item in lst])
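
Putting the pieces together, a self-contained Python 3 version might look like the sketch below. The write_table helper and the sample data are my own names for illustration, not something from the question. Note that in Python 3 the csv module wants a text-mode file opened with newline='' rather than 'wb'.

```python
import csv
import io

# Hypothetical sample data; 'item2' lacks 'key2' on purpose.
dct = {
    'item1': {'key1': 1, 'key2': 2},
    'item2': {'key1': 3},
}
lst = ['item1', 'item2']                   # column order
keys = sorted(set().union(*dct.values()))  # row order (sorted for determinism)

def write_table(f, dct, lst, keys):
    """Write a header row, then one tab-separated row per inner key."""
    w = csv.writer(f, delimiter='\t', lineterminator='\n')
    w.writerow([''] + lst)                 # blank cell above the key column
    for key in keys:
        w.writerow([key] + [dct[item].get(key, '') for item in lst])

buf = io.StringIO()
write_table(buf, dct, lst, keys)
print(buf.getvalue())

# Writing to a real file in Python 3 would look like this:
# with open('output.csv', 'w', newline='') as f:
#     write_table(f, dct, lst, keys)
```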

However, there are two ways to make things even simpler.

First, you can use an OrderedDict, so you don't need the separate key list. If you're stuck with the separate list and dict, you can still build an OrderedDict on the fly to make your code easier to read. For example:

import collections

od = collections.OrderedDict((item, dct[item]) for item in lst)

And now:

with open('output.csv', 'wb') as f:
    w = csv.writer(f, delimiter='\t')
    w.writerow([''] + od.keys())
    for key in keys:
        w.writerow([key] + [subdct.get(key, '') for subdct in od.values()])
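
In Python 3 the same idea needs one small change: od.keys() returns a view, which can't be concatenated to a list directly, so it has to be converted first. A sketch with made-up data follows (from Python 3.7 on, a plain dict built in the same way would also keep this order, so OrderedDict is optional there).

```python
import collections
import csv
import io

# Hypothetical data whose dict order differs from the wanted column order.
dct = {'b': {'k': 1}, 'a': {'k': 2}}
lst = ['a', 'b']
keys = ['k']

# Rebuild the outer dictionary in the order given by lst.
od = collections.OrderedDict((item, dct[item]) for item in lst)

buf = io.StringIO()
w = csv.writer(buf, delimiter='\t', lineterminator='\n')
w.writerow([''] + list(od))  # list(od) gives the keys in order
for key in keys:
    w.writerow([key] + [subdct.get(key, '') for subdct in od.values()])

print(buf.getvalue())
```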

Second, you could just build the transposed structure:

transposed = {key_b: {key_a: dct[key_a].get(key_b, '') for key_a in dct}
              for key_b in keys}

And then iterate over it in the obvious order (or use a DictWriter to handle the ordering of the columns for you, and use its writerows method to deal with the rows, so the whole thing becomes a one-liner).
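
Here is roughly what the DictWriter route could look like in Python 3. This is a sketch under my own assumptions: the sample data is invented, and I use the empty string as the field name for the row-label column so the header row has a blank first cell.

```python
import csv
import io

# Hypothetical sample data; 'item2' lacks 'key2' on purpose.
dct = {
    'item1': {'key1': 1, 'key2': 2},
    'item2': {'key1': 3},
}
lst = ['item1', 'item2']
keys = sorted(set().union(*dct.values()))

# Transpose: inner keys become row labels, outer keys become columns.
transposed = {key_b: {key_a: dct[key_a].get(key_b, '') for key_a in dct}
              for key_b in keys}

buf = io.StringIO()
# fieldnames fixes the column order; '' labels the key column (my choice).
w = csv.DictWriter(buf, fieldnames=[''] + lst,
                   delimiter='\t', lineterminator='\n')
w.writeheader()
# Add the row label to each transposed row, then write them all at once.
w.writerows({'': key, **row} for key, row in transposed.items())

print(buf.getvalue())
```

The fieldnames argument is what takes care of column ordering, so the order of keys inside each row dictionary doesn't matter.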