
python list comprehension: list of dicts to dict of lists with key intersection


I have a list with a variable number of dictionaries, for example:

var = [ {'a': 1, 'b': 2}, {'b': 20, 'a': 10, 'c': 30}, {'c': 300, 'a': 100} ]

I need to extract the keys that are common to all dicts, collect their associated values into lists, build a new dict from them, and store it in the same variable.

The expected result would be:

var = { 'a': [1, 10, 100] }

I can find the intersection of the keys with:

[k for k in var[0] if all(k in d for d in var[1:])]

But how can you do the rest of the transformation?


Solution

  • Once you know the keys you care about (called common_keys here, e.g. the result of your comprehension), just iterate and pull them as you go:

    from collections import defaultdict
    
    new_var = defaultdict(list)
    for d in var:
        for k in common_keys:
            new_var[k].append(d[k])
    
    new_var = dict(new_var)  # Optionally convert back to plain dict to avoid autovivification
    

    A one-liner is also possible, since you've guaranteed the keys exist in all dicts; it's just a little ugly given how much meaning it shoves into a single line:

    new_var = {k: [d[k] for d in var] for k in common_keys}
    

    In this case the one-liner is fine; it's just a little less flexible if you need to modify it. The explicit loop is easier to tweak, but more verbose.
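
For reference, here is a minimal end-to-end sketch that runs both versions on the sample data from the question. It assumes common_keys comes from the question's own comprehension; the names loop_result and comp_result are only illustrative.

```python
from collections import defaultdict

var = [{'a': 1, 'b': 2}, {'b': 20, 'a': 10, 'c': 30}, {'c': 300, 'a': 100}]

# Keys present in every dict (the comprehension from the question).
common_keys = [k for k in var[0] if all(k in d for d in var[1:])]

# Explicit loop version: append each dict's value for every common key.
loop_result = defaultdict(list)
for d in var:
    for k in common_keys:
        loop_result[k].append(d[k])
loop_result = dict(loop_result)  # back to a plain dict

# Dict-comprehension version: one list per common key.
comp_result = {k: [d[k] for d in var] for k in common_keys}

print(loop_result)  # {'a': [1, 10, 100]}
print(comp_result)  # {'a': [1, 10, 100]}
```

Both produce the same dict; values appear in the order of the dicts in var.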


    Side-note: There is a simpler/faster way to compute the common keys:

    common_keys = set(var[0]).intersection(*var[1:])
    

    This converts the keys of the first dict to a set, then allows set's intersection method to produce the intersection in a single call at the C layer (it's not meaningfully different in how it operates relative to your code, but it avoids a ton of interpreter overhead).
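
A small sketch comparing the two ways of computing the common keys. The sample data is made up for illustration (it has two common keys so the ordering point is visible). One difference to keep in mind: the comprehension returns a list in the key order of the first dict, while the set has no guaranteed order.

```python
# Illustrative data with two keys shared by all dicts.
var = [{'a': 1, 'b': 2, 'c': 3}, {'b': 20, 'a': 10, 'c': 30}, {'c': 300, 'a': 100}]

# List comprehension: keeps the key order of var[0].
keys_list = [k for k in var[0] if all(k in d for d in var[1:])]

# Set intersection: iterating a dict yields its keys, so dicts can be passed directly.
keys_set = set(var[0]).intersection(*var[1:])

print(keys_list)                   # ['a', 'c']
print(keys_set == set(keys_list))  # True
print(sorted(keys_set))            # ['a', 'c'] (sort if a deterministic order matters)
```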

    You could even make it silently handle an empty var by changing it to:

    common_keys = set(*var[:1]).intersection(*var[1:])
    

    which will produce an empty set for common_keys if var contains no dicts at all. Which one you choose depends on the scenario: if an empty var is an error, dying loudly is better than silently doing invalid work.
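
To make the difference concrete, here is a short sketch of how the two forms behave on an empty list and on a single dict. The variable names strict_outcome, lenient and single are only illustrative.

```python
var = []

# Strict form: indexing var[0] fails on an empty list.
strict_outcome = None
try:
    set(var[0]).intersection(*var[1:])
except IndexError as exc:
    strict_outcome = f"IndexError: {exc}"

# Lenient form: var[:1] is an empty list, so this is set().intersection().
lenient = set(*var[:1]).intersection(*var[1:])

# With exactly one dict, the lenient form returns all of that dict's keys.
single = set(*[{'a': 1, 'b': 2}][:1]).intersection()

print(strict_outcome)  # IndexError: list index out of range
print(lenient)         # set()
print(sorted(single))  # ['a', 'b']
```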

    Combining the two parts lets you write the whole thing as a complete all-in-one-liner:

    {k: [d[k] for d in var] for k in set(var[0]).intersection(*var[1:])}
    
    # Or to silently accept empty var:
    
    {k: [d[k] for d in var] for k in set(*var[:1]).intersection(*var[1:])}
    

    but that's truly horrific code, so I'd suggest splitting it up a bit.
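
One possible way to split it up, as a sketch: wrap the two steps in a small function. The name common_key_lists and the choice to raise ValueError on empty input are assumptions made for illustration, not part of the original answer; swap in the lenient set(*dicts[:1]) form if an empty list should yield an empty dict instead.

```python
def common_key_lists(dicts):
    """Hypothetical helper: map each key shared by all dicts to its list of values."""
    if not dicts:
        # Assumption: an empty input is treated as an error.
        raise ValueError("expected at least one dict")
    # Step 1: keys present in every dict.
    common_keys = set(dicts[0]).intersection(*dicts[1:])
    # Step 2: one list of values per common key, in input order.
    return {k: [d[k] for d in dicts] for k in common_keys}


var = [{'a': 1, 'b': 2}, {'b': 20, 'a': 10, 'c': 30}, {'c': 300, 'a': 100}]
var = common_key_lists(var)  # store the result in the same variable
print(var)  # {'a': [1, 10, 100]}
```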