python list combinations counter data-analysis

Find frequency of relationships of tags in lists (pairwise correlation?)

I have some lists of tags for images. I want to find out which tags seem to be related:

l1 = ["cat", "toe", "man"]
l2 = ["cat", "toe", "ice"]
l3 = ["cat", "hat", "bed"]

In this simple example, "cat" and "toe" are obviously related, because they appear together twice (in l1 and l2).

How can this be computed? I would like a result such as: cat & toe: 2. I suspect that what I am asking for is "pairwise correlation", but the resources I found on that kind of analysis are too complicated for me.

Solution

You can use collections.defaultdict with frozenset and itertools.combinations to form a dictionary of pairwise counts.

Variations are possible. For example, you can use collections.Counter with sorted tuples instead, but the idea is fundamentally the same.

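from collections import defaultdict
from itertools import combinations

# Maps each unordered pair of tags to the number of times it co-occurs
dd = defaultdict(int)

L1 = ["cat", "toe", "man"]
L2 = ["cat", "toe", "ice"]
L3 = ["cat", "hat", "bed"]

for L in [L1, L2, L3]:
    # frozenset makes ("cat", "toe") and ("toe", "cat") the same key
    for pair in map(frozenset, combinations(L, 2)):
        dd[pair] += 1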

Result:

defaultdict(int,
            {frozenset({'cat', 'toe'}): 2,
             frozenset({'cat', 'man'}): 1,
             frozenset({'man', 'toe'}): 1,
             frozenset({'cat', 'ice'}): 1,
             frozenset({'ice', 'toe'}): 1,
             frozenset({'cat', 'hat'}): 1,
             frozenset({'bed', 'cat'}): 1,
             frozenset({'bed', 'hat'}): 1})
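
The Counter variation mentioned above could look roughly like the sketch below. The names tag_lists and pair_counts are made up for illustration, and wrapping each list in set() is an added assumption (it means a tag repeated within one list is only counted once). Sorting each list first puts every pair in a canonical order, so plain tuples can serve as keys instead of frozensets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical container for the example lists from the question
tag_lists = [
    ["cat", "toe", "man"],
    ["cat", "toe", "ice"],
    ["cat", "hat", "bed"],
]

pair_counts = Counter()
for tags in tag_lists:
    # Assumption: duplicates within one list are ignored via set().
    # sorted() gives each pair a canonical order, e.g. ("cat", "toe").
    pair_counts.update(combinations(sorted(set(tags)), 2))

# Print the most frequent pairs in the "cat & toe: 2" style from the question
for (a, b), n in pair_counts.most_common(3):
    print(f"{a} & {b}: {n}")
```

With the three example lists, the first line printed is cat & toe: 2. most_common() without an argument returns every pair, sorted from most to least frequent.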