Search code examples
pythonlistcountintersect

Intersect two lists and count how many times each element overlaps


I am intersecting two lists with the following code:

def interlist(lst1,lst2): 
    lst3 = list(filter(lambda x: x in lst1, lst2))

    return lst3

The thing, is that I want to count every intersection between lst1 and lst2. The result should be a dictionary mapping elements to the number of times they overlap in both lists.


Solution

  • Here's a simple solution using collections.Counter and set intersection. The idea is to first count occurrences of each element in each list separately; then, the number of overlaps is the min of the two counts, for each element. This matches each occurrence in one list with a single occurrence in the other, so the min gives the number of matches that can be made. We only need to count elements which occur in both lists, so we take the intersection of the two key-sets.

    If you want to count all matching pairs instead (i.e. each occurrence in lst1 gets matched with every occurrence in lst2), replace min(c1[k], c2[k]) with c1[k] * c2[k]. This counts the number of ways of choosing a pair with one occurrence from lst1 and one from lst2.

    from collections import Counter
    
    def count_intersections(lst1, lst2):
        c1 = Counter(lst1)
        c2 = Counter(lst2)
        return { k: min(c1[k], c2[k]) for k in c1.keys() & c2.keys() }
    

    Example:

    >>> lst1 = ['a', 'a', 'a', 'b', 'b', 'c', 'e']
    >>> lst2 = ['a', 'b', 'b', 'b', 'd', 'e']
    >>> count_intersections(lst1, lst2)
    {'b': 2, 'a': 1, 'e': 1}
    

    This solution runs in O(m + n) time and uses at most O(m + n) auxiliary space, where m and n are the sizes of the two lists.