Search code examples
pythonalgorithmperformancesetlow-latency

How can I quickly compare a list and a set?


Say I have a list

l = [1, 1 , 1, 2, 3, 4, 5, 5]

and two disjoint sets of equal length

a = (1, 3) and b = (2, 5)

and I want to get the elements in l that is in a and b separately like

[1, 1, 1, 3] and [2, 5, 5]

I tried list comprehension like [x for x in l if x in a] but that takes a long time if the length of l, a, and b is 10^5

EDIT: the sets are disjoint sets of equal length.

EDIT: What I need to do is count the elements in l that is common in a (with duplicates) minus that with elements of l in b (with duplicates too). So the above example should output 1. The problem is if the list and sets are as long as 10E5. Using filter and itertools still takes too long.

EDIT: I got it now! Apparently I have to wrap the input sets with set()! I didn't at first (I only got it via input().split()) because the inputs are unique already but didn't know list and sets are very different and sets are faster! Well, TIL for me.


Solution

  • The fundamental problem is that you aren't using appropriate data structures for the job. Using tuples to represent sets might be ok for small sets in this case, but for large sets, you can expect to search an average of half the combined size of the sets for each element in the list that is actually in one of the sets. For each element in the list that is not in either set, we must search all elements of both sets to determine that.

    So any algorithm based on these data structures (i.e., representing sets using tuples) will at best be O(m*n), where m is the size of the list and n is the size of the sets.

    There really isn't any way we can reduce the m component — we have to examine each element of the list to determine which set (if any) it belongs to.

    We can, however, reduce the n component. How? By using a more efficient data structure for our sets.

    Fortunately, this is not hard, as Python includes a built-in set type. So the first step is to construct the two sets:

    a = set((1, 3))
    b = set((2, 5))
    

    Now, we can easily (and efficiently) determine if an element e is in one of the sets:

    e = 1
    e in a # => True
    e in b # => False
    

    Now, we just need to loop over the input list and accumulate the result:

    l = [1, 1, 3, 2, 5, 7, 8, 3, 2, 1]
    result = 0 # accumulator for result
    for e in l:
      if e in a:
        result += 1
      elif e in b:
        result -= 1
    
    print result # prints "2"