Search code examples
pythonlistfunctional-programmingtuplespython-itertools

Python : filter set/list of tuple based on cardinality property


I'm looking for a way to filter a set (or list) of tuples based on the number of times an item appears in one of the other of the position of the tuple.

My current goal is a bit complex so I divided the problem in three smaller steps.

1. Let's start with the simplest case, only a single value which applies only to the first element of the tuple

For instance:

my_filter([(1,2),(1,3),(2,4),(3,1),(3,4),(3,5),(5,2),(5,4)], 2)

Should return:

[(1,2),(1,3),(5,2),(5,4)]

Because these are the only tuples for which the first item of the tuple is linked appears only twice in the whole list.

The naive way of doing it is : for each first element of tuple in list, count the number of times this element appears as first element in all tuples and, if the count matches the chosen number, add all tuples having this element at first position.

But I feel like this is so unoptimal and I have to iterate over the list for each possible value, I'm surely missing a better way of doing it.

2. Make it reciprocal

Ideally it would like to be able to apply the same treatment based on the second element of the tuple, with another cardinality parameter

For instance:

my_filter([(1,2),(1,3),(2,4),(3,1),(3,4),(3,5),(5,2),(5,4)], 2, 1)

Here we want to keep only tuples in which the first element appears exactly twice but with the second element appearing only once (intersection of the two conditions). This should return:

[(1,3)]

3. Generalizing to multiple values

my_filter([(1,2),(1,3),(2,4),(3,1),(3,4),(3,5),(5,2),(5,4)], 2, [1,3])

In this case, we allow the cardinality filter to take multiple possible values. In this example, we want to keep tuples for which the first element appears exactly twice (in first position) and the second element appears either once or three times (in the second position). This should return:

[(1,3),(5,4)]

Once again, I have no problem writing a naive solution that would simply iterate over each allowed values and join result sets, but I'm looking for something smarter.

I feel like there could be some useful functions in the itertools library but I'm not comfortable enough with it. Any advice ? Thanks.


Solution

  • Here's a linear time (O(n)) solution for parts 2 and 3 (1 can be implemented with some tweaks):

    First we convert the second and third arguments into a set, which is O(n).

    Then we calculate the frequency of each element at position 0 and 1, again O(n).

    Then we iterate over the list and check whether it matches our criteria. Lookups in Counter and sets are both O(1) so this thing is again efficient, O(n) overall.

    from collections import Counter
    
    
    def my_filter(list: list[tuple[int, int]], first: list[int], second: list[int]):
        first_set = set(first)
        second_set = set(second)
        first_counter = Counter(a for (a, _) in list)
        second_counter = Counter(b for (_, b) in list)
        return [
            (a, b)
            for (a, b) in list
            if first_counter[a] in first_set and second_counter[b] in second_set
        ]
    
    
    print(
        my_filter(
            [(1, 2), (1, 3), (2, 4), (3, 1), (3, 4), (3, 5), (5, 2), (5, 4)], [2], [1]
        )
    )
    
    print(
        my_filter(
            [(1, 2), (1, 3), (2, 4), (3, 1), (3, 4), (3, 5), (5, 2), (5, 4)], [2], [1, 3]
        )
    )
    

    Output:

    [(1, 3)]
    [(1, 3), (5, 4)]