python, arrays, set, binary-matrix

What is the fastest way to remove multiple elements from a set, just based on their properties? (E.g. remove all negative numbers.)


In the following example all negative numbers are removed from the set.
But first I have to create the subset of negative numbers.
This does not seem like the most efficient way to achieve this.

my_set = {-100, -5, 0, 123, 3000}
my_set.difference_update({e for e in my_set if e < 0})
assert my_set == {0, 123, 3000}

In the title I refer to a set, but I mean that in a mathematical sense.
My question is not specific to the datatype.

The following example is what I actually want to do.
I have a set of pairs, which could also be seen as a binary matrix,
and I want to remove some rows and columns of that matrix.
Again, the effort to create the set to_be_removed seems wasted to me.
I am looking for a way to directly get rid of all elements with some property.

my_set = {
    (0, 0), (0, 2), (0, 3), (0, 6), (0, 7), (1, 0), 
    (2, 3), (2, 4), (2, 7), (3, 3), (3, 7), (3, 9), 
    (4, 2), (4, 4), (4, 6), (4, 10), (5, 0), (6, 0), 
    (6, 1), (6, 3), (6, 8), (6, 9), (7, 1), (7, 9)
}
to_be_removed = {(i, j) for (i, j) in my_set if i in {0, 1, 7} or j in {0, 1, 6, 7, 10}}
assert to_be_removed == {
    (0, 0), (0, 2), (0, 3), (0, 6), (0, 7), (1, 0), (2, 7), (3, 7), 
    (4, 6), (4, 10), (5, 0), (6, 0), (6, 1), (7, 1), (7, 9)
}
my_set.difference_update(to_be_removed)
assert my_set == {
    (2, 3), (2, 4), (3, 3), (3, 9), (4, 2), (4, 4), 
    (6, 3), (6, 8), (6, 9)
}

Maybe Python's set does not allow this, but I am not tied to that datatype.
I suppose that arrays make it easy to set whole rows and columns to zero,
but I would like to avoid wasting space on zeros.
(Sparse matrices, on the other hand, are apparently not designed to be modified in place.)
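
One structure that stores no zeros and can still be changed is a dict that maps each row index to the set of column indices present in that row. The following is only a minimal sketch of that idea, not a recommendation from the original post; the names `rows` and `remove_rows_and_columns` are hypothetical and chosen for illustration.

```python
from collections import defaultdict

pairs = {
    (0, 0), (0, 2), (0, 3), (0, 6), (0, 7), (1, 0),
    (2, 3), (2, 4), (2, 7), (3, 3), (3, 7), (3, 9),
    (4, 2), (4, 4), (4, 6), (4, 10), (5, 0), (6, 0),
    (6, 1), (6, 3), (6, 8), (6, 9), (7, 1), (7, 9),
}

# Hypothetical representation: row index -> set of column indices in that row.
rows = defaultdict(set)
for i, j in pairs:
    rows[i].add(j)


def remove_rows_and_columns(rows, bad_rows, bad_columns):
    """Drop whole rows, then strip unwanted columns from the rest, in place."""
    for i in bad_rows:
        rows.pop(i, None)       # drops the row without looking at its entries
    for i in list(rows):        # list() so empty rows can be deleted while looping
        rows[i] -= bad_columns
        if not rows[i]:
            del rows[i]


remove_rows_and_columns(rows, {0, 1, 7}, {0, 1, 6, 7, 10})

# Convert back to a set of pairs to compare with the set-based version.
remaining = {(i, j) for i, cols in rows.items() for j in cols}
```

Removing a row costs one dict operation regardless of how many entries it holds. Removing columns still visits every remaining row, so this layout mainly pays off when whole rows are dropped.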


Edit: The comment by Joe suggests the following:

rows, columns = {2, 3, 4, 5, 6}, {2, 3, 4, 5, 8, 9}
my_set = {(i, j) for (i, j) in my_set if i in rows and j in columns}

That does indeed work, and is probably faster, since it builds only the set of pairs to keep and skips the separate removal step.
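
Whether it really is faster depends on the data, so it is worth measuring. The following sketch compares the two approaches with `timeit`. The data size, the random pairs, and the function names are assumptions made up for this test, and the timings will differ between machines.

```python
import random
import timeit

random.seed(0)
# Hypothetical test data: up to 20,000 random pairs in a 1000 x 1000 grid.
base = {(random.randrange(1000), random.randrange(1000)) for _ in range(20_000)}
bad_rows = set(range(0, 1000, 3))
bad_columns = set(range(0, 1000, 5))


def with_difference_update():
    s = set(base)  # copy, so every run starts from the same data
    s.difference_update(
        {(i, j) for (i, j) in s if i in bad_rows or j in bad_columns}
    )
    return s


def with_rebuild():
    s = set(base)  # same copy, to keep the comparison fair
    return {(i, j) for (i, j) in s if i not in bad_rows and j not in bad_columns}


# Both approaches must agree before their speed is worth comparing.
assert with_difference_update() == with_rebuild()

for f in (with_difference_update, with_rebuild):
    print(f.__name__, timeit.timeit(f, number=20))
```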


Solution

  • In your case, use a set comprehension that keeps only the wanted pairs:

    my_set = {e for e in my_set if e[0] not in {0, 1, 7} and e[1] not in {0, 1, 6, 7, 10}}
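
One caveat: the comprehension binds the name `my_set` to a new set object, whereas `difference_update` changes the existing one. If other references to the set must see the result, an in-place update is still needed. A small sketch using the simpler example from the question (the name `alias` is hypothetical):

```python
my_set = {-100, -5, 0, 123, 3000}
alias = my_set  # a second reference to the same set object

# Rebinding: a new set is built; `alias` still points at the old, unfiltered one.
my_set = {e for e in my_set if e >= 0}
assert my_set == {0, 123, 3000}
assert alias == {-100, -5, 0, 123, 3000}

# In place: the existing object is filtered, so every reference sees the result.
# The list of kept elements is fully built before the update starts.
alias.intersection_update([e for e in alias if e >= 0])
assert alias == {0, 123, 3000}
```

The in-place variant still builds a temporary collection, so it does not avoid the extra allocation the question is concerned about. It only preserves the identity of the set.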