
Python: How is ~ used to exclude data?


I know that the code below returns all records that lie outside the buffer, but I'm confused about the mechanics of how that happens.

I see that a "~" (aka a bitwise NOT) is being used. From some googling, my understanding is that ~ returns the inverse of each bit in the input it is passed, e.g. if a bit is 0 it returns 1. Is this correct? If not, could someone please ELI5?
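For integers, that description is right. Here is a minimal sketch in plain Python (nothing assumed beyond the standard interpreter): ~ flips every bit, and because Python integers behave like infinitely wide two's-complement numbers, ~x always equals -x - 1. Masking with 0xFF shows the low 8 bits before and after the flip.

```python
x = 5                    # 0b00000101 in the low 8 bits
inverted = ~x            # flips every bit; for Python ints this equals -x - 1

print(inverted)                          # -6
print(format(x & 0xFF, "08b"))           # 00000101
print(format(inverted & 0xFF, "08b"))    # 11111010 (each of the low 8 bits flipped)
```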

Could someone please explain how exactly the code below returns the records that are outside of the "my_union" buffer?

NOTE: hospitals and collisions are just GeoDataFrames.

import geopandas as gpd

coverage = gpd.GeoDataFrame(geometry=hospitals.geometry).buffer(10000)
my_union = coverage.geometry.unary_union
outside_range = collisions.loc[~collisions["geometry"].apply(lambda x: my_union.contains(x))]

Solution

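  • ~ does indeed perform a bitwise NOT in Python. But here it is used to perform a logical NOT on each element of a list (or rather a pandas Series) of booleans: pandas overloads ~ for boolean Series so that every True becomes False and every False becomes True. See this answer for an example.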
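A minimal sketch of the difference, assuming only pandas is installed: on a boolean Series, ~ works element-wise as a logical NOT, while on a plain Python int it is the bitwise NOT, so ~1 gives -2 rather than 0. (A plain bool behaves like an int here, so ~True is also -2, and recent Python versions emit a DeprecationWarning for it.) Python's own "not" keyword cannot be applied to a whole Series; pandas raises a ValueError because the truth value of a Series is ambiguous.

```python
import pandas as pd

flags = pd.Series([True, False, True])

# On a boolean Series, ~ is overloaded as element-wise logical NOT
negated = ~flags
print(negated.tolist())   # [False, True, False]

# On a plain Python int, ~ is bitwise NOT
print(~1)                 # -2

# "not" on a whole Series is rejected by pandas
try:
    not flags
except ValueError as err:
    print("ValueError:", err)
```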

    Let's assume the collisions GeoDataFrame contains points; it works similarly for other types of geometries. Let me also change the code slightly, storing the intermediate boolean result in its own variable:

    coverage = gpd.GeoDataFrame(geometry=hospitals.geometry).buffer(10000) 
    my_union = coverage.geometry.unary_union
    within_my_union = collisions["geometry"].apply(lambda x: my_union.contains(x))
    outside_range = collisions.loc[~within_my_union]
    

    Then:

    1. my_union is a single (Multi)Polygon: unary_union merges all the hospital buffers into one geometry.

    2. my_union.contains(x) returns a boolean indicating whether the point x lies within the my_union (Multi)Polygon. A point lying exactly on the boundary is not considered contained.

    3. collisions["geometry"] is a pandas Series containing the points.

    4. collisions["geometry"].apply(lambda x: my_union.contains(x)) will run my_union.contains(x) on each of these points. This will result in another pandas Series containing booleans, indicating whether each point is within my_union.

    5. ~ then negates these booleans, so that the Series now indicates whether each point is not within my_union.

    6. collisions.loc[~within_my_union] then selects all the rows of collisions where the entry in ~within_my_union is True, i.e. all the points that don't lie within my_union.
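The steps above can be sketched without geopandas. This is a minimal sketch assuming only pandas: the hypothetical Circle class stands in for the shapely (Multi)Polygon my_union (a single 10,000-unit buffer around a hospital at the origin), and plain (x, y) tuples stand in for point geometries. The apply / ~ / .loc chain is the same as in the answer.

```python
import math

import pandas as pd


class Circle:
    """Hypothetical stand-in for my_union: a buffer of radius r around (cx, cy)."""

    def __init__(self, cx, cy, r):
        self.cx, self.cy, self.r = cx, cy, r

    def contains(self, point):
        # Strictly inside, mirroring shapely's contains(), which excludes the boundary
        x, y = point
        return math.hypot(x - self.cx, y - self.cy) < self.r


my_union = Circle(0, 0, 10_000)

# Hypothetical collisions table; (x, y) tuples play the role of point geometries
collisions = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "geometry": [(100, 200), (15_000, 0), (0, -9_999), (8_000, 8_000)],
    }
)

# Step 4: apply() calls my_union.contains on every point -> boolean Series
within_my_union = collisions["geometry"].apply(lambda x: my_union.contains(x))

# Step 5: ~ flips each boolean, so True now means "outside the buffer"
outside_mask = ~within_my_union

# Step 6: .loc keeps only the rows where the mask is True
outside_range = collisions.loc[outside_mask]
print(outside_range)
```

Only the rows with id 2 and 4 remain: those points are 15,000 and roughly 11,314 units from the centre. The point at (0, -9999) is inside the buffer, so ~ turns its True into False and .loc drops it.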