Can I use pandas.DataFrame.isin() with a numeric tolerance parameter?

Is there a way to use DataFrame.isin() with an approximation factor or a tolerance value? If not, is there another method that can do this? I reviewed the following posts beforehand:

Filter dataframe rows if value in column is in a set list of values

use a list of values to select rows from a pandas dataframe


import pandas as pd
df = pd.DataFrame({'A': [5, 6, 3.3, 4], 'B': [1, 2, 3.2, 5]})

In : df
     A    B
0  5.0  1.0
1  6.0  2.0
2  3.3  3.2
3  4.0  5.0

df[df['A'].isin([3, 6], tol=.5)]  # hypothetical: isin() has no tol parameter

Desired output:
     A    B
1  6.0  2.0
2  3.3  3.2


  • You can do a similar thing with numpy's isclose:

    import numpy as np

    df[np.isclose(df['A'].values[:, None], [3, 6], atol=.5).any(axis=1)]
         A    B
    1  6.0  2.0
    2  3.3  3.2

    np.isclose returns this:

    np.isclose(df['A'].values[:, None], [3, 6], atol=.5)
    array([[False, False],
           [False,  True],
           [ True, False],
           [False, False]], dtype=bool)

    This is a pairwise comparison of the elements of df['A'] with [3, 6]. The indexing df['A'].values[:, None] reshapes the column to shape (4, 1) so that it broadcasts against the two-element list, giving a (4, 2) result. Since you want to know whether each element is close to any of the values in the list, call .any(axis=1) at the end.

    For multiple columns, add one more axis to the slice so that every column is compared with every value:

    mask = np.isclose(df[['A', 'B']].values[:, :, None], [3, 6], atol=0.5).any(axis=(1, 2))
    Out: array([False,  True,  True, False], dtype=bool)

    The mask is True for a row if any of its columns is close to any of the values. You can use it to filter the DataFrame (i.e. df[mask]).

    If you want to compare df['A'] and df['B'] (and possibly other columns) with different lists of values, you can create a separate mask for each column:

    mask1 = np.isclose(df['A'].values[:, None], [1, 2, 3], atol=.5).any(axis=1)
    mask2 = np.isclose(df['B'].values[:, None], [4, 5], atol=.5).any(axis=1)
    mask3 = ...

    Then slice:

    df[mask1 & mask2]  # or df[mask1 & mask2 & mask3 & ...]
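
The broadcasting pattern above can be wrapped in a small helper so it reads more like isin(). This is only a sketch: the name isin_close and its signature are made up for illustration and are not part of pandas or numpy. It also assumes you want a purely absolute tolerance, so it passes rtol=0.0 (np.isclose defaults to rtol=1e-05, which adds a small relative term on top of atol).

```python
import numpy as np
import pandas as pd


def isin_close(series, values, atol=0.5, rtol=0.0):
    """Hypothetical helper: True where an element is within tolerance of any value."""
    # Reshape the column to (n, 1) so it broadcasts against the (m,) target values.
    column = series.to_numpy(dtype=float)[:, None]
    targets = np.asarray(values, dtype=float)
    # (n, m) pairwise comparison, reduced to "close to at least one target" per row.
    mask = np.isclose(column, targets, rtol=rtol, atol=atol).any(axis=1)
    # Return a Series aligned with the input so it can be combined with & and |.
    return pd.Series(mask, index=series.index)


df = pd.DataFrame({'A': [5, 6, 3.3, 4], 'B': [1, 2, 3.2, 5]})

# Rows whose 'A' value is within 0.5 of 3 or 6.
result = df[isin_close(df['A'], [3, 6], atol=0.5)]
print(result)
```

Because the helper returns a boolean Series with the original index, per-column masks combine the same way as mask1 and mask2 above, e.g. df[isin_close(df['A'], [1, 2, 3]) & isin_close(df['B'], [4, 5])].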