I have a dataframe with several columns - for simplicity, column A is a column of integers that are strictly increasing.
A B ...
103
222
383
432
799
1089
...
I would like to filter the dataframe based on a threshold value for column A, e.g. 750. I can do something like df[df['A'] < 750]
to achieve this. This results in:
A B ...
103
222
383
432
...
However, I would like to include the value directly after this threshold value as well - e.g. the resulting dataframe also should contain the 799 row.
A B ...
103
222
383
432
799
...
A clunky way to do this is just to pd.concat
these two dataframes df[df['A'] < 750]
and df[df['A'] >= 750].iloc[0]
together - it is essentially the original filter plus the single row directly after 750, which is 799, hence the iloc[0]
. Is there some sort of built in method achieve this result, without having to do this manual concat?
You could use Series.searchsorted
assuming the dataframe is sorted by A
values. And this should be very fast.
By default the side is left
for searchsorted and because you first want to filter < 750 first and then include the next higher element to 750 then side=left should be good.
df.loc[:df['A'].searchsorted(750)]
Output (with made up 'B' values):
A B
0 103 1
1 222 2
2 383 3
3 432 4
4 799 5