How precisely works shift() in this code. I'm trying to get some True
values in my dataframe and then to extend my selection to the next up and down 4 False
values.
An example of my DataFrame:
df
Out[89]:
TRACK_ID FRAME match
290 1667.0 350.0 False
291 1667.0 352.0 False
292 1667.0 353.0 False
293 1667.0 354.0 False
294 1668.0 348.0 False
295 1668.0 349.0 False
296 1668.0 350.0 False
297 1668.0 351.0 True
298 1668.0 352.0 True
299 1668.0 353.0 True
300 449.0 87.0 False
301 449.0 88.0 False
302 449.0 89.0 False
303 449.0 90.0 False
304 449.0 91.0 False
305 449.0 92.0 False
I'm using this line of code to extract True rows and immediately 4 up and down rows:
df1 = df[df.match | df.match.shift(np.round(-4,0)) | df.match.shift(np.round(4,0))]
However my output is skipping (deleting) the first index up and down (index 296 and 300):
df1
Out[97]:
TRACK_ID FRAME match
293 1667.0 354.0 False
294 1668.0 348.0 False
295 1668.0 349.0 False
297 1668.0 351.0 True
298 1668.0 352.0 True
299 1668.0 353.0 True
301 449.0 88.0 False
302 449.0 89.0 False
303 449.0 90.0 False
I can not figure it out why is this happening, any suggestions welcome!
Your condition is a mask that "shifts" values by 4 rows, nothing more.
df.match | df.match.shift(np.round(-4,0)) | df.match.shift(np.round(4,0))
#290 False
#291 False
#292 False
#293 True
#294 True
#295 True
#296 False
#297 True
#298 True
#299 True
#300 False
#301 True
#302 True
#303 True
#304 False
#305 False
You are filtering the dataframe with it, so it is "deleting" the rows where your condition is not true. It sounds like you'd rather mark these rows as false, in which case you don't want to filter, you want to update the dataframe
df['updated_match'] = df.match | df.match.shift(np.round(-4,0)) | df.match.shift(np.round(4,0))
Then df looks like:
TRACK_ID FRAME match updated_match
290 1667.0 350.0 False False
291 1667.0 352.0 False False
292 1667.0 353.0 False False
293 1667.0 354.0 False True
294 1668.0 348.0 False True
295 1668.0 349.0 False True
296 1668.0 350.0 False False
297 1668.0 351.0 True True
298 1668.0 352.0 True True
299 1668.0 353.0 True True
300 449.0 87.0 False False
301 449.0 88.0 False True
302 449.0 89.0 False True
303 449.0 90.0 False True
304 449.0 91.0 False False
305 449.0 92.0 False False
EDIT: Re-read the question and realized your issue.
I think instead of using shift()
, you want a rolling maximum over a 4-row window. Use .rolling()
in both directions (forward and backward).
df1 =
df[df.match |
df['match'].iloc[::-1].rolling(window=4).max().fillna(0).astype(bool) |
df['match'].rolling(window=4).max().fillna(0).astype(bool)
]
Output:
# TRACK_ID FRAME match
#294 1668.0 348.0 False
#295 1668.0 349.0 False
#296 1668.0 350.0 False
#297 1668.0 351.0 True
#298 1668.0 352.0 True
#299 1668.0 353.0 True
#300 449.0 87.0 False
#301 449.0 88.0 False
#302 449.0 89.0 False
This keeps 296 and 300 which you called out.