Flanking data by using shift

How precisely works shift() in this code. I'm trying to get some True values in my dataframe and then to extend my selection to the next up and down 4 False values. An example of my DataFrame:

df Out[89]: TRACK_ID FRAME match 290 1667.0 350.0 False 291 1667.0 352.0 False 292 1667.0 353.0 False 293 1667.0 354.0 False 294 1668.0 348.0 False 295 1668.0 349.0 False 296 1668.0 350.0 False 297 1668.0 351.0 True 298 1668.0 352.0 True 299 1668.0 353.0 True 300 449.0 87.0 False 301 449.0 88.0 False 302 449.0 89.0 False 303 449.0 90.0 False 304 449.0 91.0 False 305 449.0 92.0 False

I'm using this line of code to extract True rows and immediately 4 up and down rows:

df1 = df[df.match | df.match.shift(np.round(-4,0)) | df.match.shift(np.round(4,0))]

However my output is skipping (deleting) the first index up and down (index 296 and 300):

df1 Out[97]: TRACK_ID FRAME match 293 1667.0 354.0 False 294 1668.0 348.0 False 295 1668.0 349.0 False 297 1668.0 351.0 True 298 1668.0 352.0 True 299 1668.0 353.0 True 301 449.0 88.0 False 302 449.0 89.0 False 303 449.0 90.0 False

I can not figure it out why is this happening, any suggestions welcome!

Solution

Your condition is a mask that "shifts" values by 4 rows, nothing more.

df.match | df.match.shift(np.round(-4,0)) | df.match.shift(np.round(4,0))
#290    False
#291    False
#292    False
#293     True
#294     True
#295     True
#296    False
#297     True
#298     True
#299     True
#300    False
#301     True
#302     True
#303     True
#304    False
#305    False

You are filtering the dataframe with it, so it is "deleting" the rows where your condition is not true. It sounds like you'd rather mark these rows as false, in which case you don't want to filter, you want to update the dataframe

df['updated_match'] = df.match | df.match.shift(np.round(-4,0)) | df.match.shift(np.round(4,0))

Then df looks like:

   TRACK_ID FRAME   match   updated_match
290 1667.0  350.0   False   False
291 1667.0  352.0   False   False
292 1667.0  353.0   False   False
293 1667.0  354.0   False   True
294 1668.0  348.0   False   True
295 1668.0  349.0   False   True
296 1668.0  350.0   False   False
297 1668.0  351.0   True    True
298 1668.0  352.0   True    True
299 1668.0  353.0   True    True
300 449.0   87.0    False   False
301 449.0   88.0    False   True
302 449.0   89.0    False   True
303 449.0   90.0    False   True
304 449.0   91.0    False   False
305 449.0   92.0    False   False

EDIT: Re-read the question and realized your issue.

I think instead of using shift(), you want a rolling maximum over a 4-row window. Use .rolling() in both directions (forward and backward).

df1 = 
df[df.match |
   df['match'].iloc[::-1].rolling(window=4).max().fillna(0).astype(bool) |
   df['match'].rolling(window=4).max().fillna(0).astype(bool)
  ]

Output:

#   TRACK_ID    FRAME   match
#294    1668.0  348.0   False
#295    1668.0  349.0   False
#296    1668.0  350.0   False
#297    1668.0  351.0   True
#298    1668.0  352.0   True
#299    1668.0  353.0   True
#300    449.0   87.0    False
#301    449.0   88.0    False
#302    449.0   89.0    False

This keeps 296 and 300 which you called out.