I tried looking up for something similar but could find it. So I have below structure dataframe. I looking to delete rows that have same score for 5 days or more. So in below case PeronID AB-123's records from 2/1 to 2/6 needs to be deleted, also for DG-3465 from 2/3 to 2/10. But nothing for TY-9456. I was thinking for using rolling() but that will delete only 2/1-2/5 for AB-123 but not 2/6.
PersonID Date Score
AB-123 2/1/2016 0
AB-123 2/2/2016 0
AB-123 2/3/2016 0
AB-123 2/4/2016 0
AB-123 2/5/2016 0
AB-123 2/6/2016 0
AB-123 2/7/2016 67.5
AB-123 2/8/2016 73.4
AB-123 2/9/2016 70.5
AB-123 2/10/2016 68
DG-3465 2/1/2016 22.5
DG-3465 2/2/2016 25.6
DG-3465 2/3/2016 36.4
DG-3465 2/4/2016 36.4
DG-3465 2/5/2016 36.4
DG-3465 2/6/2016 36.4
DG-3465 2/7/2016 36.4
DG-3465 2/8/2016 36.4
DG-3465 2/9/2016 36.4
DG-3465 2/10/2016 36.4
TY-9456 2/1/2016 0
TY-9456 2/2/2016 0
TY-9456 2/3/2016 5.23
TY-9456 2/4/2016 4.12
TY-9456 2/5/2016 5.95
TY-9456 2/6/2016 6.97
TY-9456 2/7/2016 12.45
TY-9456 2/8/2016 15.61
TY-9456 2/9/2016 15.61
TY-9456 2/10/2016 15.61
Tried few different things but I am kinda stuck with nothing poping up in my head. What would you suggest? Using python pandas by the way ;)
You can roll
on Score column, calculate the running standard deviation, and then drop rows where the standard deviations are zero along with five rows before them (this assumes you want to delete rows with same scores on consecutive days):
df.drop(np.unique(df.Score.rolling(5).std()[lambda x: x == 0].index.values - pd.np.arange(5)[:, None]))