I have a Pandas dataframe, df_next, that is a monthly aggregation of crime type incidents for specific jurisdictions. For example, something like:
ID | Year_Month | Total |
---|---|---|
AL0010000 | 1991-01 | 2024 |
AL0010000 | 1991-02 | 3017 |
... | ... | ... |
WV0550300 | 2018-11 | 30147 |
WV0550300 | 2018-12 | 32148 |
I want to reduce the size of my dataframe by removing rows that belong to a run of four consecutive months with a 0 value in the 'Total' column. In other words, if an ID has reported 0 total crime for four consecutive months, I want to remove that chunk of 4 months. I want to do this for all IDs.
I've tried:
# Define a window size of 4
window_size = 4
# Apply a rolling window to the Total column for each ID
df_next['Total_rolling'] = df_next.groupby('ID')['Total'].rolling(window=window_size).reset_index(0, drop=True)
df_next['Remove'] = ((df_next['Total_rolling'].shift(window_size - 1) == 0) & (df_next['Total_rolling'] == 0))
# Filter out the rows where there are four consecutive 0's in the Total value for each ID
df_filtered = df_next[~df_next['Remove']]
However, when I check df_filtered, I still have multiple examples of IDs with four consecutive months of 0 crime totals. Any help would be greatly appreciated.
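# This relies on the rows being sorted by ID and then Year_Month,
# so that consecutive rows are consecutive months
# (`df` here stands for the question's `df_next`)
# Flag the rows where Total is zero
m = df['Total'] == 0
# Build a block counter: it increments on every non-zero row,
# so each run of consecutive zeros shares one counter value
b = (~m).cumsum()
# Group the zero rows by `ID` and the block counter, then
# transform with 'size' to get the length of each run of zeros
s = df[m].groupby(['ID', b]).transform('size')
# Drop the rows from the original dataframe that belong to
# a run of 4 or more consecutive zeros
df = df.drop(s.index[s >= 4])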
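To check the approach on something small, here is a minimal sketch of the same idea wrapped in a function. The sample data, the function name `drop_zero_runs`, and the `min_run` parameter are made up for illustration and are not from the question. It assumes the frame is already sorted by `ID` and `Year_Month`.

```python
import pandas as pd

# Hypothetical sample data (not from the question), sorted by ID and Year_Month
df = pd.DataFrame({
    "ID": ["A"] * 7 + ["B"] * 5,
    "Year_Month": [f"2000-{m:02d}" for m in range(1, 8)]
                  + [f"2000-{m:02d}" for m in range(1, 6)],
    "Total": [5, 0, 0, 0, 0, 3, 0,   0, 0, 0, 2, 0],
})


def drop_zero_runs(frame, min_run=4):
    """Drop rows that belong to a run of at least `min_run` zeros within one ID."""
    is_zero = frame["Total"].eq(0)
    # The counter only increments on non-zero rows, so every row in a
    # run of consecutive zeros carries the same counter value
    block = (~is_zero).cumsum()
    zeros = frame.loc[is_zero, "Total"]
    # Length of the zero run each zero row belongs to, computed per ID
    run_len = zeros.groupby(
        [frame.loc[is_zero, "ID"], block[is_zero]]
    ).transform("size")
    return frame.drop(run_len.index[run_len >= min_run])


result = drop_zero_runs(df)
print(result)
```

In this sample, ID `A` ends with a single zero and ID `B` starts with three zeros. Those four rows are adjacent in the frame, but because `ID` is part of the grouping they count as separate runs of length 1 and 3, so they are kept. Only the four-zero run inside `A` is removed.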