Pandas: for each new value in a column, remove the following two rows

I have the following dataframe:

time   alarm
0       0
1       1
2       0
3       1
4       1
5       1
6       1
7       0
8       0
9       1
10      0

The column alarm represents an alarm. If it rings, it takes value 1.
Each time the alarm rings, I want to "silence" the next two rows. Then, if it rings again after the silenced period, I want to silence the next two rows, and so on.

In other words, I want to obtain the following dataframe:

time   alarm    silenced
0       0       no
1       1       no
2       0       yes
3       1       yes
4       1       no
5       1       yes
6       1       yes
7       0       no
8       0       no
9       1       no
10      0       yes

I managed to do it using a for loop or a lambda function, but I have to speed up the computation.
Can somebody help me? Thank you in advance!

P.S. Since I will eventually remove the "silenced" rows, a solution that removes them directly is also acceptable. In that case, the result should be:

time   alarm
0       0
1       1
4       1
7       0
8       0
9       1

MY ATTEMPT using a for loop in an auxiliary function:

import numpy as np
import pandas as pd

df = pd.DataFrame({"time":[0,1,2,3,4,5,6,7,8,9,10], "alarm":[0,1,0,1,1,1,1,0,0,1,0]})

def fun_silence(df):
    # bool: if True,  we are in a "silent" period 
    #       if False, we can consider the current time as a possible alarm
    flag_silent = False
    # time of the *last* alarm
    alarm_time = np.nan
    # loop over rows
    for index, row in df.iterrows():
        # if we are in a silent period
        if flag_silent:
            # if 2 time steps passed from the last alarm, we end the silent period
            if row['time'] - alarm_time > 2:
                flag_silent = False
            # otherwise, we mark this row as silenced
            else:
                df.loc[index, 'silenced'] = 1
        # if there is an alarm and we are not in a silent period
        if row['alarm'] == 1 and not flag_silent:
            # save the alarm time
            alarm_time = row['time']
            # enter a silent period
            flag_silent = True
    return df
df['silenced'] = 0
df_silenced = fun_silence(df)


  • I don't think you can avoid the for loop in this problem, because whether a row is silenced depends on what happened to the rows before it. You can, however, simplify the function and compile it with numba to get C-like speed on large datasets.

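    from numba import njit

    @njit
    def silence(alarm):
        # number of upcoming rows that still have to be silenced
        count = 0
        for a in alarm:
            if count > 0:
                # inside a silent period: silence this row
                yield True
                count -= 1
            elif count == 0 and a == 1:
                # a new alarm rings: keep it and silence the next two rows
                count = 2
                yield False
            else:
                # no alarm and no active silent period
                yield False

    df['silenced'] = [*silence(df['alarm'].to_numpy())]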

        time  alarm  silenced
    0      0      0     False
    1      1      1     False
    2      2      0      True
    3      3      1      True
    4      4      1     False
    5      5      1      True
    6      6      1      True
    7      7      0     False
    8      8      0     False
    9      9      1     False
    10    10      0      True
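
Since the question's P.S. asks for the silenced rows to be dropped, here is a minimal sketch of that step. It assumes, like the loop above, that rows are evenly spaced so the silent period can be counted in rows. The names `silenced_mask` and `n_silent` are made up for this illustration and are not part of pandas or numba.

```python
import numpy as np
import pandas as pd


def silenced_mask(alarm, n_silent=2):
    """Return a boolean array that is True for rows inside a silent period."""
    mask = np.zeros(len(alarm), dtype=bool)
    remaining = 0  # rows still to be silenced after the last accepted alarm
    for i, a in enumerate(alarm):
        if remaining > 0:
            # still inside a silent period: mark the row, ignore its alarm
            mask[i] = True
            remaining -= 1
        elif a == 1:
            # accepted alarm: keep this row and silence the next n_silent rows
            remaining = n_silent
    return mask


df = pd.DataFrame({
    "time": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "alarm": [0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0],
})

mask = silenced_mask(df["alarm"].to_numpy())
df_kept = df[~mask]  # boolean indexing keeps only the non-silenced rows
print(df_kept["time"].tolist())  # [0, 1, 4, 7, 8, 9]
```

Building the mask first and filtering once with `df[~mask]` avoids modifying the dataframe row by row inside the loop.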