Assume I have a pandas DataFrame that consists only of 0 and 1 values, depending on whether an anomaly was detected or not:
input_data = pd.DataFrame(data={'my_event': [0., 0., 1., 1., 0., 1., 0., 0., 0., 1., 1.]},
index=pd.date_range(start='2023-01-01 00:00:00', end='2023-01-01 00:00:10', freq='s'))
Now I would like to fill gaps in the detection depending on their size. E.g. I only want to fill gaps that are 2 seconds or shorter. What is the correct way to do something like this?
I found these questions here: 1, 2, 3, but the solutions do not seem very straightforward. It kinda feels like there should be a simpler way to solve an issue like this.
EDIT
Sorry for the imprecise question! In my case, a "gap" is a short time period where no anomaly was detected, inside a larger time range that was detected as an anomaly.
For the example input_data the expected output would be a DataFrame with the following data:
[0., 0., 1., 1., 1., 1., 0., 0., 0., 1., 1.]
So in this example the single 0. inside the region of ones was replaced by a one. Obviously all zeros could also be replaced by NaNs, if that would help. I just need to be able to specify the length of the gap that should be filled.
I don't know if I understood you correctly, but to fill gaps in the detection that are 2 seconds or shorter, you can do this:
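import pandas as pd
input_data = pd.DataFrame(data={'my_event': [0., 0., 1., 1., 0., 1., 0., 0., 0., 1., 1.]},
                          index=pd.date_range(start='2023-01-01 00:00:00', end='2023-01-01 00:00:10', freq='s'))
# Longest gap to fill, in rows (equal to seconds for this 1-second index)
max_gap = 2
events = input_data['my_event']
is_zero = events.eq(0)
# Give every run of consecutive equal values its own id
run_id = is_zero.astype(int).diff().ne(0).cumsum()
# Number of zeros in each run, repeated on every row of that run
run_length = is_zero.groupby(run_id).transform('sum')
# A gap must have a 1 somewhere before it and somewhere after it
ones = events.where(events.eq(1))
enclosed = ones.ffill().notna() & ones.bfill().notna()
# Fill only the enclosed runs of zeros that are short enough
gaps = is_zero & enclosed & (run_length <= max_gap)
input_data.loc[gaps, 'my_event'] = 1
print(input_data)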
                     my_event
2023-01-01 00:00:00       0.0
2023-01-01 00:00:01       0.0
2023-01-01 00:00:02       1.0
2023-01-01 00:00:03       1.0
2023-01-01 00:00:04       1.0
2023-01-01 00:00:05       1.0
2023-01-01 00:00:06       0.0
2023-01-01 00:00:07       0.0
2023-01-01 00:00:08       0.0
2023-01-01 00:00:09       1.0
2023-01-01 00:00:10       1.0
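If you need this in more than one place, or the timestamps are not evenly spaced, the gap filling could be wrapped in a function that takes the maximum gap as a duration. The following is only a sketch under a few assumptions: the names fill_short_gaps and max_gap are made up for illustration, a gap is measured from its first 0 sample to the next 1 sample (so a single 0 in 1-second data counts as a 1-second gap), and zeros before the first 1 or after the last 1 are never filled.

```python
import pandas as pd


def fill_short_gaps(events: pd.Series, max_gap: pd.Timedelta) -> pd.Series:
    """Return a copy of `events` where short, enclosed runs of 0 are set to 1.

    Hypothetical helper: a run of zeros is filled only if a 1 precedes it and
    the time from its first 0 to the next 1 is at most `max_gap`.
    """
    stamps = events.index.to_series()
    is_one = events.eq(1)

    # Give every run of consecutive equal values its own id.
    run_id = is_one.astype(int).diff().ne(0).cumsum()
    # Timestamp of the first sample of each run, repeated on every row.
    run_start = stamps.groupby(run_id).transform("min")

    # Timestamp of the nearest 1 after / before each row (NaT if there is none).
    one_stamps = stamps.where(is_one)
    next_one = one_stamps.bfill()
    prev_one = one_stamps.ffill()

    # Gap duration; comparisons with NaT are False, so trailing zeros stay 0.
    gap_length = next_one - run_start
    fill = ~is_one & prev_one.notna() & (gap_length <= max_gap)
    return events.mask(fill, 1.0)


input_data = pd.DataFrame(
    data={"my_event": [0., 0., 1., 1., 0., 1., 0., 0., 0., 1., 1.]},
    index=pd.date_range(start="2023-01-01 00:00:00",
                        end="2023-01-01 00:00:10", freq="s"),
)

result = fill_short_gaps(input_data["my_event"], pd.Timedelta(seconds=2))
print(result.tolist())
# [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0]
```

Passing pd.Timedelta(seconds=3) instead would also fill the three zeros between 00:00:06 and 00:00:08, while the two leading zeros stay untouched because no detection precedes them.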