I am looking to extract rows from my dataset based on a pattern like condition. The condition I'm looking for is finding periods in a battery's charging history where it discharged from 100-0% without charging in between.
For example, in this dataset below I would be interested in a function that would only return timestamp of 7 to 12 as it contains a full discharge of the battery.
timestamp | Charge level (%) |
---|---|
1 | 50 |
2 | 55 |
3 | 40 |
4 | 60 |
5 | 80 |
6 | 100 |
7 | 100 |
8 | 85 |
9 | 60 |
10 | 55 |
11 | 40 |
12 | 0 |
13 | 20 |
The approach I have tried is to use the loc
function in Pandas to look for rows with a charge level of 0% and then backtrack until I reach a row with a charge level of 100%.
But I am struggling with the backtracking part in this approach.
[Code updated according to comments]
The idea I use is to keep only the rows with 0 and 100 and the final rows of interest will be the ones with 100 followed by 0. [after checking with .diff(1) that the values are monotonically decreasing]
I also updated your example to include some more difficult cases like when it start discharging and then start charging before it was fully discharged [or fully discharged].
with pd.option_context('display.max_columns', None):
display(df.T)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
time 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
level 20 0 40 60 80 100 100 85 60 40 40 0 100 20 20 55 100 10 100 0 100 20 40 0 20
df["diff1"] = df["level"].diff(1)
df.set_index("time", inplace=True)
res = df.copy()
mask =((res["level"] == 100) | (res["level"] == 0))
res = res.loc[mask]
mask1 = (res["level"] == 100) & (res["level"].shift(-1) == 0)
mask2 = (res["level"] == 0) & (res["level"].shift(1) == 100)
res = res.loc[mask1 | mask2, ["level"]]
level
time
7 100
12 0
19 100
20 0
21 100
24 0
# Get the start and end of all the segments having levels [100, ..., 0]
t_start, t_end = res[res["level"] == 100].index, res[res["level"] == 0].index
# Among all the segments keep only the ones with level monotonically decreasing
idx = [(t_start[i], t_end[i]) for i in range(len(t_start))
if (df.loc[t_start[i]+1:t_end[i], "diff1"] <= 0).all()]
print(idx) # All resulting groups with (start_time, end_time)
[(7, 12), (19, 20)]
# Example to get the sum of all the levels in result group n°0 (idx[0] ie (7, 12))
gp0 = df.loc[idx[0][0]:idx[0][1], ["level"]]
display(gp0)
level
time
7 100
8 85
9 60
10 40
11 40
12 0
print(f"The sum of levels is {sum(gp0['level'])}.")
The sum of levels is 325.