Search code examples
pythonpandasdataframe

How to select rows that display some type of pattern in Python


I am looking to extract rows from my dataset based on a pattern like condition. The condition I'm looking for is finding periods in a battery's charging history where it discharged from 100-0% without charging in between.

For example, in this dataset below I would be interested in a function that would only return timestamp of 7 to 12 as it contains a full discharge of the battery.

timestamp Charge level (%)
1 50
2 55
3 40
4 60
5 80
6 100
7 100
8 85
9 60
10 55
11 40
12 0
13 20

The approach I have tried is to use the loc function in Pandas to look for rows with a charge level of 0% and then backtrack until I reach a row with a charge level of 100%. But I am struggling with the backtracking part in this approach.


Solution

  • [Code updated according to comments]

    The idea I use is to keep only the rows with 0 and 100 and the final rows of interest will be the ones with 100 followed by 0. [after checking with .diff(1) that the values are monotonically decreasing]

    I also updated your example to include some more difficult cases like when it start discharging and then start charging before it was fully discharged [or fully discharged].

    with pd.option_context('display.max_columns', None):
        display(df.T)
    
        0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24
    time    1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25
    level   20  0   40  60  80  100 100 85  60  40  40  0   100 20  20  55  100 10  100 0   100 20  40  0   20
    
    df["diff1"] = df["level"].diff(1)
    df.set_index("time", inplace=True)
    res = df.copy()
    
    mask =((res["level"] == 100) | (res["level"] == 0)) 
    res = res.loc[mask]
    
    mask1 = (res["level"] == 100) & (res["level"].shift(-1) == 0)
    mask2 = (res["level"] == 0) & (res["level"].shift(1) == 100)
    res = res.loc[mask1 | mask2, ["level"]]
    
        level
    time    
    7   100
    12  0
    19  100
    20  0
    21  100
    24  0
    
    # Get the start and end of all the segments having levels [100, ..., 0]
    t_start, t_end = res[res["level"] == 100].index, res[res["level"] == 0].index
     
    # Among all the segments keep only the ones with level monotonically decreasing
    idx = [(t_start[i], t_end[i]) for i in range(len(t_start)) 
           if (df.loc[t_start[i]+1:t_end[i], "diff1"] <= 0).all()]
    print(idx)  # All resulting groups with (start_time, end_time)
    
    [(7, 12), (19, 20)]
    
    # Example to get the sum of all the levels in result group n°0 (idx[0] ie (7, 12))
    gp0 = df.loc[idx[0][0]:idx[0][1], ["level"]]
    display(gp0)
    
        level
    time    
    7   100
    8   85
    9   60
    10  40
    11  40
    12  0
    
    print(f"The sum of levels is {sum(gp0['level'])}.")
    The sum of levels is 325.