Search code examples
pythonpandaslistnumpygroup-by

How to reset a Sum after it reaches a milestone and then continue it in Pandas DataFrame in Python?


I've got one DataFrame that I would like to Sum the "Total_time" column and if it reaches a certain valeu the sum resets. The Sum need to be separated by different "ac"(using groupby) and when it reaches more than 185, the Sum is reseted.

The DataFrame:

index ac flight_date Total_time
0 PR-AKA 2023-03-13 00:00:00 87
1 PR-AKA 2023-03-13 00:00:00 55
2 PR-AKA 2023-03-13 00:00:00 59
3 PR-AKA 2023-03-13 00:00:00 71
4 PR-AKA 2023-03-14 00:00:00 70
5 PR-AKA 2023-03-14 00:00:00 58
6 PR-AKA 2023-03-14 00:00:00 45
7 PR-AKA 2023-03-14 00:00:00 41
8 PR-AKA 2023-03-14 00:00:00 33
9 PR-AKA 2023-03-14 00:00:00 42
10 PR-AKA 2023-03-15 00:00:00 70
11 PR-AKB 2023-03-13 00:00:00 109
12 PR-AKB 2023-03-13 00:00:00 100
13 PR-AKB 2023-03-13 00:00:00 92
14 PR-AKB 2023-03-13 00:00:00 102
15 PR-AKB 2023-03-14 00:00:00 71
16 PR-AKB 2023-03-14 00:00:00 74
17 PR-AKB 2023-03-14 00:00:00 80
18 PR-AKB 2023-03-14 00:00:00 64
19 PR-AKB 2023-03-14 00:00:00 66
20 PR-AKB 2023-03-14 00:00:00 70
21 PR-AKB 2023-03-14 00:00:00 31
22 PR-AKB 2023-03-14 00:00:00 45
23 PR-AKB 2023-03-15 00:00:00 72
24 PR-AKB 2023-03-15 00:00:00 70
25 PR-AKB 2023-03-15 00:00:00 64
26 PR-AKB 2023-03-15 00:00:00 75
27 PR-AKB 2023-03-15 00:00:00 52
28 PR-AKB 2023-03-15 00:00:00 58
29 PR-AKB 2023-03-16 00:00:00 36
30 PR-AKB 2023-03-16 00:00:00 47
31 PR-AKB 2023-03-16 00:00:00 14
32 PR-AKC 2023-03-13 00:00:00 86
33 PR-AKC 2023-03-13 00:00:00 82
34 PR-AKC 2023-03-13 00:00:00 83

Expected result:

index ac flight_date Total_time Sum
0 PR-AKA 2023-03-13 00:00:00 87 87
1 PR-AKA 2023-03-13 00:00:00 55 142
2 PR-AKA 2023-03-13 00:00:00 59 201
3 PR-AKA 2023-03-13 00:00:00 71 71
4 PR-AKA 2023-03-14 00:00:00 70 141
5 PR-AKA 2023-03-14 00:00:00 58 199
6 PR-AKA 2023-03-14 00:00:00 45 45
7 PR-AKA 2023-03-14 00:00:00 41 86
8 PR-AKA 2023-03-14 00:00:00 33 119
9 PR-AKA 2023-03-14 00:00:00 42 161
10 PR-AKA 2023-03-15 00:00:00 70 231
11 PR-AKB 2023-03-13 00:00:00 109 109
12 PR-AKB 2023-03-13 00:00:00 100 209
13 PR-AKB 2023-03-13 00:00:00 92 92
14 PR-AKB 2023-03-13 00:00:00 102 194
15 PR-AKB 2023-03-14 00:00:00 71 71
16 PR-AKB 2023-03-14 00:00:00 74 145
17 PR-AKB 2023-03-14 00:00:00 80 225
18 PR-AKB 2023-03-14 00:00:00 64 64
19 PR-AKB 2023-03-14 00:00:00 66 131
20 PR-AKB 2023-03-14 00:00:00 70 201
21 PR-AKB 2023-03-14 00:00:00 31 31
22 PR-AKB 2023-03-14 00:00:00 45 76
23 PR-AKB 2023-03-15 00:00:00 72 148
24 PR-AKB 2023-03-15 00:00:00 70 218
25 PR-AKB 2023-03-15 00:00:00 64 64
26 PR-AKB 2023-03-15 00:00:00 75 139
27 PR-AKB 2023-03-15 00:00:00 52 191
28 PR-AKB 2023-03-15 00:00:00 58 58
29 PR-AKB 2023-03-16 00:00:00 36 94
30 PR-AKB 2023-03-16 00:00:00 47 141
31 PR-AKB 2023-03-16 00:00:00 14 155
32 PR-AKC 2023-03-13 00:00:00 86 86
33 PR-AKC 2023-03-13 00:00:00 82 83
34 PR-AKC 2023-03-13 00:00:00 83 166

....

I've marked in black where the sum must be reseted. In one question in here I've found one code that almost did it, but instead of reset the sum it put a 0 as bellow but I don't know how to fix it:

def my_accumulate(maxval):
    val = 0
    yield
    while True:
        if val < maxval:
            val += yield val
        else:
            yield val
            val = 0


def fn(x):
    a = my_accumulate(185)
    next(a)
    x["TIME_LIMIT"] = [a.send(v) for v in x["Total_time"]]
    return x


ATR = ATR.groupby('ac').apply(fn)
ATR

Result:

index ac flight_date Total_time TIME_LIMIT
0 PR-AKA 2023-03-13 00:00:00 87 0
1 PR-AKA 2023-03-13 00:00:00 55 55
2 PR-AKA 2023-03-13 00:00:00 59 114
3 PR-AKA 2023-03-13 00:00:00 71 185
4 PR-AKA 2023-03-14 00:00:00 70 0
5 PR-AKA 2023-03-14 00:00:00 58 58
6 PR-AKA 2023-03-14 00:00:00 45 103
7 PR-AKA 2023-03-14 00:00:00 41 144
8 PR-AKA 2023-03-14 00:00:00 33 177
9 PR-AKA 2023-03-14 00:00:00 42 219
10 PR-AKA 2023-03-15 00:00:00 70 0
11 PR-AKB 2023-03-13 00:00:00 109 0
12 PR-AKB 2023-03-13 00:00:00 100 100
13 PR-AKB 2023-03-13 00:00:00 92 192
14 PR-AKB 2023-03-13 00:00:00 102 0
15 PR-AKB 2023-03-14 00:00:00 71 71
16 PR-AKB 2023-03-14 00:00:00 74 145
17 PR-AKB 2023-03-14 00:00:00 80 225
18 PR-AKB 2023-03-14 00:00:00 64 0
19 PR-AKB 2023-03-14 00:00:00 66 66
20 PR-AKB 2023-03-14 00:00:00 70 136
21 PR-AKB 2023-03-14 00:00:00 31 167
22 PR-AKB 2023-03-14 00:00:00 45 212
23 PR-AKB 2023-03-15 00:00:00 72 0
24 PR-AKB 2023-03-15 00:00:00 70 70
25 PR-AKB 2023-03-15 00:00:00 64 134
26 PR-AKB 2023-03-15 00:00:00 75 209
27 PR-AKB 2023-03-15 00:00:00 52 0
28 PR-AKB 2023-03-15 00:00:00 58 58
29 PR-AKB 2023-03-16 00:00:00 36 94
30 PR-AKB 2023-03-16 00:00:00 47 141
31 PR-AKB 2023-03-16 00:00:00 14 155
32 PR-AKC 2023-03-13 00:00:00 86 0
33 PR-AKC 2023-03-13 00:00:00 82 82
34 PR-AKC 2023-03-13 00:00:00 83 165

Solution

  • You can change your generator/function to:

    def my_accumulate(s, maxval=float('inf')):
        curr = 0
        for x in s:
            curr += x         # add previous value
            yield curr
            if curr > maxval: # if above threshold
                curr = 0      # reset the count
    
    def fn(g):
        return list(my_accumulate(g, maxval=185))
            
    ATR['TIME_LIMIT'] = ATR.groupby('ac')['Total_time'].transform(fn)
    

    Output:

        index       ac           flight_date  Total_time  TIME_LIMIT
    0       0  PR-AKA   2023-03-13 00:00:00           87          87
    1       1  PR-AKA   2023-03-13 00:00:00           55         142
    2       2  PR-AKA   2023-03-13 00:00:00           59         201
    3       3  PR-AKA   2023-03-13 00:00:00           71          71
    4       4  PR-AKA   2023-03-14 00:00:00           70         141
    5       5  PR-AKA   2023-03-14 00:00:00           58         199
    6       6  PR-AKA   2023-03-14 00:00:00           45          45
    7       7  PR-AKA   2023-03-14 00:00:00           41          86
    8       8  PR-AKA   2023-03-14 00:00:00           33         119
    9       9  PR-AKA   2023-03-14 00:00:00           42         161
    10     10  PR-AKA   2023-03-15 00:00:00           70         231
    11     11  PR-AKB   2023-03-13 00:00:00          109         109
    12     12  PR-AKB   2023-03-13 00:00:00          100         209
    13     13  PR-AKB   2023-03-13 00:00:00           92          92
    14     14  PR-AKB   2023-03-13 00:00:00          102         194
    15     15  PR-AKB   2023-03-14 00:00:00           71          71
    16     16  PR-AKB   2023-03-14 00:00:00           74         145
    17     17  PR-AKB   2023-03-14 00:00:00           80         225
    18     18  PR-AKB   2023-03-14 00:00:00           64          64
    19     19  PR-AKB   2023-03-14 00:00:00           66         130
    20     20  PR-AKB   2023-03-14 00:00:00           70         200
    21     21  PR-AKB   2023-03-14 00:00:00           31          31
    22     22  PR-AKB   2023-03-14 00:00:00           45          76
    23     23  PR-AKB   2023-03-15 00:00:00           72         148
    24     24  PR-AKB   2023-03-15 00:00:00           70         218
    25     25  PR-AKB   2023-03-15 00:00:00           64          64
    26     26  PR-AKB   2023-03-15 00:00:00           75         139
    27     27  PR-AKB   2023-03-15 00:00:00           52         191
    28     28  PR-AKB   2023-03-15 00:00:00           58          58
    29     29  PR-AKB   2023-03-16 00:00:00           36          94
    30     30  PR-AKB   2023-03-16 00:00:00           47         141
    31     31  PR-AKB   2023-03-16 00:00:00           14         155
    32     32  PR-AKC   2023-03-13 00:00:00           86          86
    33     33  PR-AKC   2023-03-13 00:00:00           82         168
    34     34  PR-AKC   2023-03-13 00:00:00           83         251