Search code examples
pythonpandasdataframecalculated-columns

Creating an accumulative column based on another column that only accumulates for a new ID


I have the following dataset:

Time = ['00:01', '00:02','00:03','00:01','00:02','00:03','00:01','00:02','00:03']
ID = [1, 1, 1, 2, 2, 2, 3, 3, 3]
Value = [3.5, 3.5, 3.5, 4.1, 4.1, 4.1, 2.3, 2.3, 2.3]   
df = pd.DataFrame({'Time':Time, 'ID':ID, 'Value':Value})

Each value is the same for each ID. I want to create a new column that adds up Value column accumulatively, but only when each ID changes. V

So instead of getting

3.5   7   10.5   14.6   18.7   22.8   25.1   27.3   29.5

I want

3.5   3.5   3.5   7.6   7.6   7.6   9.9   9.9   9.9

Solution

  • using .loc to assign your value,

    shift to test where ID changes

    and then cumsum with ffill

    df.loc[:, "Val"] = df[df["ID"].ne(df["ID"].shift())][
        "Value"
    ].cumsum()
    
    df['Val'] = df['Val'].ffill()
    
    print(df)
    
        Time  ID  Value  Val
    0  00:01   1    3.5  3.5
    1  00:02   1    3.5  3.5
    2  00:03   1    3.5  3.5
    3  00:01   2    4.1  7.6
    4  00:02   2    4.1  7.6
    5  00:03   2    4.1  7.6
    6  00:01   3    2.3  9.9
    7  00:02   3    2.3  9.9
    8  00:03   3    2.3  9.9
    

    or more simply as suggested by Ch3steR

    df['Value'].where(df['Value'].ne(df['Value'].shift(1))).cumsum().ffill()
    
    0    3.5
    1    3.5
    2    3.5
    3    7.6
    4    7.6
    5    7.6
    6    9.9
    7    9.9
    8    9.9