
Sum values in column based on values in other column and append to df - python


I have a df with a Count column and an angle_1frame_abs column.

The column Count holds values that are either 0 or stretches of higher numbers separated by zeros, e.g. 0,0,0,0,4,4,4,4,4,4,0,0,0,0,0,0,6,6,6,6,6,6,6,0,0,0,0.

I would like to sum the values of the column "angle_1frame_abs" for each stretch and add the result as a new column to the df, e.g.:

Count: 0,0,0,0,4,4,4,4,4,4,...

angle_1frame_abs: 2,1,3,4,2,2,2,3,4,3,...

new column: 10,10,10,10,16,16,16,16,16,16,...

Thank you all!


Solution

  • It seems you need to split your df into segments of consecutive rows with the same Count value, sum angle_1frame_abs within each segment, and copy each segment's sum to every row of that segment.

    Count itself cannot serve as the groupby key, because separate stretches can share a value (every run of zeros has Count 0) and would be merged into one group. We need a new key that is unique per stretch:

    new_key = (df['Count'].diff().abs() > 0).cumsum()
    

    Using the new key, we can do the sum:

    the_sum = df.groupby( new_key )['angle_1frame_abs'].sum()
    

    Then copy the sums back to the table via the new key:

    df['NewColumn'] = new_key.map(the_sum)
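
As a possible shortcut, the sum and the copy-back step can be combined with groupby(...).transform('sum'). The sketch below is a variant, not the code from the answer above: it builds the key with ne()/shift() instead of diff(), and the sample values are the ones given in the question.

```python
import pandas as pd

# Sample values taken from the question
df = pd.DataFrame({
    "Count": [0, 0, 0, 0, 4, 4, 4, 4, 4, 4],
    "angle_1frame_abs": [2, 1, 3, 4, 2, 2, 2, 3, 4, 3],
})

# A new segment starts wherever Count differs from the previous row
new_key = df["Count"].ne(df["Count"].shift()).cumsum()

# transform("sum") returns one value per row, aligned with df's index,
# so no separate map() step is needed
df["NewColumn"] = df.groupby(new_key)["angle_1frame_abs"].transform("sum")

print(df["NewColumn"].tolist())
# [10, 10, 10, 10, 16, 16, 16, 16, 16, 16]
```

With this variant the key starts at 1 rather than 0, because the first row always differs from its (missing) predecessor. The grouping is the same either way.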
    

    Note that diff() finds the segment boundaries, abs() > 0 converts the boundary values to True/False (1/0), and cumsum() then yields the new key, because the cumulative sum only changes at a boundary and stays constant within a segment.
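
To make the three steps concrete, here is a small trace of the intermediate values. It uses the first 12 values of the Count sequence from the question; the variable names step and boundary are only for illustration.

```python
import pandas as pd

# First 12 values of the Count sequence from the question
count = pd.Series([0, 0, 0, 0, 4, 4, 4, 4, 4, 4, 0, 0], name="Count")

step = count.diff()          # NaN in the first row, nonzero where the value changes
boundary = step.abs() > 0    # True on the first row of each new segment (NaN > 0 is False)
new_key = boundary.cumsum()  # running count of boundaries, i.e. a segment id

print(pd.DataFrame({
    "Count": count,
    "diff": step,
    "boundary": boundary,
    "new_key": new_key,
}))

print(new_key.tolist())
# [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2]
```

The second run of zeros gets key 2, not key 0, which is exactly what grouping by Count directly would get wrong.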

    I tested my code using the following fake data:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        'Count': [2,0,0,0,4,4,4,4,4,4,0,0,0,0,0,0,6,6,6,6,6,6,6,0,0,0,1],
        'angle_1frame_abs': [np.nan,0,0,0,4,4,4,4,4,4,1,1,1,1,1,1,6,6,6,6,6,6,6,2,2,2,2],
    })
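
For reference, running the three steps on this fake data should give the per-segment sums shown below. This is a sketch that only reassembles the snippets above into one script.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Count": [2, 0, 0, 0, 4, 4, 4, 4, 4, 4, 0, 0, 0, 0, 0, 0,
              6, 6, 6, 6, 6, 6, 6, 0, 0, 0, 1],
    "angle_1frame_abs": [np.nan, 0, 0, 0, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 1, 1,
                         6, 6, 6, 6, 6, 6, 6, 2, 2, 2, 2],
})

# Segment id: increases by one each time Count changes
new_key = (df["Count"].diff().abs() > 0).cumsum()

# One sum per segment; NaN values are skipped by default
the_sum = df.groupby(new_key)["angle_1frame_abs"].sum()

# Broadcast each segment's sum back to its rows
df["NewColumn"] = new_key.map(the_sum)

print(the_sum.tolist())
# [0.0, 0.0, 24.0, 6.0, 42.0, 6.0, 2.0]
```

Note that the first segment contains only NaN, and sum() reports 0.0 for it because missing values are skipped. If NaN would be preferable for such a segment, passing min_count=1 to sum() is one option.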