Search code examples
pythonpandascumsum

pd conditional cumsum and percentage


I have a dataframe of the following shape:

df = pd.DataFrame()
df["trial"] = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
df["correct"] = [1, 0, 1, 1, 0, 0, 0, 1, 0, 1]
df["responding_subject"] = ["one", "two", "two", "two", "one", "two", "one", "one", "one", "two"]

I want to add 2 new columns that represent the accuracy (i.e. proportion correct) of one's own trials up to this point.

E.g. df["acc_one"] at index 4 would compute the accuracy of all previous (trials 0-3) own (but not partner!) trials.


Solution

  • IIUC, you can compute the expanding accuracy using groupby:

    g = df.groupby('responding_subject')['correct']
    df['accuracy'] = g.cumsum()/(g.cumcount()+1)
    

    output:

       trial  correct responding_subject  accuracy
    0      0        1                one  1.000000
    1      1        0                two  0.000000
    2      2        1                two  0.500000
    3      3        1                two  0.666667
    4      4        0                one  0.500000
    5      5        0                two  0.500000
    6      6        0                one  0.333333
    7      7        1                one  0.500000
    8      8        0                one  0.400000
    9      9        1                two  0.600000
    

    There is no real need to split accuracies in different columns, however if you really want to, add a pivot step:

    df.join(df.pivot(columns='responding_subject', values='accuracy').add_prefix('acc_'))
    

    output:

       trial  correct responding_subject  accuracy   acc_one   acc_two
    0      0        1                one  1.000000  1.000000       NaN
    1      1        0                two  0.000000       NaN  0.000000
    2      2        1                two  0.500000       NaN  0.500000
    3      3        1                two  0.666667       NaN  0.666667
    4      4        0                one  0.500000  0.500000       NaN
    5      5        0                two  0.500000       NaN  0.500000
    6      6        0                one  0.333333  0.333333       NaN
    7      7        1                one  0.500000  0.500000       NaN
    8      8        0                one  0.400000  0.400000       NaN
    9      9        1                two  0.600000       NaN  0.600000