python pandas dataframe numpy auc

Area under curve of a dataframe column


I'm trying to calculate the cumulative AUC (area under the curve) of a dataframe column, from the first row up to the current row.

Example:

   points   AUC
0       0     0
1       1   0.5
2       2     2
3       3   4.5
4       4     8
5       5  12.5
6       4    17
7       0    19
8      -2    18
9      -2    16

I can use np.trapz(), but I have to compute it row by row in a for loop:

import numpy as np

for i in range(len(df)):
    # integrate from the first row up to and including row i
    df.loc[df.index[i], "AUC"] = np.trapz(df["points"].iloc[:i + 1])

Is there any way to apply it to the whole column without using a for loop?

The second problem is that my dataframe gets updated every minute. So either I recalculate the cumulative AUC from the beginning of the df each time, which makes the calculation take longer and longer, or I take only part of the df (e.g. df.tail(25)) and apply a function to it, in which case I lose the AUC accumulated before iloc[-25].


Solution

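  • I would try something like this:

    np.cumsum(df.points) - np.concatenate(([0], np.cumsum(np.diff(df.points) / 2)), axis=0)

    The first term is the running sum of the points. The second term subtracts half of the change since the first row, which turns the running sum into the running trapezoid area when the series starts at 0, as it does in the example.

    Here is a working example: https://abstra.show/dezL0ASX4s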
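For what it's worth, here is a minimal sketch of the same idea written out as trapezoid segments, plus one possible way to handle the "updated every minute" part without recomputing from the start. The one-liner above equals cumsum(points) minus half of (current point - first point), so it matches the trapezoid rule only when the first value is 0; otherwise every entry is shifted by that first value. The helper names `cumulative_auc` and `extend_auc` are made up for this illustration, and unit spacing between rows is assumed (as with np.trapz() called without x).

```python
import numpy as np
import pandas as pd


def cumulative_auc(points):
    """Cumulative trapezoidal area with unit spacing; the first entry is 0."""
    p = np.asarray(points, dtype=float)
    if p.size == 0:
        return p
    # Area of each trapezoid between consecutive samples.
    segments = (p[1:] + p[:-1]) / 2.0
    # Running total of the segments, with 0 for the very first row.
    return np.concatenate(([0.0], np.cumsum(segments)))


def extend_auc(last_point, last_auc, new_points):
    """AUC values for newly appended points, continuing from the last known row."""
    # Prepend the last known point so the trapezoid that joins old and new
    # data is included, then shift everything by the AUC reached so far.
    p = np.concatenate(([last_point], np.asarray(new_points, dtype=float)))
    return last_auc + cumulative_auc(p)[1:]


df = pd.DataFrame({"points": [0, 1, 2, 3, 4, 5, 4, 0, -2, -2]})
df["AUC"] = cumulative_auc(df["points"])

# Hypothetical new rows arriving later: only the last stored point and the
# last stored AUC are needed, not the whole history.
new_points = [1, 3]
new_auc = extend_auc(df["points"].iloc[-1], df["AUC"].iloc[-1], new_points)
```

With this approach each update costs time proportional to the number of new rows only, so df.tail(25) is not needed. If SciPy is available, scipy.integrate.cumulative_trapezoid(df["points"], initial=0) should give the same column as `cumulative_auc`.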