python pandas dataframe pandas-groupby pandas-apply

Using pandas groupby and apply for cumulative integration


I have a pandas DataFrame with columns idx, grp, X, Y, and I want to get a new column with the cumulative integral of a function of Y with respect to X. However, I want to apply this cumulative integration to each subgroup of the DataFrame as defined by the column grp.

Here's what I'm doing:

import numpy as np
import pandas as pd
from scipy import integrate

def myIntegral(DF, n):
    A0 = 200
    # cumtrapz was renamed cumulative_trapezoid in SciPy 1.6 and removed in 1.14
    return integrate.cumulative_trapezoid((A0/DF.Y)**n, DF.X, initial=0)

data = pd.DataFrame({'idx' : [1,2,3,4,5,6],
                     'grp' : [2,2,2,2,3,3],
                     'X' : [.1,.2,.3,.4,.2,.3],
                     'Y' : [3,4,4,3,2,3]}
                    )
data.sort_values(by=['grp', 'X'], inplace=True)

out = data.groupby('grp').apply(myIntegral, n=0.5)

out is a Series holding one ndarray per value of grp, which I then need to map back into the DataFrame:

data_grouped = data.groupby('grp')
out2 = []
for grp, DF in data_grouped:
    DF['Z'] = out.loc[grp]
    out2.append(DF)
data = pd.concat(out2)

It works, but going through a Series of ndarrays seems ugly and error-prone. Any suggestions on how to improve this? The data sets I'll be working with are also rather large, so I'm looking for an efficient solution.

Thanks!


Solution

  • You can change your function so that it creates the new column and returns the DataFrame:

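    def myIntegral(DF, n):
        A0 = 200
        DF = DF.copy()  # work on a copy rather than mutating the group passed in by apply
        # cumtrapz was renamed cumulative_trapezoid in SciPy 1.6 and removed in 1.14
        DF['new'] = integrate.cumulative_trapezoid((A0/DF.Y)**n, DF.X, initial=0)
        return DF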
    
    data = pd.DataFrame({'idx' : [1,2,3,4,5,6],
                         'grp' : [2,2,2,2,3,3],
                         'X' : [.1,.2,.3,.4,.2,.3],
                         'Y' : [3,4,4,3,2,3]}
                        )
    data.sort_values(by=['grp', 'X'], inplace=True)
    
    # group_keys=False keeps the original flat index in pandas >= 2.0
    out = data.groupby('grp', group_keys=False).apply(myIntegral, n=0.5)
    print(out)
      idx  grp    X  Y       new
    0    1    2  0.1  3  0.000000
    1    2    2  0.2  4  0.761802
    2    3    2  0.3  4  1.468908
    3    4    2  0.4  3  2.230710
    4    5    3  0.2  2  0.000000
    5    6    3  0.3  3  0.908248
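
  • Since efficiency on large data was also a concern, one possible alternative is to skip apply altogether and write out the trapezoid rule with grouped diff, shift and cumsum. This is only a sketch, not a benchmarked solution: it assumes the frame is already sorted by X within each grp, and the column name Z and the helper names f, dx, f_prev and area are made up for illustration.

```python
import numpy as np
import pandas as pd

A0 = 200  # same constant as in the question
n = 0.5

data = pd.DataFrame({'idx': [1, 2, 3, 4, 5, 6],
                     'grp': [2, 2, 2, 2, 3, 3],
                     'X': [.1, .2, .3, .4, .2, .3],
                     'Y': [3, 4, 4, 3, 2, 3]})
data = data.sort_values(by=['grp', 'X'])

f = (A0 / data['Y']) ** n                       # integrand at each row
dx = data.groupby('grp')['X'].diff()            # step width; NaN on each group's first row
f_prev = f.groupby(data['grp']).shift()         # integrand on the previous row of the same group
area = (dx * (f + f_prev) / 2).fillna(0)        # trapezoid area per step, 0 at each group start
data['Z'] = area.groupby(data['grp']).cumsum()  # running total within each group
```

    Every step here is a vectorized groupby operation and the result keeps the original index, so it can be assigned straight back as a column. On this sample the values in Z should agree with the new column above up to rounding.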