Search code examples
pythonpandasscipynumerical-integration

Pandas integrate over columns per each row


In a simplified dataframe:

import pandas as pd 

df1 = pd.DataFrame({'350': [7.898167, 6.912074, 6.049002, 5.000357, 4.072320],
                '351': [8.094912, 7.090584, 6.221289, 5.154516, 4.211746],
                '352': [8.291657, 7.269095, 6.393576, 5.308674, 4.351173],
                '353': [8.421007, 7.374317, 6.496641, 5.403691, 4.439815],
                '354': [8.535562, 7.463452, 6.584512, 5.485725, 4.517310],
                '355': [8.650118, 7.552586, 6.672383, 4.517310, 4.594806]},
                 index=[1, 2, 3, 4, 5])

int_range = df1.columns.astype(float)
a = 0.005
b = 0.837

I would like to solve an equation which is attached as an image below:

enter image description here

I is equal to the values in the data frame. x is the int_range values so in this case from 350 to 355 with a dx=1. a and b are optional constants

I need to get a dataframe as an output per each row

For now I do something like this, but I'm not sure it's correct:

dict_INT = {}
for index, row in df1.iterrows():

    func = df1.loc[index]*df1.loc[index].index.astype('float')
    x    = df1.loc[index].index.astype('float')

    dict_INT[index] = integrate.trapz(func, x)

df_out = pd.DataFrame(dict_INT, index=['INT']).T

df_fin = df_out/(a*b)

This is the final sum I get per row:

1  3.505796e+06
2  3.068796e+06
3  2.700446e+06
4  2.199336e+06
5  1.840992e+06

Solution

  • I solved this by first converting the dataframe to dict and then performing your equation by each item in row, then writing these value to dict using collections defaultdict. I will break it down:

    import pandas as pd
    from collections import defaultdict
    
    df1 = pd.DataFrame({'350': [7.898167, 6.912074, 6.049002, 5.000357, 4.072320],
                    '351': [8.094912, 7.090584, 6.221289, 5.154516, 4.211746],
                    '352': [8.291657, 7.269095, 6.393576, 5.308674, 4.351173],
                    '353': [8.421007, 7.374317, 6.496641, 5.403691, 4.439815],
                    '354': [8.535562, 7.463452, 6.584512, 5.485725, 4.517310],
                    '355': [8.650118, 7.552586, 6.672383, 4.517310, 4.594806]},
                    index=[1, 2, 3, 4, 5]
                     )
    
    int_range = df1.columns.astype(float)
    a = 0.005
    b = 0.837
    dx = 1
    df_dict = df1.to_dict() # convert df to dict for easier operations
    
    integrated_dict = {} # initialize empty dict
    
    d = defaultdict(list) # initialize empty dict of lists for tuples later
    integrated_list = []
    for k,v in df_dict.items(): # unpack df dict of dicts
        for x,y in v.items(): # unpack dicts by column and index (x is index, y is column)
            integrated_list.append((k, (((float(k)*float(y)*float(dx))/(a*b))))) #store a list of tuples.
    
    
    for x,y in integrated_list: # create dict with column header as key and new integrated calc as value (currently a tuple)
        d[x].append(y)
    
    
    d = {k:tuple(v) for k, v in d.items()} # unpack to multiple values
    
    integrated_df = pd.DataFrame.from_dict(d) # to df
    integrated_df['Sum'] = integrated_df.iloc[:, :].sum(axis=1)
    

    output (updated to include sum):

                 350            351            352            353            354  \
    0  660539.653524  678928.103226  697410.576822  710302.382557  722004.527599   
    1  578070.704898  594694.141935  611402.972521  622015.269056  631317.086738   
    2  505890.250896  521785.529032  537763.142652  547984.294624  556969.473835   
    3  418189.952210  432314.245161  446512.126165  455795.202628  464025.483871   
    4  340576.344086  353243.212903  365976.797133  374493.356033  382109.376344   
    
             355             Sum
    0  733761.502987  4.202947e+06
    1  640661.416965  3.678162e+06
    2  565996.646356  3.236389e+06
    3  383188.781362  2.600026e+06
    4  389762.516129  2.206162e+06