python pandas function lambda apply

Apply a function row by row using other dataframes' rows as list inputs in python


I'm trying to apply a function row by row that takes 5 inputs, 3 of which are lists. I want these lists to come from the corresponding row of each of 3 DataFrames.

I've tried using 'apply' and 'lambda' as follows:

sol['tf_dd']=sol.apply(lambda tsol, rfsol, rbsol: 
                           taurho_difdif(xy=xy,
                                         l=l,
                                         t=tsol,
                                         rf=rfsol,
                                         rb=rbsol),
                           axis=1)

However, I get the error: <lambda>() missing 2 required positional arguments: 'rfsol' and 'rbsol'
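The error comes from how DataFrame.apply works with axis=1: it calls the function once per row and passes that row as a single pandas Series, whose name attribute is the row's index label. A lambda declared with three parameters therefore only ever receives one argument. A minimal sketch illustrating this, using a small hypothetical frame df that is not from the question:

```python
import pandas as pd

# Hypothetical 3-row frame, only used to inspect what apply passes in
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# With axis=1, apply calls the function once per row with ONE argument:
# the row as a Series, whose .name is that row's index label
kinds = df.apply(lambda row: type(row).__name__, axis=1)
names = df.apply(lambda row: row.name, axis=1)

print(kinds.tolist())  # ['Series', 'Series', 'Series']
print(names.tolist())  # [0, 1, 2]

# A lambda expecting three positional arguments fails the same way
try:
    df.apply(lambda x, y, z: 0, axis=1)
except TypeError as err:
    message = str(err)
    print(message)
```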

The DataFrame sol and the DataFrames tsol, rfsol and rbsol all have the same length. For each row, I want the entire row from tsol, rfsol and rbsol to be input as three lists.
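For reference, a DataFrame row can be turned into a plain Python list one at a time with iloc[i].tolist(), or all rows at once with to_numpy().tolist(), which returns a list of row lists. A small sketch that rebuilds a frame like the question's tsol (10 identical rows of t):

```python
import pandas as pd

# Rebuild a frame like the question's tsol: 10 identical rows of t
t = [1, 2, 3, 4, 5]
tsol = pd.DataFrame([t] * 10)

# One row as a list, selected by position
first_row = tsol.iloc[0].tolist()

# Every row as a list, in one call (a list of 10 lists)
all_rows = tsol.to_numpy().tolist()

print(first_row)      # [1, 2, 3, 4, 5]
print(len(all_rows))  # 10
```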

Here is a much simplified example (first with single lists, which I then want to replicate row by row with DataFrames):

The output with single lists is a single value (120). With DataFrames as inputs, I want an output DataFrame of length 10 where every value is 120.

import pandas as pd

t=[1,2,3,4,5]
rf=[6,7,8,9,10]
rb=[11,12,13,14,15]

def simple_func(t, rf, rb):
    x=sum(t)
    y=sum(rf)
    z=sum(rb)

    return x+y+z

out=simple_func(t,rf,rb)

# dataframe rows as lists
tsol=pd.DataFrame((t,t,t,t,t,t,t,t,t,t))
rfsol=pd.DataFrame((rf,rf,rf,rf,rf,rf,rf,rf,rf,rf))
rbsol=pd.DataFrame((rb,rb,rb,rb,rb,rb,rb,rb,rb,rb))


out2 = pd.DataFrame(index=range(len(tsol)), columns=['output'])
# raises the same TypeError: apply(axis=1) passes only one argument (the row) to the lambda
out2['output'] = out2.apply(lambda tsol, rfsol, rbsol:
                            simple_func(t=tsol.tolist(),
                                        rf=rfsol.tolist(),
                                        rb=rbsol.tolist()),
                            axis=1)

Solution

  • Use the "name" attribute of the row Series that apply passes in: it holds the row's index label, which you can then use to select the same row from each of the other DataFrames

    import pandas as pd
    import numpy as np
    
    
    def positional_sum(row, df1, df2, df3):
        """
            Take the row passed in by apply, read its index label from row.name,
            and sum the row at that same position in each of the other DataFrames.
        """
    
        # row.name is the row's index label; it is used as a position below,
        # which is valid here because out2 and the inputs use a default RangeIndex
        position = row.name
    
        x = df1.iloc[position].sum()
        y = df2.iloc[position].sum()
        z = df3.iloc[position].sum()
        return x + y + z
    
    
    # example inputs: three 10x5 DataFrames of random numbers
    tsol = pd.DataFrame(np.random.randn(10, 5), columns=range(5))
    rfsol = pd.DataFrame(np.random.randn(10, 5), columns=range(5))
    rbsol = pd.DataFrame(np.random.randn(10, 5), columns=range(5))
    
    out2 = pd.DataFrame(index=range(len(tsol)), columns=["output"])
    
    out2["output"] = out2.apply(lambda x: positional_sum(x, tsol, rfsol, rbsol), axis=1)
    
    print(out2)
    

    Hope this helps!
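An alternative that skips apply entirely, offered as a minimal sketch rather than the answer's method: convert each DataFrame to a list of row lists and zip them, so every iteration hands simple_func (copied from the question) one row from each frame. It assumes the three DataFrames have the same number of rows in the same order. For the original taurho_difdif call, the extra xy and l arguments would simply be passed alongside in the same loop (not shown).

```python
import pandas as pd

def simple_func(t, rf, rb):
    # Same toy function as in the question
    return sum(t) + sum(rf) + sum(rb)

t = [1, 2, 3, 4, 5]
rf = [6, 7, 8, 9, 10]
rb = [11, 12, 13, 14, 15]

tsol = pd.DataFrame([t] * 10)
rfsol = pd.DataFrame([rf] * 10)
rbsol = pd.DataFrame([rb] * 10)

# zip pairs up the i-th row of each frame; rows are assumed aligned by position
rows = zip(tsol.to_numpy().tolist(),
           rfsol.to_numpy().tolist(),
           rbsol.to_numpy().tolist())

out2 = pd.DataFrame(
    {"output": [simple_func(t_row, rf_row, rb_row) for t_row, rf_row, rb_row in rows]},
    index=tsol.index,
)
print(out2["output"].tolist())  # [120, 120, 120, 120, 120, 120, 120, 120, 120, 120]

# Because simple_func only sums, a vectorized equivalent also exists
vectorized = tsol.sum(axis=1) + rfsol.sum(axis=1) + rbsol.sum(axis=1)
```

The zip version works with any row-wise function; the vectorized line only applies because this particular toy function is a plain sum.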