python, pandas, apply

Apply varying function for pandas dataframe depending on column arguments being passed


I would like to apply a function to each row of a pandas dataframe. Instead of the argument being variable across rows, it's the function itself that is different for each row depending on the values in its columns. Let's be more concrete:

import pandas as pd 
from scipy.interpolate import interp1d

d = {'col1': [1, 2], 'col2': [2, 4], 'col3': [3, 6]}
df = pd.DataFrame(data=d)
   col1  col2  col3
0     1     2     3
1     2     4     6

Now, what I would like to achieve is to extrapolate columns 1 to 3 row-wise. For the first row, this would be:

f_1 = interp1d(range(df.shape[1]), df.loc[0], fill_value='extrapolate')

with the extrapolated value f_1(df.shape[1]).item() = 4.0.

So the column I would like to add would be:

col4
4
8

I've tried something like the following:

import numpy as np
def interp_row(row):
    n = row.shape[1]
    fun = interp1d(np.arange(n), row, fill_value='extrapolate')
    return fun(n+1).item()

df['col4'] = df.apply(lambda row: interp_row(row))

Can I make this work?


Solution

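  • You were almost there. Three things needed to change: df.apply needs axis=1 so that interp_row receives rows instead of columns; each row is a 1-D Series, so its length is row.shape[0] (row.shape[1] does not exist); and because the x positions run from 0 to n-1, the next position to extrapolate to is n, not n+1: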

    import pandas as pd 
    from scipy.interpolate import interp1d
    import numpy as np
    
    d = {'col1': [1, 2], 'col2': [2, 4], 'col3': [3, 6]}
    df = pd.DataFrame(data=d)
    
    def interp_row(row):
        n = row.shape[0]
        fun = interp1d(np.arange(n), row, fill_value='extrapolate')
        return fun(n).item()
    
    df['col4'] = df.apply(interp_row, axis=1)
    print(df)
    
    

    which returns:

    
       col1  col2  col3  col4
    0     1     2     3   4.0
    1     2     4     6   8.0
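
As a possible alternative to the row-wise apply, interp1d can take a 2-D array of y values and interpolate along its last axis, so all rows can be extrapolated in a single call. The following is a minimal sketch, assuming every row shares the same equally spaced x positions 0, 1, ..., n-1 and that all columns are numeric; the names values and f are illustrative only:

```python
import numpy as np
import pandas as pd
from scipy.interpolate import interp1d

df = pd.DataFrame({'col1': [1, 2], 'col2': [2, 4], 'col3': [3, 6]})

n = df.shape[1]                    # number of existing columns
values = df.to_numpy(dtype=float)  # shape (n_rows, n)

# interp1d works along the last axis by default (axis=-1), so each row
# of `values` gets its own linear interpolant over x = 0 .. n-1.
f = interp1d(np.arange(n), values, fill_value='extrapolate')

# Evaluating at the scalar x = n yields one extrapolated value per row.
df['col4'] = f(n)
print(df)
```

For the example frame this should give the same col4 as the apply version. On large frames it avoids building one interpolant per row, though whether that matters depends on the data size.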