
Apply a function to a DataFrame in a rolling fashion using values from two columns and the previous row


Let's say I have the following dataframe (in reality it is much bigger, so the method should be fast):

df = pd.DataFrame({"distance1": [101, 102, 103], "distance2":[12, 33, 44]})

   distance1  distance2
0        101         12
1        102         33
2        103         44

Now I want to apply the following function to this dataframe:

def distance(x):
    return np.sqrt(np.power(x.loc[n, "distance1"] - x.loc[n-1 ,"distance1"], 2) + np.power(x.loc[n, "distance2"] - x.loc[n-1 ,"distance2"], 2))

df["dist"] = df.apply(distance, axis=1)

Essentially, I want to calculate the Euclidean distance between the point (distance1, distance2) in the current row and the same point in the previous row, where n is the current row and n-1 is the previous row in the dataframe.


Solution

  • You can avoid apply entirely and compute this with vectorized column operations:

    import numpy as np
    import pandas as pd
    
    # Example data
    df = pd.DataFrame({"distance1": [101, 102, 103], "distance2":[12, 33, 44]})
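    # shift(1) moves each value down one row, so every row is compared with the previous one
    df['dist'] = np.sqrt((df['distance1'] - df['distance1'].shift(1))**2 + (df['distance2'] - df['distance2'].shift(1))**2)
    
    # Row 0 has no previous row; shift(1) already leaves NaN here, so this only makes it explicit
    df.loc[0, 'dist'] = np.nan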
    
    print(df)
    

    which would give you:

       distance1  distance2       dist
    0        101         12        NaN
    1        102         33  21.023796
    2        103         44  11.045361
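
An equivalent variant, offered as a sketch rather than as the answer's own method: `DataFrame.diff()` computes the row-to-row differences for both columns in one call, and `np.hypot` takes the square root of the sum of squares. The variable name `deltas` is only illustrative.

```python
import numpy as np
import pandas as pd

# Same example data as above
df = pd.DataFrame({"distance1": [101, 102, 103], "distance2": [12, 33, 44]})

# diff() subtracts the previous row from each row; the first row becomes NaN
deltas = df[["distance1", "distance2"]].diff()

# hypot(a, b) computes sqrt(a**2 + b**2) element-wise; NaN propagates to row 0
df["dist"] = np.hypot(deltas["distance1"], deltas["distance2"])

print(df)
```

This should produce the same `dist` column as the `shift(1)` version, with NaN in the first row because it has no previous row.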