Search code examples
pythonpandasdataframenumpydata-fitting

Pandas new column using equation


I have the following dataset:

df = pd.DataFrame ({'index': [10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26], 
'avg': [130, np.NaN,np.NaN,np.NaN,np.NaN,np.NaN,np.NaN,np.NaN, 135, np.NaN, np.NaN,np.NaN,np.NaN,np.NaN, 136, np.NaN,np.NaN],
 'slope':[.02,np.NaN,np.NaN,np.NaN,np.NaN,np.NaN,np.NaN,np.NaN, .08,np.NaN, np.NaN,np.NaN,np.NaN,np.NaN, .03, np.NaN,np.NaN] })

I want to create new column 'fit' which will fit linear equation in between the occurrence of integer value in 'avg'. I have written the following code:

df.loc[0,'fit'] = df.loc [0,'avg']
def fitt ():
    for i in range (0, len(df)):
        if df.loc [i,'avg'] > 0:
            a = df.loc[i,'index']
            b = df.loc [i,'slope']
            c= df.loc [i,'avg']
            df.loc [i,'fit'] = df.loc [i, 'avg']
            continue
        while df.loc [i,'avg'] == np.NaN:
            df.loc[i,'fit'] = c + b * (i-a)
            
    return df

The output column 'fit' should contain the following value:

df['fit]= [130,130.02,130.04,130.06,130.08,130.10,130.12,130.14,135,135.08,135.16,135.24,135.32,135.40,136, 136.03,136.06]

I was wondering how to get the correct code. Any help is much appreciated


Solution

  • If you first propagate the slope to all subsequent missing values, you can easily calculate the 'fit' values step by step, just adding the slope to the previous value cumulatively:

    df['slope'] = df.slope.fillna(method='ffill')
    fit = df.avg.values.copy()
    missing = df.avg.isna()
    
    for i in range(len(df)):
        if missing[i]:
            fit[i] = fit[i - 1] + df.slope[i]
            
    df['fit'] = fit