I have the following dataset:
df = pd.DataFrame ({'index': [10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26],
'avg': [130, np.NaN,np.NaN,np.NaN,np.NaN,np.NaN,np.NaN,np.NaN, 135, np.NaN, np.NaN,np.NaN,np.NaN,np.NaN, 136, np.NaN,np.NaN],
'slope':[.02,np.NaN,np.NaN,np.NaN,np.NaN,np.NaN,np.NaN,np.NaN, .08,np.NaN, np.NaN,np.NaN,np.NaN,np.NaN, .03, np.NaN,np.NaN] })
I want to create new column 'fit' which will fit linear equation in between the occurrence of integer value in 'avg'. I have written the following code:
df.loc[0,'fit'] = df.loc [0,'avg']
def fitt ():
for i in range (0, len(df)):
if df.loc [i,'avg'] > 0:
a = df.loc[i,'index']
b = df.loc [i,'slope']
c= df.loc [i,'avg']
df.loc [i,'fit'] = df.loc [i, 'avg']
continue
while df.loc [i,'avg'] == np.NaN:
df.loc[i,'fit'] = c + b * (i-a)
return df
The output column 'fit' should contain the following value:
df['fit]= [130,130.02,130.04,130.06,130.08,130.10,130.12,130.14,135,135.08,135.16,135.24,135.32,135.40,136, 136.03,136.06]
I was wondering how to get the correct code. Any help is much appreciated
If you first propagate the slope to all subsequent missing values, you can easily calculate the 'fit' values step by step, just adding the slope to the previous value cumulatively:
df['slope'] = df.slope.fillna(method='ffill')
fit = df.avg.values.copy()
missing = df.avg.isna()
for i in range(len(df)):
if missing[i]:
fit[i] = fit[i - 1] + df.slope[i]
df['fit'] = fit