python pandas numpy machine-learning linear-regression

Unable to understand a part of code about Linear Regression from sentdex tutorials on machine learning


I was following sentdex's machine learning tutorials on YouTube. In the 5th part he does this:

# forecast horizon: 1% of the number of rows, rounded up
forecast_out = int(math.ceil(0.01*len(df)))
print(forecast_out)

df['label'] = df[forecast_col].shift(-forecast_out)

X = np.array(df.drop(['label'], axis=1))
X = preprocessing.scale(X)
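X_lately = X[-forecast_out:]  # take the most recent rows first; they have no label yet
X = X[:-forecast_out]         # then keep only the rows that do have a label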


df.dropna(inplace=True)
y = np.array(df['label'])

I got completely lost about what he was trying to do here. With int(math.ceil(0.01*len(df))) he was getting the number of days he wants to predict. After that, he did df[forecast_col].shift(-forecast_out), and I couldn't follow anything after that.


Solution

  • There is not enough information here, but if this is a time series forecasting problem, then I think df[forecast_col].shift(-forecast_out) shifts the forecast column up by forecast_out rows, so that the label for a given day is the value of the forecast column forecast_out days later (that is, the number you need to forecast, pulled back from the future). The last forecast_out rows then have no label (NaN), which is why df.dropna(inplace=True) is called before building y.
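
To make the shift concrete, here is a minimal sketch on toy data. The "Close" column, its ten values, and the 20% horizon are assumptions made up for illustration (the tutorial uses 1% of a much longer stock DataFrame), and the preprocessing.scale step is left out so that only pandas and numpy are needed.

```python
import math

import numpy as np
import pandas as pd

# Toy data: 10 hypothetical daily closing prices (not from the tutorial).
df = pd.DataFrame(
    {"Close": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0]}
)
forecast_col = "Close"

# 20% instead of the tutorial's 1%, so the effect is visible on 10 rows.
forecast_out = int(math.ceil(0.2 * len(df)))  # 2 rows

# Each row's label is the Close value forecast_out rows later.
# The last forecast_out rows get NaN because their future is unknown.
df["label"] = df[forecast_col].shift(-forecast_out)
print(df)

# Features: every column except the label.
X = np.array(df.drop(["label"], axis=1))
X_lately = X[-forecast_out:]  # most recent rows, no label yet
X = X[:-forecast_out]         # rows that do have a label

# Drop the rows whose label is NaN so that y lines up with X.
df.dropna(inplace=True)
y = np.array(df["label"])

print(len(X), len(y), len(X_lately))
```

In this sketch the first row has Close 10.0 and label 12.0, the Close value two rows later. A model would be trained on X and y, and X_lately would be the rows it is then asked to predict for.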