python date pandas linear-regression

python linear regression predict by date


I want to predict a value at a date in the future with simple linear regression, but I can't due to the date format.

This is the dataframe I have:

data_df = 
date          value
2016-01-15    1555
2016-01-16    1678
2016-01-17    1789
...  

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

y = np.asarray(data_df['value'])
X = data_df[['date']]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=.7, random_state=42)

model = LinearRegression()  # create linear regression object
model.fit(X_train, y_train)  # train model on train data
model.score(X_train, y_train)  # check score

print('Coefficient: \n', model.coef_)
print('Intercept: \n', model.intercept_)
coefs = zip(model.coef_, X.columns)
model.__dict__
print("sl = %.1f + " % model.intercept_ +
      " + ".join("%.1f %s" % coef for coef in coefs))  # linear model

I tried to convert the date, without success:

data_df['conv_date'] = data_df.date.apply(lambda x: x.toordinal())

data_df['conv_date'] = pd.to_datetime(data_df.date, format="%Y-%M-%D")

Solution

  • Linear regression can't be fitted directly on date values, so the dates must first be converted to numbers. The following code converts each date to its ordinal (an integer day count):

    import datetime as dt
    import pandas as pd

    # Parse the column as datetimes, then map each one to its integer ordinal
    data_df['date'] = pd.to_datetime(data_df['date'])
    data_df['date'] = data_df['date'].map(dt.datetime.toordinal)
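
To make the conversion concrete, here is a minimal end-to-end sketch of fitting on ordinal dates and predicting for a future date. It uses only the three rows shown in the question as toy data. The `date_ordinal` column name and the target date `2016-01-20` are assumptions made for illustration. The model is fitted on all rows without a train/test split to keep the example short.

```python
import datetime as dt

import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy data mirroring the question's layout (only the three rows shown there).
data_df = pd.DataFrame({
    "date": ["2016-01-15", "2016-01-16", "2016-01-17"],
    "value": [1555, 1678, 1789],
})

# Convert dates to integer ordinals so they can serve as a numeric feature.
# "date_ordinal" is a hypothetical column name chosen for this sketch.
data_df["date"] = pd.to_datetime(data_df["date"])
data_df["date_ordinal"] = data_df["date"].map(dt.datetime.toordinal)

X = data_df[["date_ordinal"]]
y = data_df["value"]

model = LinearRegression()
model.fit(X, y)

# A future date must be converted the same way before predicting.
future_date = pd.Timestamp("2016-01-20")  # hypothetical target date
X_future = pd.DataFrame({"date_ordinal": [future_date.toordinal()]})
predicted = model.predict(X_future)[0]
print(f"Predicted value on {future_date.date()}: {predicted:.1f}")
```

Because the feature is a day count, `model.coef_[0]` can be read as the fitted change in `value` per day. Keeping the original `date` column alongside the ordinal one also makes it easier to plot or inspect the results later.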