Sklearn LinearRegression to predict single value of time series


I have a time series DataFrame of market prices and would like to project the next value with linear regression, using only the Close price. I add a timedelta to the last timestamp to get the next x value for the predictor, but I get this error: TypeError: float() argument must be a string or a real number, not 'Timestamp'

import datetime
import pandas as pd
import numpy as np
import yfinance as yf
from sklearn.linear_model import LinearRegression

nq = yf.download("NQ=F", period="60d", interval="30m")
nq.index = pd.to_datetime(nq.index)

regr = LinearRegression()
X = np.array(nq.index).reshape(-1, 1)
y = np.array(nq['Close'])
regr.fit(X, y)

lasttime = nq.tail(1).index.item()
newtime = lasttime + datetime.timedelta(minutes=30) 

regr.predict([[newtime]])

Solution

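  • The model only accepts numeric features, so convert the timestamps to a numeric type, both when fitting and when predicting. For nanosecond-resolution timestamps, casting to int64 gives nanoseconds since the Unix epoch: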

    from datetime import timedelta
    import pandas as pd
    import numpy as np
    import yfinance as yf
    from sklearn.linear_model import LinearRegression
    
    nq = yf.download("NQ=F", period="60d", interval="30m")
    nq.index = pd.to_datetime(nq.index)
    
    regr = LinearRegression()
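    # Cast the datetime values to int64, then reshape to the 2-D
    # (n_samples, 1) array that scikit-learn expects
    X = np.array(nq.index).astype(np.int64)
    X = X.reshape(-1, 1)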
    y = np.array(nq['Close'])
    regr.fit(X, y)
    
    lasttime = nq.tail(1).index.item()
    newtime = lasttime + timedelta(minutes=30)
    
    # Convert the new Timestamp the same way as the training data
    newtime_int = newtime.to_datetime64().astype(np.int64)
    
    predicted_value = regr.predict([[newtime_int]])
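
The same idea can be tried without downloading anything. The sketch below is only an illustration under stated assumptions: the prices are synthetic (a hypothetical series that rises by exactly 2.0 per 30-minute bar), and `to_minutes` is a helper name made up for this example. Rather than raw int64 nanoseconds, it uses minutes elapsed since the first bar as the feature, which keeps the numbers small. Either encoding should work as long as the training data and the new point are converted the same way.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the downloaded prices: 48 bars, 30 minutes apart,
# rising by exactly 2.0 per bar (hypothetical data, not real quotes).
index = pd.date_range("2024-01-02 09:30", periods=48, freq="30min")
prices = pd.DataFrame({"Close": 100.0 + 2.0 * np.arange(48)}, index=index)


def to_minutes(timestamps, origin):
    """Hypothetical helper: minutes elapsed since `origin`, shape (n, 1)."""
    elapsed = pd.DatetimeIndex(timestamps) - origin
    minutes = elapsed / pd.Timedelta(minutes=1)
    return minutes.to_numpy(dtype=float).reshape(-1, 1)


origin = prices.index[0]
X = to_minutes(prices.index, origin)   # numeric feature, one column
y = prices["Close"].to_numpy()

regr = LinearRegression().fit(X, y)

# The next bar is one interval after the last observed timestamp.
next_time = prices.index[-1] + pd.Timedelta(minutes=30)
predicted_value = regr.predict(to_minutes([next_time], origin))[0]
print(next_time, predicted_value)
```

Because the synthetic series is perfectly linear, the prediction should fall on the same line as the training data. With real prices the fit is only a trend estimate.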