python numpy matplotlib regression

Extending regressions beyond data


I'm using Matplotlib and NumPy to plot linear regressions on time series plots in order to predict future trends.

Generating the regressions doesn't seem to be particularly difficult, but getting the regression line to extend past the last data point is proving challenging:

[Image: time series with linear regressions in iPython Notebook]

How can I extend the regressions?


Solution

  • When you evaluate your regression model, you're predicting the number of submissions for each input date. To predict over a wider range, you need to evaluate the model on a wider range of dates. I'd also use np.polyval instead of the list comprehension, because it's more compact:

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    # Generate data like the question
    # (on pandas >= 2.2, use freq="ME" for month-end dates)
    observed_dates = pd.date_range("jan 2004", "april 2013", freq="M")
    submissions = np.random.normal(5000, 100, len(observed_dates))
    submissions += np.arange(len(observed_dates)) * 10  # upward trend
    submissions[::12] += 800  # yearly spike
    
    # Plot the observed data
    plt.plot(observed_dates, submissions, marker="o")
    
    # Fit a model on the observed dates (asi8 gives int64 nanoseconds
    # since the epoch), then evaluate it on a longer date range
    predict_dates = pd.date_range("jan 2004", "jan 2020", freq="M")
    model = np.polyfit(observed_dates.asi8, submissions, 1)
    predicted = np.polyval(model, predict_dates.asi8)
    
    # Plot the model
    plt.plot(predict_dates, predicted, lw=3)
    plt.show()
    

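
A variation on the same idea, in case the huge nanosecond timestamps are awkward to work with: fit against days elapsed since the first observation, so the slope reads as "submissions per day". This is only a sketch under assumptions: it uses NumPy's own datetime64 arrays rather than pandas, the data is synthetic, and `days_since` is a hypothetical helper name, not part of any library.

```python
import numpy as np

# Synthetic monthly data (made up for illustration), Jan 2004 .. Apr 2013
observed_dates = np.arange("2004-01", "2013-05", dtype="datetime64[M]")
rng = np.random.default_rng(0)
submissions = (5000
               + 10 * np.arange(len(observed_dates))
               + rng.normal(0, 100, len(observed_dates)))

def days_since(dates, origin):
    """Hypothetical helper: float days elapsed between `origin` and each date."""
    delta = dates.astype("datetime64[D]") - origin.astype("datetime64[D]")
    return delta.astype(float)

origin = observed_dates[0]

# Fit a straight line on the observed range only
slope, intercept = np.polyfit(days_since(observed_dates, origin), submissions, 1)

# Evaluate the same line on a longer range, Jan 2004 .. Jan 2020
predict_dates = np.arange("2004-01", "2020-02", dtype="datetime64[M]")
predicted = np.polyval([slope, intercept], days_since(predict_dates, origin))
```

Matplotlib accepts datetime64 arrays on the x-axis, so `plt.plot(predict_dates, predicted, lw=3)` should draw the extended line over the observed data just as in the pandas version.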