I'm using Matplotlib and NumPy to plot linear regressions on time series plots in order to predict future trends.
Generating the regressions doesn't seem to be particularly difficult, but getting the regression line to extend past the last data point is proving challenging.
How can I extend the regressions?
When you evaluate your regression model, you're predicting the number of submissions for each input date. To predict over a wider range, you need to evaluate the model on a wider range of dates. I'd also use np.polyval instead of the list comprehension, simply because it's more compact:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Generate data like the question
# (pandas 2.2 and later prefer freq="ME" over "M" for month-end dates)
observed_dates = pd.date_range("jan 2004", "april 2013", freq="M")
submissions = np.random.normal(5000, 100, len(observed_dates))
submissions += np.arange(len(observed_dates)) * 10
submissions[::12] += 800
# Plot the observed data
plt.plot(observed_dates, submissions, marker="o")
# Fit a model on the observed dates, then predict over a longer date range
# (.asi8 gives the dates as int64 nanoseconds since the epoch)
predict_dates = pd.date_range("jan 2004", "jan 2020", freq="M")
model = np.polyfit(observed_dates.asi8, submissions, 1)
predicted = np.polyval(model, predict_dates.asi8)
# Plot the model over the extended range
plt.plot(predict_dates, predicted, lw=3)
plt.show()
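To see that np.polyval does the same job as the list comprehension, here is a minimal NumPy-only sketch. The data are made up, and the choice to measure x in whole months since the first date (rather than in nanoseconds) is my own assumption for illustration, not something from the question:

```python
import numpy as np

# Hypothetical monthly dates: the full prediction range, of which only
# the first part has been observed.
predict_dates = np.arange("2004-01", "2020-01", dtype="datetime64[M]")
n_observed = 111  # assumed number of observed months
observed_dates = predict_dates[:n_observed]

# Use "months since the first date" as the numeric x-axis.
x_predict = (predict_dates - predict_dates[0]).astype("int64")
x_observed = x_predict[:n_observed]

# Made-up submissions with a linear trend plus noise.
rng = np.random.default_rng(0)
submissions = 5000 + 10 * x_observed + rng.normal(0, 100, n_observed)

# Fit on the observed range only.
slope, intercept = np.polyfit(x_observed, submissions, 1)

# Evaluate on the extended range, two equivalent ways.
by_hand = np.array([slope * x + intercept for x in x_predict])
by_polyval = np.polyval([slope, intercept], x_predict)

print(np.allclose(by_hand, by_polyval))
print(len(observed_dates), "observed months,", len(predict_dates), "predicted months")
```

The regression line extends as far as the x values you pass in, so plotting predict_dates against by_polyval draws it past the last observed point.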