I'm experimenting with adding an extra regressor in fbprophet, and it doesn't seem to improve accuracy. Of course that isn't guaranteed in general, but I've boiled it down to a synthetic case where it looks like it should help, and it still doesn't, so I wonder if I'm doing something wrong.
This is the input data:
ds,y
2011-01-01,8
2011-02-01,10
2011-03-01,10
2011-04-01,10
2011-05-01,9
2011-06-01,8
2011-07-01,6
2011-08-01,7
2011-09-01,9
2011-10-01,9
2011-11-01,10
2011-12-01,10
2012-01-01,20
2012-02-01,20
2012-03-01,20
2012-04-01,20
2012-05-01,20
2012-06-01,20
2012-07-01,20
2012-08-01,20
2012-09-01,20
2012-10-01,20
2012-11-01,20
2012-12-01,20
The idea is that the first year is somewhat noisy and no real guide to the second year, in which all values settle at 20.
And this is my code:
import fbprophet  # Prophet 1.0 and later installs and imports as "prophet"
import pandas as pd
import sklearn.metrics
# Baseline: train on 2011, forecast 2012.
# ds is read as text; ISO-format dates compare correctly as strings.
plain = pd.read_csv("data.csv")
plain_train = plain[plain.ds < "2012-01-01"]
plain_test = plain[plain.ds >= "2012-01-01"]
plain_m = fbprophet.Prophet()
plain_m.fit(plain_train)
plain_forecast = plain_m.predict(plain_test)
# Same split, plus an extra regressor column that is 20.0 in all 24 rows.
augmented = pd.read_csv("data.csv")
augmented["extra"] = [20.0] * 24
augmented_train = augmented[augmented.ds < "2012-01-01"]
augmented_test = augmented[augmented.ds >= "2012-01-01"]
augmented_m = fbprophet.Prophet()
augmented_m.add_regressor("extra")  # regressors must be added before fit()
augmented_m.fit(augmented_train)
augmented_forecast = augmented_m.predict(augmented_test)  # test frame must also contain "extra"
# Compare the mean absolute error of both forecasts on 2012.
print("plain_forecast")
print(
    sklearn.metrics.mean_absolute_error(
        plain_test["y"].values, plain_forecast["yhat"].values
    )
)
print("augmented_forecast")
print(
    sklearn.metrics.mean_absolute_error(
        augmented_test["y"].values, augmented_forecast["yhat"].values
    )
)
This runs a forecast first on the plain data, then on the same data augmented with a second column that is always 20, and therefore a perfect guide to the level the series settles at, so presumably very helpful.
And the output:
plain_forecast
11.072845254369435
augmented_forecast
11.072872701031054
The numbers are not identical, so the extra column is not being completely ignored, but accuracy has not improved; the difference is down at noise level, and if anything it's very slightly worse.
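To put "down at noise level" into numbers, here is a quick check on the two MAE values printed above (the values are copied from the output; nothing is re-run). The absolute change comes out at about 2.7e-5, which is a few parts per million of the MAE itself.

```python
# MAE values copied from the output above.
plain_mae = 11.072845254369435
augmented_mae = 11.072872701031054

diff = augmented_mae - plain_mae  # absolute change in MAE
rel = diff / plain_mae            # change relative to the plain model's MAE

print(f"absolute difference: {diff:.2e}")
print(f"relative difference: {rel:.2e}")
```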
What am I missing?
There is nothing wrong with the code, but there are a couple of things to keep in mind; once you address them you should see a difference with fbprophet.
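The data is the problem. In the training period the extra column is 20 in every row, so it has no variance: it cannot explain any of the ups and downs in 2011, and the model cannot tell its effect apart from the constant baseline.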
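To see why a column that is 20 in every training row cannot help, here is a minimal sketch that uses ordinary least squares as a stand-in for Prophet's additive model (an assumption: Prophet fits more terms than this, but the collinearity argument is the same). The column has zero standard deviation and zero covariance with y, and with an intercept column of ones next to it, the normal matrix X^T X is singular, so the data cannot separate the regressor's coefficient from the intercept.

```python
from statistics import pstdev

# Training rows from the question (2011), with the constant "extra" column.
train_y = [8, 10, 10, 10, 9, 8, 6, 7, 9, 9, 10, 10]
train_extra = [20.0] * len(train_y)
n = len(train_extra)

# A constant column has zero variance ...
extra_std = pstdev(train_extra)

# ... and therefore zero covariance with the target.
mean_e = sum(train_extra) / n
mean_y = sum(train_y) / n
cov = sum((e - mean_e) * (y - mean_y) for e, y in zip(train_extra, train_y)) / n

# Normal matrix X^T X for the least-squares design [intercept, extra]:
# [[n, sum(extra)], [sum(extra), sum(extra**2)]]; a zero determinant means singular.
s1 = sum(train_extra)
s2 = sum(v * v for v in train_extra)
det = n * s2 - s1 * s1

print("std of extra:", extra_std)
print("cov(extra, y):", cov)
print("det(X^T X):", det)
```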
In your model, if you want to capture seasonality plus growth, you should split year and month into two separate variables. That makes it easier for any algorithm to learn that there is growth from one year to the next and a seasonal curve across the months.
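A minimal pandas sketch of that split, for a model that takes plain feature columns. The column names `year` and `month` and the four sample rows are illustrative choices, not part of the question's data file.

```python
import pandas as pd

# A few rows in the same shape as data.csv (ds, y).
df = pd.DataFrame({
    "ds": ["2011-01-01", "2011-02-01", "2012-01-01", "2012-02-01"],
    "y": [8, 10, 20, 20],
})

# Parse the date strings, then derive separate year and month features.
df["ds"] = pd.to_datetime(df["ds"])
df["year"] = df["ds"].dt.year    # captures growth from one year to the next
df["month"] = df["ds"].dt.month  # captures the position in the seasonal cycle

print(df)
```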
You train on a period where the target is volatile and test on a period where it is flat.
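A quick way to see the mismatch, using only the values listed in the question: the training year averages about 8.83 with visible spread, while the test year is a flat 20. The gap between the two levels is about 11.17, which is exactly the MAE of a forecast stuck at the training mean and in the same range as the roughly 11.07 both Prophet runs reported.

```python
from statistics import mean, pstdev

# Target values from the question: 2011 is the training period, 2012 the test period.
train_y = [8, 10, 10, 10, 9, 8, 6, 7, 9, 9, 10, 10]
test_y = [20] * 12

print("train: mean %.2f, std %.2f" % (mean(train_y), pstdev(train_y)))
print("test:  mean %.2f, std %.2f" % (mean(test_y), pstdev(test_y)))

# Gap between the level the model sees and the level it is scored on.
# A forecast fixed at the training mean would have exactly this MAE.
level_gap = mean(test_y) - mean(train_y)
print("level gap: %.2f" % level_gap)
```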
Please give it a try with another data set; almost any will do.
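One way to build such a test set, sketched under assumptions: the file name `data_varying.csv`, the step levels 10/15/20, the four-year span and the noise level are all arbitrary illustrative choices. Unlike the question's data, `extra` varies within the training period and `y` follows it, so there is actually something for `add_regressor` to learn.

```python
import csv
import random

random.seed(0)  # reproducible noise

rows = []
for i in range(48):  # four years of monthly data, 2011-01 to 2014-12
    year, month = 2011 + i // 12, i % 12 + 1
    # Hypothetical driver: a level that jumps between a few values over time.
    extra = random.choice([10.0, 15.0, 20.0])
    # The target tracks the driver plus a little noise, so extra is informative.
    y = extra + random.gauss(0, 1)
    rows.append({"ds": f"{year}-{month:02d}-01", "y": round(y, 2), "extra": extra})

with open("data_varying.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["ds", "y", "extra"])
    writer.writeheader()
    writer.writerows(rows)
```

To use it with the question's script, read this file instead of `data.csv`, drop the line that overwrites `extra` with a constant, and split train and test by date (for example at `"2014-01-01"`).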
24 rows is an extremely small sample, and Prophet is designed for much larger data sets, possibly with more variables.
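One concrete consequence of the small sample: if I read Prophet's defaults correctly, `yearly_seasonality="auto"` only turns yearly seasonality on when the history spans at least 730 days, and it logs a message when it disables it. With monthly data the weekly and daily components are also switched off by default, so a fit on 2011 alone is essentially trend (plus any regressor). The sketch below just applies that assumed 730-day rule to the question's training window; it does not call Prophet.

```python
from datetime import date

# Training window from the question: monthly points for 2011 only.
first, last = date(2011, 1, 1), date(2011, 12, 1)
span_days = (last - first).days

# Assumption: mirrors Prophet's default yearly_seasonality="auto" rule,
# which enables yearly seasonality only for histories of at least 730 days.
yearly_enabled = span_days >= 730

print("training span (days):", span_days)
print("yearly seasonality enabled by default:", yearly_enabled)
```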
Despite the name, Prophet is not magic. It fits an additive regression model (trend, seasonality and any extra regressors), and on a small set of records it will behave no differently from a simple regression.
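A rough check of that claim on this data, taking "simple regression" to mean a straight line fitted by ordinary least squares on the month index (my choice of baseline, not what Prophet does internally). Extrapolating 2011's nearly flat line into 2012 gives an MAE of about 11.08, within a hair of the 11.07 Prophet reported.

```python
from statistics import mean

# 2011 targets (training) and 2012 targets (test) from the question.
train_y = [8, 10, 10, 10, 9, 8, 6, 7, 9, 9, 10, 10]
test_y = [20] * 12

# Time index: months 0..11 for training, 12..23 for the test year.
train_x = list(range(12))
test_x = list(range(12, 24))

# Ordinary least squares for y = a + b * x, written out by hand.
mx, my = mean(train_x), mean(train_y)
b = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y)) / sum(
    (x - mx) ** 2 for x in train_x
)
a = my - b * mx

# Extrapolate the straight line into 2012 and score it with MAE, as in the question.
pred = [a + b * x for x in test_x]
mae = mean(abs(t - p) for t, p in zip(test_y, pred))
print("linear-trend MAE: %.4f" % mae)
```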
Let us know how it went 😊