I have the following code that creates a time series forecast for 3 products (A,B and C). It saves the forecasts for all the products into a data frame, forecast_df. However, I can't figure out how to get it to put the product name in the rows with the product's forecast. It creates a column for the product name, but then it puts C for all the products except C, which it calls NaN. How can I get it to put the product name correctly?
import pandas as pd
import numpy as np
from datetime import date, timedelta, datetime
from fbprophet import Prophet
from pmdarima.model_selection import train_test_split
class color:
BOLD = '\033[1m'
END = '\033[0m'
df = pd.DataFrame({"ds": ['2020-4-26','2020-5-3','2020-5-10','2020-5-17','2020-5-24','2020-5-
31','2020-6-7','2020-6-14','2020-6-21','2020-6-28'],
"A": [164,127,157,127,170,322,133,176,233,257], "B":
[306,405,267,265,306,265,325,297,310,271],
"C": [23,41,75,24,48,31,51,26,41,43]})
df['ds'] = pd.to_datetime(df['ds'])
start_date = min(df['ds'])
end_date = max(df['ds'])
print(start_date, end_date)
train_len = int(week * 0.99)
print(train_len)
forecast_df = pd.DataFrame()
for col in df.columns[1:]:
print('\n', color.BOLD + 'ITEM #', col + color.END)
dfx = df[['ds', col]]
dfx = dfx.rename({col: 'y'}, axis=1)
#Train-Test-Split
y_train, y_test = train_test_split(dfx, train_size=train_len)
#Fit Model
m = Prophet()
m.fit(y_train)
#Calculate forecast
print('\n', color.BOLD + 'CALCULATE FORECAST' + color.END)
future = m.make_future_dataframe(periods=5, freq='W')
forecast = m.predict(future)
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail())
#save forecast to dataframe
for col in df.columns[1:]:
forecast_df['Product'] = col
#forecast_df = pd.concat((forecast_df, forecast[['ds', 'yhat','yhat_upper','yhat_lower']]))
forecast_df = forecast_df.append(forecast[['ds', 'yhat','yhat_upper','yhat_lower']])
print('\n', color.BOLD + 'FORECAST DATAFRAME' + color.END)
print(forecast_df)
Here's the output:
ITEM # A
CALCULATE FORECAST
ds yhat yhat_lower yhat_upper
8 2020-06-21 207.886980 137.260222 277.544817
9 2020-06-28 215.891658 144.606357 287.614507
10 2020-07-05 223.896335 152.586583 293.597136
11 2020-07-12 231.901013 154.032672 304.878146
12 2020-07-19 239.905690 168.971432 311.714361
ITEM # B
CALCULATE FORECAST
ds yhat yhat_lower yhat_upper
8 2020-06-21 281.352232 228.094939 330.983604
9 2020-06-28 276.200199 221.798843 329.703188
10 2020-07-05 271.048166 216.131704 322.966893
11 2020-07-12 265.896133 214.037533 317.526028
12 2020-07-19 260.744100 209.923955 312.366033
ITEM # C
CALCULATE FORECAST
ds yhat yhat_lower yhat_upper
8 2020-06-21 38.015576 16.388462 58.753207
9 2020-06-28 37.578068 16.483414 60.281396
10 2020-07-05 37.140560 16.269026 58.592980
11 2020-07-12 36.703052 16.351184 58.890988
12 2020-07-19 36.265544 14.038544 56.053327
FORECAST DATAFRAME
Product ds yhat yhat_upper yhat_lower Product
0 C 2020-04-26 143.849560 215.773593 71.588250
1 C 2020-05-03 151.854238 223.179640 77.229544
2 C 2020-05-10 159.858915 231.388203 87.914720
3 C 2020-05-17 167.863593 243.705433 97.468648
4 C 2020-05-24 175.868270 247.448227 103.620476
.. ... ... ... ... ...
8 NaN 2020-06-21 38.015576 57.919222 17.838312
9 NaN 2020-06-28 37.578068 57.668158 15.740910
10 NaN 2020-07-05 37.140560 57.990138 16.508734
11 NaN 2020-07-12 36.703052 59.519944 17.110290
12 NaN 2020-07-19 36.265544 56.410552 15.599971
[117 rows x 5 columns]
Thanks, Catalina!
I had to modify it a little, but here's the answer that worked for me:
#save forecast to dataframe
forecast['Item'] = col
#concat all results in one dataframe
forecast_df = pd.concat([forecast_df, forecast[['Item','ds', 'Quantity_Ordered', 'yhat', 'yhat_lower', 'yhat_upper']]], ignore_index=True)