Search code examples
pythonfor-looptime-seriesfacebook-prophet

Python Prophet Demand Forecasting for multiple products: saving all forecasts into single data frame


I have the following code that creates a time series forecast for 3 products (A,B and C). It saves the forecasts for all the products into a data frame, forecast_df. However, I can't figure out how to get it to put the product name in the rows with the product's forecast. It creates a column for the product name, but then it puts C for all the products except C, which it calls NaN. How can I get it to put the product name correctly?

import pandas as pd
import numpy as np
from datetime import date, timedelta, datetime
from fbprophet import Prophet
from pmdarima.model_selection import train_test_split

class color:
    BOLD = '\033[1m'
    END = '\033[0m'

df = pd.DataFrame({"ds": ['2020-4-26','2020-5-3','2020-5-10','2020-5-17','2020-5-24','2020-5- 
                  31','2020-6-7','2020-6-14','2020-6-21','2020-6-28'], 
               "A": [164,127,157,127,170,322,133,176,233,257], "B": 
                    [306,405,267,265,306,265,325,297,310,271], 
               "C": [23,41,75,24,48,31,51,26,41,43]})

df['ds'] = pd.to_datetime(df['ds'])

start_date = min(df['ds'])
end_date = max(df['ds'])
print(start_date, end_date)

train_len = int(week * 0.99)
print(train_len)

forecast_df = pd.DataFrame()

for col in df.columns[1:]:
    print('\n', color.BOLD + 'ITEM #', col + color.END)
    dfx = df[['ds', col]]
    dfx = dfx.rename({col: 'y'}, axis=1)

    #Train-Test-Split
    y_train, y_test = train_test_split(dfx, train_size=train_len)

    #Fit Model
    m = Prophet()
    m.fit(y_train)

    #Calculate forecast
    print('\n', color.BOLD + 'CALCULATE FORECAST' + color.END)
    future = m.make_future_dataframe(periods=5, freq='W')
    forecast = m.predict(future)
    print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail())

    #save forecast to dataframe
    for col in df.columns[1:]:
        forecast_df['Product'] = col
        #forecast_df = pd.concat((forecast_df, forecast[['ds', 'yhat','yhat_upper','yhat_lower']]))
        forecast_df = forecast_df.append(forecast[['ds', 'yhat','yhat_upper','yhat_lower']])

print('\n', color.BOLD + 'FORECAST DATAFRAME' + color.END)
print(forecast_df)

Here's the output:

 ITEM # A

 CALCULATE FORECAST
 
           ds        yhat  yhat_lower  yhat_upper

8  2020-06-21  207.886980  137.260222  277.544817

9  2020-06-28  215.891658  144.606357  287.614507

10 2020-07-05  223.896335  152.586583  293.597136

11 2020-07-12  231.901013  154.032672  304.878146

12 2020-07-19  239.905690  168.971432  311.714361




 ITEM # B

 CALCULATE FORECAST

           ds        yhat  yhat_lower  yhat_upper

8  2020-06-21  281.352232  228.094939  330.983604

9  2020-06-28  276.200199  221.798843  329.703188

10 2020-07-05  271.048166  216.131704  322.966893

11 2020-07-12  265.896133  214.037533  317.526028

12 2020-07-19  260.744100  209.923955  312.366033



 ITEM # C

 CALCULATE FORECAST

           ds       yhat  yhat_lower  yhat_upper

8  2020-06-21  38.015576   16.388462   58.753207

9  2020-06-28  37.578068   16.483414   60.281396

10 2020-07-05  37.140560   16.269026   58.592980

11 2020-07-12  36.703052   16.351184   58.890988

12 2020-07-19  36.265544   14.038544   56.053327






 FORECAST DATAFRAME

   Product         ds        yhat  yhat_upper  yhat_lower Product

0        C 2020-04-26  143.849560  215.773593   71.588250

1        C 2020-05-03  151.854238  223.179640   77.229544

2        C 2020-05-10  159.858915  231.388203   87.914720

3        C 2020-05-17  167.863593  243.705433   97.468648

4        C 2020-05-24  175.868270  247.448227  103.620476

..     ...        ...         ...         ...         ...

8      NaN 2020-06-21   38.015576   57.919222   17.838312

9      NaN 2020-06-28   37.578068   57.668158   15.740910

10     NaN 2020-07-05   37.140560   57.990138   16.508734

11     NaN 2020-07-12   36.703052   59.519944   17.110290

12     NaN 2020-07-19   36.265544   56.410552   15.599971


[117 rows x 5 columns]

Solution

  • Thanks, Catalina!

    I had to modify it a little, but here's the answer that worked for me:

    #save forecast to dataframe

    forecast['Item'] = col

    #concat all results in one dataframe

    forecast_df = pd.concat([forecast_df, forecast[['Item','ds', 'Quantity_Ordered', 'yhat', 'yhat_lower', 'yhat_upper']]], ignore_index=True)