Tags: python, time-series, facebook-prophet, overfitting-underfitting

Python - Facebook Prophet - Model Underfitting


I am running a Prophet model to predict inbound call volumes. I've spent a lot of time cleaning the data, applying log scaling, and tuning hyperparameters, which yielded an "okay" MAPE (Mean Absolute Percentage Error).

My problem at this point is that the model is consistently underfitting, especially in the first 12 days of the month, and even more so in the first 6 days. Call volumes are always substantially higher on these days for operational reasons. They also start to build near the end of the month as volume ramps into the start of the following month.
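
One way to put numbers on this is to score the forecast separately for each part of the month. The sketch below is a minimal, standard-library-only illustration: the window boundaries (days 1-6, 7-12, 25 onward) follow the description in this post, while the function names and the sample numbers are made up. It reports MAPE together with the mean signed percentage error, since a consistently negative signed error is what under-forecasting looks like.

```python
from collections import defaultdict
from datetime import date


def month_window(d):
    """Label a date by the part of the month it falls in."""
    if d.day <= 6:
        return "days 1-6"
    if d.day <= 12:
        return "days 7-12"
    if d.day >= 25:
        return "day 25 to month end"
    return "mid-month"


def errors_by_window(rows):
    """rows: iterable of (date, actual, forecast) with actual != 0.

    Returns {window: (mape_pct, mean_signed_pct_error)}.
    """
    signed = defaultdict(list)
    for d, actual, forecast in rows:
        signed[month_window(d)].append((forecast - actual) / actual * 100.0)
    return {
        w: (sum(abs(e) for e in errs) / len(errs), sum(errs) / len(errs))
        for w, errs in signed.items()
    }


# Made-up numbers purely for illustration.
sample = [
    (date(2021, 3, 2), 200.0, 150.0),   # forecast 25% low
    (date(2021, 3, 4), 100.0, 85.0),    # forecast 15% low
    (date(2021, 3, 15), 100.0, 110.0),  # forecast 10% high
    (date(2021, 3, 16), 100.0, 90.0),   # forecast 10% low
]
for window, (mape, bias) in errors_by_window(sample).items():
    print(f"{window}: MAPE={mape:.1f}%  bias={bias:+.1f}%")
```

A window whose bias is close to its MAPE in magnitude and negative in sign is being missed in one direction only, which is the pattern described above.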

Actuals are the blue dots, forecast is the grey line. This is just one month, but it's representative of the monthly seasonality in all other months:

[Screenshot of actuals vs. forecast for one month; image not available]

For the sake of simplicity, I'm only going to include the model details and leave the data cleansing steps out. I can provide more information if it would help, but the feedback I've gotten so far is that the additional detail just muddies the waters. Really, the only thing that matters is the model below, which was fit on Box-Cox-transformed data; its output was then passed through an inverse Box-Cox transformation:

# Create Model
M = Prophet(
    changepoint_prior_scale = 15,
    changepoint_range = .8,
    growth='linear',
    seasonality_mode= 'multiplicative',
    daily_seasonality=False,
    weekly_seasonality=False,
    yearly_seasonality=False,
    holidays=Holidays
    ).add_seasonality(
        name='monthly',
        period=30.5,
        fourier_order = 20,
        prior_scale = 45
    ).add_seasonality(
        name='daily',
        period=1,
        fourier_order=75,
        prior_scale=20
    ).add_seasonality(
        name='weekly',
        period=7,
        fourier_order=75,
        prior_scale=30
    ).add_seasonality(
        name='yearly',
        period = 365.25,
        fourier_order = 30, 
        prior_scale = 15)
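
For reference, the Box-Cox round trip mentioned above can be sketched in a few lines. This is an illustration under assumptions, not the code used for this post: the lambda value is arbitrary (in practice it would be estimated from the training data, for example by `scipy.stats.boxcox`, with `scipy.special.inv_boxcox` for the inverse), and the function names are hypothetical. The key point is that the same lambda must be reused when converting the forecast back to call volumes.

```python
import math


def boxcox(y, lam):
    """Box-Cox transform of a strictly positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def inv_boxcox(z, lam):
    """Inverse Box-Cox: map a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)


LAM = 0.3  # assumed value, for illustration only

volumes = [120.0, 95.0, 310.0, 42.0]             # made-up hourly call counts
transformed = [boxcox(v, LAM) for v in volumes]  # the scale the model is fit on
restored = [inv_boxcox(z, LAM) for z in transformed]

for v, r in zip(volumes, restored):
    print(f"{v:8.2f} -> {r:8.2f}")
```

Two caveats apply to this sketch: the transform is undefined for zero or negative counts, and for a nonzero lambda a forecast value below `-1 / lam` has no real-valued inverse.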

In general, I would like to improve the underfitting situation across the board, but especially at the beginning and end of the month. I've tried increasing the changepoint_range to loosen the model up, but it made no noticeable difference. I've also tried increasing the prior_scale of the "monthly" seasonality, but nothing yielded results that were better than the screenshot above.

I'm at a little bit of a loss. Is there a modeling technique that I could use with the Facebook Prophet model to address this? Is there a way to add a regressor that assigns specific seasonality to the first 12 days and the last 7? I did some research, but I'm not sure whether that's possible or how it would work.

Any help would be hugely appreciated.

Just as an update, I've tried jacking up changepoint_range and changepoint_prior_scale, and it had no impact. I'm going to try reducing the amount of training data (currently using 4 years).


Solution

  • I think I found a workable solution. I'm documenting it as an answer in case anyone else has a similar problem down the road.

    Since I knew this behavior was cyclical and I knew why it exists (two different monthly billing cycles at the beginning of the month, plus a recurring increase in volume at the end of the month that was being underfit), I used the Prophet documentation to create additional conditional seasonalities for those specific periods.

    I started by defining the functions for the seasons (following the Prophet documentation, whose example uses the NFL on-season and off-season):

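    import pandas as pd


    def is_1st_billing_season(ds):
        """True on days 1-6 of the month (first billing cycle)."""
        date = pd.to_datetime(ds)
        return 1 <= date.day <= 6


    def is_2nd_billing_season(ds):
        """True on days 7-12 of the month (second billing cycle)."""
        date = pd.to_datetime(ds)
        return 7 <= date.day <= 12


    def EOM(ds):
        """True from day 25 through the end of the month (end-of-month ramp)."""
        date = pd.to_datetime(ds)
        return 25 <= date.day <= 31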
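
Note that `day >= 25` covers seven days in a 31-day month but only four in a non-leap February. If a fixed-length window is preferred (the question mentions the last 7 days), one possible variant counts back from the end of the month instead. `is_last_n_days` is a hypothetical helper, not part of Prophet, and it uses only the standard library:

```python
import calendar
from datetime import date, datetime


def is_last_n_days(ds, n=7):
    """True when ds falls within the final n calendar days of its month."""
    # Accept date/datetime objects directly; parse anything else as ISO text.
    d = ds if isinstance(ds, date) else datetime.fromisoformat(str(ds))
    days_in_month = calendar.monthrange(d.year, d.month)[1]
    return d.day > days_in_month - n


# Compare with the fixed `day >= 25` rule for a few dates.
for d in (date(2021, 2, 22), date(2021, 2, 24), date(2021, 3, 24), date(2021, 3, 25)):
    print(d, "last 7 days:", is_last_n_days(d), "| day >= 25:", d.day >= 25)
```

Because `pandas.Timestamp` is a subclass of `datetime`, the helper can be passed to `.apply` on a `ds` column in the same way as the functions above.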
    

    Then I applied the functions to my dataframe:

    #Create Additional Seasonal Categories
    Box_Cox_Data['1st_season'] = Call_Data['ds'].apply(is_1st_billing_season)
    Box_Cox_Data['2nd_season'] = Call_Data['ds'].apply(is_2nd_billing_season)
    Box_Cox_Data['EOM'] = Call_Data['ds'].apply(EOM)
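
The same flags can be built without a Python-level function call per row by using the `.dt.day` accessor. The sketch below uses a small stand-in frame (the name `df` and its contents are made up for illustration). It also computes the flags from the frame's own `ds` column: assigning a Series taken from a different DataFrame, as in the snippet above where the flags come from `Call_Data` and are stored in `Box_Cox_Data`, aligns on index labels, so that approach relies on both frames sharing the same index.

```python
import pandas as pd

# Stand-in for the training frame: one row per day over two months.
df = pd.DataFrame({"ds": pd.date_range("2021-01-01", "2021-02-28", freq="D")})

day = df["ds"].dt.day
df["1st_season"] = day.between(1, 6)    # days 1-6, inclusive on both ends
df["2nd_season"] = day.between(7, 12)   # days 7-12
df["EOM"] = day >= 25                   # day 25 through month end

# The three windows do not overlap, so each row carries at most one flag.
flags = df[["1st_season", "2nd_season", "EOM"]]
print(flags.sum(axis=1).max())
print(flags.groupby(df["ds"].dt.month).sum())
```

All three columns come out as booleans, which is the form Prophet expects for a `condition_name` column.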
    

    Then I updated my model to include the additional conditional seasonalities (set through condition_name):

    # Create Model
    from prophet import Prophet  # the package was published as `fbprophet` before version 1.0

    M = Prophet(
        changepoint_prior_scale = 15,
        changepoint_range = .8,
        growth='linear',
        seasonality_mode= 'multiplicative',
        daily_seasonality=False,
        weekly_seasonality=False,
        yearly_seasonality=False,
        holidays=Holidays
        ).add_seasonality(
            name='monthly',
            period=30.5,
            fourier_order = 20,
            prior_scale = 45
        ).add_seasonality(
            name='daily_1st_season',
            period=1,
            fourier_order=75,
            prior_scale=20,
            condition_name='1st_season'
        ).add_seasonality(
            name='daily_2nd_season',
            period=1,
            fourier_order=75,
            prior_scale=20,
            condition_name='2nd_season'
        ).add_seasonality(
            name='daily_EOM_season',
            period=1,
            fourier_order=75,
            prior_scale=20,
            condition_name='EOM'
        ).add_seasonality(
            name='weekly',
            period=7,
            fourier_order=75,
            prior_scale=30
        ).add_seasonality(
            name='yearly',
            period = 365.25,
            fourier_order = 30, #CHECK THIS
            prior_scale = 15)
            
    #Fit Model
    M.fit(Box_Cox_Data)
    
    # Create Future Dataframe (in Hours); pandas 2.2+ deprecates the 'H' alias in favor of 'h'
    future = M.make_future_dataframe(freq='H', periods = Hours_Needed)
    future['1st_season'] = future['ds'].apply(is_1st_billing_season)
    future['2nd_season'] = future['ds'].apply(is_2nd_billing_season)
    future['EOM'] = future['ds'].apply(EOM)
    
    # Predict Future Values
    forecast = M.predict(future)
    

    The end result looks much better:

    [Screenshot of the improved forecast; image not available]

    For the sake of full transparency, this screenshot is for a slightly different period than the original screenshot. For this project, my starting point isn't ultra important (predictions for future periods are the primary focus) and I accidentally ran the cross-validation for a different timeframe, but the end result is a better-fitting seasonal forecast across all time frames I have seen so far.