
Facebook NeuralProphet - adding holidays


I have one common data set for my prediction that includes data across the globe.

    ds                    y     country_id
    01/01/2021 09:00:00   5.0   1
    01/01/2021 09:10:00   5.2   1
    01/01/2021 09:20:00   5.4   1
    01/01/2021 09:30:00   6.1   1
    01/01/2021 09:00:00   2.0   2
    01/01/2021 09:10:00   2.2   2
    01/01/2021 09:20:00   2.4   2
    01/01/2021 09:30:00   3.1   2



    playoffs = pd.DataFrame({
      'holiday': 'playoff',
      'ds': pd.to_datetime(['2008-01-13', '2009-01-03', '2010-01-16',
                            '2010-01-24', '2010-02-07', '2011-01-08',
                            '2013-01-12', '2014-01-12', '2014-01-19',
                            '2014-02-02', '2015-01-11', '2016-01-17',
                            '2016-01-24', '2016-02-07']),
      'lower_window': 0,
      'upper_window': 1,
    })
    superbowls = pd.DataFrame({
      'holiday': 'superbowl',
      'ds': pd.to_datetime(['2010-02-07', '2014-02-02', '2016-02-07']),
      'lower_window': 0,
      'upper_window': 1,
    })
    holidays = pd.concat((playoffs, superbowls))

Now, I would like to add holidays to the model.

    m = NeuralProphet(holidays=holidays)
    m.add_country_holidays(country_name='US')
    m.fit(df)

  1. How can I add holidays for multiple countries with add_country_holidays (m.add_country_holidays)?
  2. How can I add country-specific holidays to the holidays data?
  3. Do I need a separate model for each country? Or is one model for the entire dataset fine, with the holidays added as regressors? What is the recommendation?

Solution

  • Here is a possible solution:

    The program:

    # NOTE 1: tested on google colab
    
    # Un-comment the following (!pip) line if you need to install the libraries 
    # on google colab notebook:
    
    #!pip install neuralprophet pandas numpy holidays
    
    import pandas as pd
    import numpy as np
    import holidays
    from neuralprophet import NeuralProphet
    import datetime
    
    
    # NOTE 2: Most of the code comes from:
    # https://neuralprophet.com/html/events_holidays_peyton_manning.html
    
    # Context:
    # We will use the time series of the log daily page views for the Wikipedia
    # page of Peyton Manning (former American football quarterback) as an example.
    # During playoffs and Super Bowls, Peyton Manning's wiki page is viewed more
    # frequently. We would like to see if country-specific holidays also have an
    # influence.
    
    # First, we load the data:
    
    data_location = "https://raw.githubusercontent.com/ourownstory/neuralprophet-data/main/datasets/"
    df = pd.read_csv(data_location + "wp_log_peyton_manning.csv")
    
    # To simulate your case, we add a country_id column filled with random values {1,2}
    # Let's assume US=1 and Canada=2
    
    np.random.seed(0)
    df['country_id'] = np.random.randint(1, 2 + 1, df['ds'].count())
    
    print("The dataframe we are working on:")
    print(df.head())
    
    
    # We would like to add holidays for the US and Canada to see if holidays have an
    # influence on the number of daily views of Manning's wiki page.
    
    # The data in df starts in 2007 and ends in 2016:
    StartingYear=2007
    LastYear=2016
    #  Holidays for US and Canada:
    US_holidays = holidays.US(years=[year for year in range(StartingYear, LastYear+1)])
    CA_holidays = holidays.CA(years=[year for year in range(StartingYear, LastYear+1)])
    
    # Collect the days in df that are holidays in the row's country.
    # Rows are gathered in lists because DataFrame.append was removed in pandas 2.0.
    us_rows = []
    ca_rows = []
    for ds, country_id in zip(df['ds'], df['country_id']):
        # Convert the 'YYYY-MM-DD' date string to a date object:
        day = datetime.date.fromisoformat(ds)
        # Check if the corresponding day is a holiday in the US:
        if country_id == 1 and day in US_holidays:
            us_rows.append({'ds': ds, 'event': 'holiday_US'})
        # Check if the corresponding day is a holiday in Canada:
        if country_id == 2 and day in CA_holidays:
            ca_rows.append({'ds': ds, 'event': 'holiday_CA'})

    holidays_US = pd.DataFrame(us_rows, columns=['ds', 'event'])
    holidays_CA = pd.DataFrame(ca_rows, columns=['ds', 'event'])
    
    # Now we can drop the country_id in df:
    df.drop('country_id', axis=1, inplace=True)
    
    
    print("Days in df that are holidays in the US:")
    print(holidays_US.head())
    print()
    print("Days in df that are holidays in Canada:")
    print(holidays_CA.head())
    
    
    # user specified events
    # history events
    playoffs = pd.DataFrame({
        'event': 'playoff',
        'ds': pd.to_datetime([
            '2008-01-13', '2009-01-03', '2010-01-16',
            '2010-01-24', '2010-02-07', '2011-01-08',
            '2013-01-12', '2014-01-12', '2014-01-19',
            '2014-02-02', '2015-01-11', '2016-01-17',
            '2016-01-24', '2016-02-07',
        ]),
    })
    
    superbowls = pd.DataFrame({
        'event': 'superbowl',
        'ds': pd.to_datetime([
            '2010-02-07', '2012-02-05', '2014-02-02', 
            '2016-02-07',
        ]),
    })
    
    
    # Create the events_df:
    events_df = pd.concat((playoffs, superbowls, holidays_US, holidays_CA))
    
    # Create neural network and fit:
    # NeuralProphet Object
    m = NeuralProphet(loss_func="MSE")
    m = m.add_events("playoff")
    m = m.add_events("superbowl")
    m = m.add_events("holiday_US")
    m = m.add_events("holiday_CA")
    
    
    # create the data df with events
    history_df = m.create_df_with_events(df, events_df)
    
    # fit the model
    metrics = m.fit(history_df, freq="D")
    
    # forecast with events known ahead
    future = m.make_future_dataframe(df=history_df, events_df=events_df, periods=365, n_historic_predictions=len(df))
    forecast = m.predict(df=future)
    
    
    fig = m.plot(forecast)
    fig_param = m.plot_parameters()
    fig_comp = m.plot_components(forecast)
    

    RESULT: The results (see the PARAMETERS figure) suggest that when a day is a holiday, there are fewer views in both the US and Canada. Does that make sense? Maybe... It seems plausible that people on holiday have more interesting things to do than browsing Manning's wiki page :-) I don't know.

    PROGRAM'S OUTPUT:

    The dataframe we are working on:
               ds       y  country_id
    0  2007-12-10  9.5908           1
    1  2007-12-11  8.5196           2
    2  2007-12-12  8.1837           2
    3  2007-12-13  8.0725           1
    4  2007-12-14  7.8936           2
    Days in df that are holidays in the US:
               ds       event
    0  2007-12-25  holiday_US
    1  2008-01-21  holiday_US
    2  2008-07-04  holiday_US
    3  2008-11-27  holiday_US
    4  2008-12-25  holiday_US
    
    Days in df that are holidays in Canada:
               ds       event
    0  2008-01-01  holiday_CA
    1  2008-02-18  holiday_CA
    2  2008-08-04  holiday_CA
    3  2008-09-01  holiday_CA
    4  2008-10-13  holiday_CA
    
    INFO - (NP.utils.set_auto_seasonalities) - Disabling daily seasonality. Run NeuralProphet with daily_seasonality=True to override this.
    INFO - (NP.config.set_auto_batch_epoch) - Auto-set batch_size to 32
    INFO - (NP.config.set_auto_batch_epoch) - Auto-set epochs to 138
    
    88%
    241/273 [00:02<00:00, 121.69it/s]
    
    INFO - (NP.utils_torch.lr_range_test) - lr-range-test results: steep: 3.36E-02, min: 1.51E+00
    
    88%
    241/273 [00:02<00:00, 123.87it/s]
    
    INFO - (NP.utils_torch.lr_range_test) - lr-range-test results: steep: 3.36E-02, min: 1.63E+00
    
    89%
    242/273 [00:02<00:00, 121.58it/s]
    
    INFO - (NP.utils_torch.lr_range_test) - lr-range-test results: steep: 3.62E-02, min: 2.58E+00
    INFO - (NP.forecaster._init_train_loader) - lr-range-test selected learning rate: 3.44E-02
    Epoch[138/138]: 100%|██████████| 138/138 [00:29<00:00,  4.74it/s, MSELoss=0.012, MAE=0.344, RMSE=0.478, RegLoss=0]
    

    The figures (images not reproduced here):

    FORECASTS: output of m.plot(forecast)

    PARAMETERS: output of m.plot_parameters()

    COMPONENTS: output of m.plot_components(forecast)
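
The per-country loop in the program above hard-codes two countries. Extending it to more countries (question 1) might look like the sketch below. The helper name `collect_holiday_events` and the `calendars` mapping are hypothetical and not part of NeuralProphet or the holidays package. The calendars here are plain sets of dates standing in for objects such as `holidays.US(...)`; the helper only assumes that `date in calendar` works.

```python
import datetime


def collect_holiday_events(rows, calendars):
    """Return one {'ds', 'event'} dict per row whose date is a holiday in its country.

    rows      -- iterable of (ds, country_id) pairs, ds being a 'YYYY-MM-DD' string
    calendars -- dict: country_id -> (event_name, calendar); the calendar only
                 needs to support `date in calendar`
    """
    events = []
    for ds, country_id in rows:
        if country_id not in calendars:
            continue  # no holiday calendar registered for this country
        event_name, calendar = calendars[country_id]
        if datetime.date.fromisoformat(ds) in calendar:
            events.append({"ds": ds, "event": event_name})
    return events


# Stand-in calendars (assumption: country ids 1 = US, 2 = Canada, as above).
calendars = {
    1: ("holiday_US", {datetime.date(2008, 7, 4)}),
    2: ("holiday_CA", {datetime.date(2008, 7, 1)}),
}
rows = [
    ("2008-07-01", 1),
    ("2008-07-01", 2),
    ("2008-07-04", 1),
    ("2008-07-04", 2),
]
events = collect_holiday_events(rows, calendars)
print(events)
```

With the real data, `rows` could be `zip(df['ds'], df['country_id'])`, and the returned list could be passed to `pd.DataFrame` and concatenated into `events_df`. Each event name would still need its own `m.add_events(...)` call, as in the program above.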