I have a single dataset for my forecasts that contains data from countries across the globe:
ds y country_id
01/01/2021 09:00:00 5.0 1
01/01/2021 09:10:00 5.2 1
01/01/2021 09:20:00 5.4 1
01/01/2021 09:30:00 6.1 1
01/01/2021 09:00:00 2.0 2
01/01/2021 09:10:00 2.2 2
01/01/2021 09:20:00 2.4 2
01/01/2021 09:30:00 3.1 2
playoffs = pd.DataFrame({
'holiday': 'playoff',
'ds': pd.to_datetime(['2008-01-13', '2009-01-03', '2010-01-16',
'2010-01-24', '2010-02-07', '2011-01-08',
'2013-01-12', '2014-01-12', '2014-01-19',
'2014-02-02', '2015-01-11', '2016-01-17',
'2016-01-24', '2016-02-07']),
'lower_window': 0,
'upper_window': 1,
})
superbowls = pd.DataFrame({
'holiday': 'superbowl',
'ds': pd.to_datetime(['2010-02-07', '2014-02-02', '2016-02-07']),
'lower_window': 0,
'upper_window': 1,
})
holidays = pd.concat((playoffs, superbowls))
Now I would like to add these holidays, as well as country-specific holidays, to the model:
m = NeuralProphet(holidays=holidays)
m.add_country_holidays(country_name='US')
m.fit(df)
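As far as I can tell, `NeuralProphet` has no Prophet-style `holidays=` constructor argument. User-defined holidays are added as events instead, from a frame with `event` and `ds` columns. The sketch below converts the question's Prophet-style frame into that shape and extracts the windows per holiday name. The helper names `to_events_df` and `event_windows` are hypothetical, and the date lists are shortened.

```python
import pandas as pd

# Prophet-style holiday frames from the question (date lists shortened here)
playoffs = pd.DataFrame({
    'holiday': 'playoff',
    'ds': pd.to_datetime(['2008-01-13', '2009-01-03', '2010-01-16']),
    'lower_window': 0,
    'upper_window': 1,
})
superbowls = pd.DataFrame({
    'holiday': 'superbowl',
    'ds': pd.to_datetime(['2010-02-07', '2014-02-02', '2016-02-07']),
    'lower_window': 0,
    'upper_window': 1,
})
holidays = pd.concat((playoffs, superbowls), ignore_index=True)


def to_events_df(prophet_holidays: pd.DataFrame) -> pd.DataFrame:
    """Rename 'holiday' to 'event' and keep only the 'event' and 'ds' columns."""
    events = prophet_holidays.rename(columns={'holiday': 'event'})
    return events[['event', 'ds']].reset_index(drop=True)


def event_windows(prophet_holidays: pd.DataFrame) -> dict:
    """Map each holiday name to its (lower_window, upper_window) pair."""
    firsts = prophet_holidays.groupby('holiday')[['lower_window', 'upper_window']].first()
    return {name: (int(row['lower_window']), int(row['upper_window']))
            for name, row in firsts.iterrows()}


events_df = to_events_df(holidays)
windows = event_windows(holidays)
print(events_df)
print(windows)  # {'playoff': (0, 1), 'superbowl': (0, 1)}
```

Each name could then be registered with something like `m.add_events("playoff", lower_window=0, upper_window=1)` before calling `m.create_df_with_events(df, events_df)`. This assumes your NeuralProphet version's `add_events` accepts those window arguments, so check its documentation.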
Here is a possible solution:
The program:
# NOTE 1: tested on Google Colab
# Uncomment the following (!pip) line if you need to install the libraries
# in a Google Colab notebook:
#!pip install neuralprophet pandas numpy holidays
import pandas as pd
import numpy as np
import holidays
from neuralprophet import NeuralProphet
import datetime
# NOTE 2: Most of the code comes from:
# https://neuralprophet.com/html/events_holidays_peyton_manning.html
# Context:
# We will use the time series of the log daily page views for the Wikipedia
# page of Peyton Manning (a former American football quarterback) as an example.
# During playoffs and Super Bowls, Peyton Manning's wiki page is viewed more
# frequently. We would like to see whether country-specific holidays also have
# an influence.
# First, we load the data:
data_location = "https://raw.githubusercontent.com/ourownstory/neuralprophet-data/main/datasets/"
df = pd.read_csv(data_location + "wp_log_peyton_manning.csv")
# To simulate your case, we add a country_id column filled with random values {1, 2}.
# Let's assume US=1 and Canada=2
np.random.seed(0)
df['country_id'] = np.random.randint(1, 2 + 1, len(df))
print("The dataframe we are working on:")
print(df.head())
# We would like to add holidays for the US and Canada to see if holidays have an
# influence on the number of daily views of Manning's wiki page.
# The data in df starts in 2007 and ends in 2016:
StartingYear=2007
LastYear=2016
# Holidays for US and Canada:
US_holidays = holidays.US(years=[year for year in range(StartingYear, LastYear+1)])
CA_holidays = holidays.CA(years=[year for year in range(StartingYear, LastYear+1)])
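Instead of hard-coding `StartingYear` and `LastYear`, the year range can be derived from the data. This is a small sketch on a toy frame. The first date matches the output further down, while the middle and last rows (and the `y` values other than the first) are hypothetical.

```python
import pandas as pd

# A few rows standing in for df (only the earliest and latest dates matter here)
df = pd.DataFrame({'ds': ['2007-12-10', '2011-05-03', '2016-01-20'],
                   'y': [9.5908, 8.12, 7.82]})

# Derive the inclusive year range from the 'ds' column instead of hard-coding it
dates = pd.to_datetime(df['ds'])
first_year = int(dates.dt.year.min())
last_year = int(dates.dt.year.max())
years = list(range(first_year, last_year + 1))
print(years)
```

The resulting list can be passed directly as `holidays.US(years=years)` and `holidays.CA(years=years)`.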
# Collect, for each country, the dates in df that are holidays in that country.
# (Lists are used because DataFrame.append was removed in pandas 2.0.)
dates_US = []
dates_CA = []
for ds, country_id in zip(df['ds'], df['country_id']):
    # Convert the date string (YYYY-MM-DD) to a date object:
    day = datetime.date.fromisoformat(ds)
    # Check if the corresponding day is a holiday in the US:
    if country_id == 1 and day in US_holidays:
        dates_US.append(ds)
    # Check if the corresponding day is a holiday in Canada:
    if country_id == 2 and day in CA_holidays:
        dates_CA.append(ds)
holidays_US = pd.DataFrame({'ds': pd.to_datetime(dates_US), 'event': 'holiday_US'})
holidays_CA = pd.DataFrame({'ds': pd.to_datetime(dates_CA), 'event': 'holiday_CA'})
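The per-row check can also be done in one pass that looks up each row's date in the calendar of that row's country, producing both holiday event types in a single frame. In this sketch `calendars` is a hypothetical stand-in for `holidays.US(...)` and `holidays.CA(...)`: a handful of real holiday dates, hard-coded. The real calendar objects support the same `date in calendar` test. The toy `y` values are made up.

```python
import datetime
import pandas as pd

# Hypothetical stand-ins for holidays.US(...) / holidays.CA(...), keyed by country_id
calendars = {
    1: {datetime.date(2007, 12, 25), datetime.date(2008, 1, 1), datetime.date(2008, 1, 21)},  # US
    2: {datetime.date(2008, 1, 1), datetime.date(2008, 2, 18)},  # Canada
}
event_names = {1: 'holiday_US', 2: 'holiday_CA'}

# Toy version of df with the simulated country_id column
df = pd.DataFrame({
    'ds': ['2007-12-25', '2008-01-01', '2008-01-01', '2008-01-21', '2008-02-18'],
    'y': [8.1, 8.2, 8.3, 8.4, 8.5],
    'country_id': [1, 1, 2, 2, 2],
})

dates = pd.to_datetime(df['ds'])
# True where the row's date is a holiday in that row's own country
is_holiday = [d.date() in calendars[c] for d, c in zip(dates, df['country_id'])]
hits = df.loc[is_holiday]

# One events frame with 'event' and 'ds' columns for both countries
events_df = pd.DataFrame({
    'event': hits['country_id'].map(event_names),
    'ds': pd.to_datetime(hits['ds']),
}).reset_index(drop=True)
print(events_df)
```

Here 2008-01-21 is dropped because that row is Canadian and the date is only a US holiday. This is the same per-country rule the loop applies.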
# Now we can drop the country_id in df:
df.drop('country_id', axis=1, inplace=True)
print("Days in df that are holidays in the US:")
print(holidays_US.head())
print()
print("Days in df that are holidays in Canada:")
print(holidays_CA.head())
# user specified events
# history events
playoffs = pd.DataFrame({
'event': 'playoff',
'ds': pd.to_datetime([
'2008-01-13', '2009-01-03', '2010-01-16',
'2010-01-24', '2010-02-07', '2011-01-08',
'2013-01-12', '2014-01-12', '2014-01-19',
'2014-02-02', '2015-01-11', '2016-01-17',
'2016-01-24', '2016-02-07',
]),
})
superbowls = pd.DataFrame({
'event': 'superbowl',
'ds': pd.to_datetime([
'2010-02-07', '2012-02-05', '2014-02-02',
'2016-02-07',
]),
})
# Create the events_df:
events_df = pd.concat((playoffs, superbowls, holidays_US, holidays_CA))
# Create neural network and fit:
# NeuralProphet Object
m = NeuralProphet(loss_func="MSE")
m = m.add_events("playoff")
m = m.add_events("superbowl")
m = m.add_events("holiday_US")
m = m.add_events("holiday_CA")
# create the data df with events
history_df = m.create_df_with_events(df, events_df)
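My understanding is that `create_df_with_events` adds, for each registered event, a column marking the dates on which it occurs. The model then learns a coefficient per event (more if windows are used), which is what the PARAMETERS figure plots. The hypothetical helper `add_event_indicators` below only illustrates that idea. It is not NeuralProphet's implementation, and it ignores lower and upper windows.

```python
import pandas as pd


def add_event_indicators(df: pd.DataFrame, events_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add one 0/1 column per event name (1.0 on that event's dates)."""
    out = df.copy()
    out['ds'] = pd.to_datetime(out['ds'])
    for name, group in events_df.groupby('event'):
        out[name] = out['ds'].isin(pd.to_datetime(group['ds'])).astype(float)
    return out


# Toy history around the 2010 Super Bowl (y values are made up)
df = pd.DataFrame({'ds': ['2010-02-06', '2010-02-07', '2010-02-08'], 'y': [8.0, 9.5, 8.7]})
events_df = pd.DataFrame({
    'event': ['playoff', 'playoff', 'superbowl'],
    'ds': pd.to_datetime(['2010-01-24', '2010-02-07', '2010-02-07']),
})
history_df = add_event_indicators(df, events_df)
print(history_df)
```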
# fit the model
metrics = m.fit(history_df, freq="D")
# forecast with events known ahead
future = m.make_future_dataframe(df=history_df, events_df=events_df, periods=365, n_historic_predictions=len(df))
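`make_future_dataframe` can only mark future events that are listed in `events_df`. Because `holidays_US` and `holidays_CA` are built from dates already in `df`, holidays that fall inside the 365-day forecast horizon are absent from `events_df`, so their effect would not appear in the forecast. A quick check is to filter `events_df` to the horizon. In the sketch below, `last_date` and the small `events_df` are hypothetical.

```python
import pandas as pd

# Hypothetical last history date and a small events_df
last_date = pd.Timestamp('2016-01-20')
periods = 365
events_df = pd.DataFrame({
    'event': ['holiday_US', 'playoff', 'playoff', 'superbowl'],
    'ds': pd.to_datetime(['2015-12-25', '2016-01-24', '2016-02-07', '2016-02-07']),
})

# Events strictly after the history and within the forecast horizon
horizon_end = last_date + pd.Timedelta(days=periods)
in_horizon = events_df[(events_df['ds'] > last_date) & (events_df['ds'] <= horizon_end)]
print(in_horizon)
print(sorted(in_horizon['event'].unique()))  # ['playoff', 'superbowl']
```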
forecast = m.predict(df=future)
fig = m.plot(forecast)
fig_param = m.plot_parameters()
fig_comp = m.plot_components(forecast)
RESULT: The results (see the PARAMETERS figure) seem to show that there are fewer views on holidays, in both the US and Canada. Does that make sense? Maybe. It seems plausible that people on holiday have more interesting things to do than browse Manning's wiki page :-) but I can't be sure.
PROGRAM'S OUTPUT:
The dataframe we are working on:
ds y country_id
0 2007-12-10 9.5908 1
1 2007-12-11 8.5196 2
2 2007-12-12 8.1837 2
3 2007-12-13 8.0725 1
4 2007-12-14 7.8936 2
Days in df that are holidays in the US:
ds event
0 2007-12-25 holiday_US
1 2008-01-21 holiday_US
2 2008-07-04 holiday_US
3 2008-11-27 holiday_US
4 2008-12-25 holiday_US
Days in df that are holidays in Canada:
ds event
0 2008-01-01 holiday_CA
1 2008-02-18 holiday_CA
2 2008-08-04 holiday_CA
3 2008-09-01 holiday_CA
4 2008-10-13 holiday_CA
INFO - (NP.utils.set_auto_seasonalities) - Disabling daily seasonality. Run NeuralProphet with daily_seasonality=True to override this.
INFO - (NP.config.set_auto_batch_epoch) - Auto-set batch_size to 32
INFO - (NP.config.set_auto_batch_epoch) - Auto-set epochs to 138
88% 241/273 [00:02<00:00, 121.69it/s]
INFO - (NP.utils_torch.lr_range_test) - lr-range-test results: steep: 3.36E-02, min: 1.51E+00
88% 241/273 [00:02<00:00, 123.87it/s]
INFO - (NP.utils_torch.lr_range_test) - lr-range-test results: steep: 3.36E-02, min: 1.63E+00
89% 242/273 [00:02<00:00, 121.58it/s]
INFO - (NP.utils_torch.lr_range_test) - lr-range-test results: steep: 3.62E-02, min: 2.58E+00
INFO - (NP.forecaster._init_train_loader) - lr-range-test selected learning rate: 3.44E-02
Epoch[138/138]: 100%|██████████| 138/138 [00:29<00:00, 4.74it/s, MSELoss=0.012, MAE=0.344, RMSE=0.478, RegLoss=0]
The figures: FORECASTS, PARAMETERS, COMPONENTS (images not included).