python pandas dataframe datetime facebook-prophet

Handle equivalent holidays for different countries in Prophet


I am creating datasets with chronological variables, more specifically, holidays, and after doing some tests with pandas I am now using FBProphet.

As I am considering a region (Iberian Peninsula), it will involve holidays for two countries: Portugal and Spain. This is the current Prophet behavior:

>>> m.add_country_holidays('PT')
>>> m.add_country_holidays('ES')
WARNING:fbprophet:Changing country holidays from PT to ES

As my goal is simply to know whether a given day is a holiday or not:

• Even though the same holiday may fall on different dates in each country, for the work I am doing the distinction between the holidays is not that relevant.

• If celebrations fall on the same day, such as "Ano Novo" and "Año Nuevo", all I care about is that this specific day is a holiday.

It would be helpful to know in which country (and region) each holiday applies, since the populations differ and that affects my forecasts, but knowing only that a day is a holiday is already a good improvement.
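If only the holiday / not-holiday distinction matters, the names can be ignored altogether. A minimal sketch, assuming two small hand-made frames (`pt`, `es`) that stand in for the real `make_holidays_df` output; the dates and names are illustrative only:

```python
import pandas as pd

# Hypothetical sample data standing in for the output of make_holidays_df.
pt = pd.DataFrame({
    "ds": pd.to_datetime(["2019-01-01", "2019-06-10"]),
    "holiday": ["Ano Novo", "Dia de Portugal"],
})
es = pd.DataFrame({
    "ds": pd.to_datetime(["2019-01-01", "2019-10-12"]),
    "holiday": ["Año Nuevo", "Día de la Hispanidad"],
})

# Union of holiday dates across both countries, one entry per date.
holiday_dates = pd.concat([pt, es])["ds"].drop_duplicates()

# Flag each day of a (short, illustrative) daily calendar.
calendar = pd.DataFrame({"ds": pd.date_range("2019-01-01", "2019-01-03", freq="D")})
calendar["is_holiday"] = calendar["ds"].isin(holiday_dates)
```

The resulting `is_holiday` column is a plain boolean that does not depend on which country observes the holiday.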

This is how I get the holidays for each of the countries:

import pandas as pd
from fbprophet.make_holidays import make_holidays_df

year_list = list(range(2008, 2021))  # 2008 through 2020
PTBusinessCalendar = make_holidays_df(year_list=year_list, country='PT')
ESBusinessCalendar = make_holidays_df(year_list=year_list, country='ES')

To get the Iberian holidays calendar, I can concatenate them like this:

iberian = [PTBusinessCalendar, ESBusinessCalendar]
iberian_2 = pd.concat(iberian).sort_values('ds').reset_index(drop=True)

Which results in the following output:

Output of the Iberian Calendar

As the DataFrame shows, the rows at index 0 and 1, for example, have the same date (ds) and represent the same holiday.
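Rows that share a date can also be located programmatically rather than by eye. A rough sketch, assuming a small hand-made frame `combined` in place of the real concatenated calendar:

```python
import pandas as pd

# Hypothetical stand-in for the concatenated PT + ES calendar.
combined = pd.DataFrame({
    "ds": pd.to_datetime(["2019-01-01", "2019-01-01", "2019-06-10"]),
    "holiday": ["Ano Novo", "Año Nuevo", "Dia de Portugal"],
})

# keep=False marks every row whose ds occurs more than once,
# not only the second and later occurrences.
shared = combined[combined.duplicated("ds", keep=False)]
```

Here `shared` holds only the rows whose date appears in both countries.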

How can I merge the rows that share the same ds into a single row, keeping the first row and writing all the holiday names in it, separated by commas?


Solution

  • The following solved my problem:

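    def join(h):
        # Combine the names of all holidays that share the same date.
        return ', '.join(h.holiday)

    # One row per date (ds becomes the index); the joined names go in "holiday".
    IberianBusinessCalendar = iberian_2.groupby("ds").apply(join).to_frame(name="holiday")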
    

    The result has one row per date, indexed by ds, with the names of coinciding holidays joined by a comma in the holiday column.
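    A possible variant of the same idea, sketched on a small hand-made sample (`iberian_sample` is a hypothetical stand-in for `iberian_2`): aggregating only the holiday column and resetting the index keeps ds as a regular column. That may be convenient if the frame is later passed to Prophet's `holidays` argument, which expects `ds` and `holiday` columns.

```python
import pandas as pd

# Hypothetical stand-in for iberian_2 (the concatenated PT + ES calendar).
iberian_sample = pd.DataFrame({
    "ds": pd.to_datetime(["2019-01-01", "2019-01-01", "2019-06-10"]),
    "holiday": ["Ano Novo", "Año Nuevo", "Dia de Portugal"],
})

# Join the names of holidays that share a date; reset_index turns
# ds back into an ordinary column instead of the index.
merged = (
    iberian_sample
    .groupby("ds")["holiday"]
    .agg(", ".join)
    .reset_index()
)
```

    Note that each distinct joined string becomes its own holiday label, so dates with different name combinations are treated as different holidays.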