Tags: python, pandas, calendar

Generate a DataFrame of holidays for multiple years with workalendar


import pandas as pd
from workalendar.core import Calendar
from workalendar.registry import registry

CalendarClass = registry.get('US')
calendar = CalendarClass()
calendar.holidays(2019)

#> [(datetime.date(2019, 1, 1), 'New year') ...]

As the output above shows, this returns a list of two-element tuples. How do I convert it into a DataFrame with two columns, where one is a date column and the other is a string column? Among other things I've tried, calling pd.DataFrame() on the list doesn't seem to work.


Solution

  • You can simply use the DataFrame constructor:

    df = pd.DataFrame(calendar.holidays(2019), columns=['date', 'name'])
    

    output:

             date                                 name
    0  2019-01-01                             New year
    1  2019-01-21  Birthday of Martin Luther King, Jr.
    2  2019-02-18                Washington's Birthday
    3  2019-05-27                         Memorial Day
    4  2019-07-04                     Independence Day
    5  2019-09-02                            Labor Day
    6  2019-10-14                         Columbus Day
    7  2019-11-11                         Veterans Day
    8  2019-11-28                     Thanksgiving Day
    9  2019-12-25                        Christmas Day
    

    NB. The date column holds plain Python datetime.date objects (object dtype). If you want the pandas datetime type, convert the dates with df['date'] = pd.to_datetime(df['date'])
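
To illustrate the conversion in the note above, here is a minimal sketch. It uses a short hand-written list of (date, name) tuples, copied from the output above, as a stand-in for calendar.holidays(2019) so that it runs without workalendar installed. The name sample_holidays is only for illustration.

```python
import datetime

import pandas as pd

# Stand-in for calendar.holidays(2019): a list of (date, name) tuples.
sample_holidays = [
    (datetime.date(2019, 1, 1), "New year"),
    (datetime.date(2019, 7, 4), "Independence Day"),
    (datetime.date(2019, 12, 25), "Christmas Day"),
]

df = pd.DataFrame(sample_holidays, columns=["date", "name"])

# The constructor keeps the datetime.date objects, so the column is object dtype.
print(df["date"].dtype)

# Convert to the pandas datetime type, which enables the .dt accessor.
df["date"] = pd.to_datetime(df["date"])
print(df["date"].dtype)
print(df["date"].dt.day_name().tolist())
```

After the conversion, date-based operations such as df['date'].dt.month or filtering with comparison operators against timestamps become available.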

    To combine several years:

    option 1: single flat index

    years = [2019, 2020]
    df = pd.concat([pd.DataFrame(calendar.holidays(year), columns=['date', 'name'])
                    for year in years],
                   ignore_index=True)
    
              date                                 name
    0   2019-01-01                             New year
    1   2019-01-21  Birthday of Martin Luther King, Jr.
    2   2019-02-18                Washington's Birthday
    ...
    19  2020-11-26                     Thanksgiving Day
    20  2020-12-25                        Christmas Day
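
As an alternative sketch for the flat result, the per-year lists can be flattened into one list of tuples first, so the DataFrame constructor is called only once and no concat is needed. The function holidays_for below is a hypothetical stand-in for calendar.holidays(year), used so the example runs without workalendar. It returns only two holidays per year.

```python
import datetime

import pandas as pd


def holidays_for(year):
    """Hypothetical stand-in for calendar.holidays(year)."""
    return [
        (datetime.date(year, 1, 1), "New year"),
        (datetime.date(year, 12, 25), "Christmas Day"),
    ]


years = [2019, 2020]

# Flatten the per-year lists into a single list of (date, name) tuples.
rows = [holiday for year in years for holiday in holidays_for(year)]

# One constructor call; the default RangeIndex is already continuous.
df = pd.DataFrame(rows, columns=["date", "name"])
print(df)
```

With the real calendar object, replacing holidays_for(year) with calendar.holidays(year) should give the same shape of result as the concat version.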
    

    option 2: MultiIndex

    df = pd.concat({year: pd.DataFrame(calendar.holidays(year),
                                       columns=['date', 'name'])
                    for year in [2019, 2020]})
    
                   date                                 name
    2019 0   2019-01-01                             New year
         1   2019-01-21  Birthday of Martin Luther King, Jr.
         2   2019-02-18                Washington's Birthday
         3   2019-05-27                         Memorial Day
         4   2019-07-04                     Independence Day
         5   2019-09-02                            Labor Day
         6   2019-10-14                         Columbus Day
         7   2019-11-11                         Veterans Day
         8   2019-11-28                     Thanksgiving Day
         9   2019-12-25                        Christmas Day
    2020 0   2020-01-01                             New year
         1   2020-01-20  Birthday of Martin Luther King, Jr.
         2   2020-02-17                Washington's Birthday
         3   2020-05-25                         Memorial Day
         4   2020-07-03          Independence Day (Observed)
         5   2020-07-04                     Independence Day
         6   2020-09-07                            Labor Day
         7   2020-10-12                         Columbus Day
         8   2020-11-11                         Veterans Day
         9   2020-11-26                     Thanksgiving Day
         10  2020-12-25                        Christmas Day
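
Once the years are stored in a MultiIndex, a single year can be selected through the outer level, or the year can be moved into a regular column. The sketch below shows both. It again relies on a hypothetical holidays_for function in place of calendar.holidays(year), and the level names "year" and "row" are chosen here for illustration only.

```python
import datetime

import pandas as pd


def holidays_for(year):
    """Hypothetical stand-in for calendar.holidays(year)."""
    return [
        (datetime.date(year, 1, 1), "New year"),
        (datetime.date(year, 12, 25), "Christmas Day"),
    ]


# Same construction as option 2, with names given to the two index levels.
df = pd.concat(
    {year: pd.DataFrame(holidays_for(year), columns=["date", "name"])
     for year in [2019, 2020]},
    names=["year", "row"],
)

# Select one year through the outer index level.
df_2020 = df.loc[2020]
print(df_2020)

# Move the year into a column and replace the per-year counter with a flat index.
flat = df.reset_index(level="year").reset_index(drop=True)
print(flat)
```

Keeping the year as an index level is convenient for lookups such as df.loc[2020], while the flat form with a year column is often easier to merge or group.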