
Split pandas datetime index to create categorical variable


I need to create a categorical variable from a pandas DatetimeIndex and am looking for a Pythonic way to do it.

Until now I have just looped over the whole index with a chain of if-else statements. Taking inspiration from (Adding a new pandas column with mapped value from a dictionary), I tried mapping the index through a dictionary whose keys are lambda functions holding the if-else conditions, but it didn't work:

date_series = pd.date_range(start = '2010-12-31', end = '2018-12-31', freq = 'M')

regime_splitter = {lambda x : x < '2012' : 'before 2012' , lambda x : x>= '2012' and x < '2014': '2012 - 2014', lambda x : x>= '2014' : 'after 2014'}

date_series.map(regime_splitter)

Expected result:

         date              regime
0  2010-12-31         before 2012
1  2013-05-31  between 2012, 2014
2  2018-12-31          after 2014

Solution

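  • Use pd.cut with DatetimeIndex.year. This is the easiest option if you need to add or remove groups later. Note that the bins are closed on the right by default, so the year 2012 lands in 'before 2012' and 2014 in '2012 - 2014':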

    import numpy as np
    import pandas as pd

    a = pd.cut(date_series.year,
               bins=[-np.inf, 2012, 2014, np.inf],
               labels=['before 2012', '2012 - 2014', 'after 2014'])
    print(a.value_counts())
    before 2012    25
    2012 - 2014    24
    after 2014     48
    dtype: int64
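The lambdas in the question use half-open ranges (x < 2012, 2012 <= x < 2014, x >= 2014), so 2012 belongs to the middle group there. A minimal sketch that reproduces those boundaries passes right=False to pd.cut and places the labels next to the dates, as in the expected result. It assumes the same monthly range as the question; pd.offsets.MonthEnd() is used instead of the 'M' alias, which recent pandas releases deprecate in favour of 'ME'.

```python
import numpy as np
import pandas as pd

# Month-end dates from 2010-12-31 to 2018-12-31, as in the question
date_series = pd.date_range(start='2010-12-31', end='2018-12-31',
                            freq=pd.offsets.MonthEnd())

# right=False makes the bins [-inf, 2012), [2012, 2014), [2014, inf),
# matching x < 2012, 2012 <= x < 2014 and x >= 2014 from the question
regime = pd.cut(date_series.year,
                bins=[-np.inf, 2012, 2014, np.inf],
                labels=['before 2012', '2012 - 2014', 'after 2014'],
                right=False)

# One row per date with its regime label, like the expected result
df = pd.DataFrame({'date': date_series, 'regime': regime})
print(df.iloc[[0, 29, -1]])  # rows for 2010-12-31, 2013-05-31, 2018-12-31
print(df['regime'].value_counts(sort=False))
```

With these boundaries the groups contain 13, 24 and 60 dates rather than 25, 24 and 48, because the twelve months of 2012 move into '2012 - 2014' and the twelve months of 2014 move into 'after 2014'.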
    

    Another solution with numpy.select, using the same boundaries as pd.cut above (2012 counts as 'before 2012', 2014 as '2012 - 2014'):

    x = date_series.year
    a = np.select([x <= 2012, x > 2014], ['before 2012', 'after 2014'], '2012 - 2014')
    
    print(pd.Series(a).value_counts())
    after 2014     48
    before 2012    25
    2012 - 2014    24
    dtype: int64
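Comparing the timestamps themselves instead of their years stays closer to the question's original idea of date boundaries and also allows cut-offs in the middle of a year. A sketch with np.select under the question's half-open ranges; the boundary Timestamps start_2012 and start_2014 are illustrative names, not part of the original answer.

```python
import numpy as np
import pandas as pd

date_series = pd.date_range(start='2010-12-31', end='2018-12-31',
                            freq=pd.offsets.MonthEnd())

# Boundaries as Timestamps; any date, not only January 1st, would work here
start_2012 = pd.Timestamp('2012-01-01')
start_2014 = pd.Timestamp('2014-01-01')

# np.select picks the choice of the first condition that is True,
# and falls back to the default when none is
conditions = [date_series < start_2012, date_series < start_2014]
choices = ['before 2012', '2012 - 2014']
regime = np.select(conditions, choices, default='after 2014')

print(pd.Series(regime, index=date_series).value_counts())
```

Because the first matching condition wins, the second condition does not need to repeat date_series >= start_2012.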
    

    Your approach can be fixed by replacing the dictionary with a single function that uses nested if-else, but it is slow on large data because the function is called once per element:

    regime_splitter = (lambda x: 'before 2012' if x <= 2012 else
                                 ('2012 - 2014' if x <= 2014 else 'after 2014'))
    
    a = date_series.year.map(regime_splitter)
    print(a.value_counts())
    after 2014     48
    before 2012    25
    2012 - 2014    24
    dtype: int64
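The dictionary in the question does not work because Index.map with a dict treats it as a lookup table: each date is matched against the keys by equality, so the lambdas stored as keys are never called and every result is NaN. If you want to keep the idea of pairing each condition with a label, the pairs have to be evaluated explicitly. A minimal sketch, using a hypothetical rule list REGIME_RULES and a hypothetical helper label_regime, with the question's half-open year ranges; like the nested if-else, it runs Python code once per element.

```python
import pandas as pd

date_series = pd.date_range(start='2010-12-31', end='2018-12-31',
                            freq=pd.offsets.MonthEnd())

# Hypothetical ordered (predicate, label) pairs; the first predicate
# that returns True decides the label
REGIME_RULES = [
    (lambda year: year < 2012, 'before 2012'),
    (lambda year: year < 2014, '2012 - 2014'),
    (lambda year: True, 'after 2014'),  # catch-all for year >= 2014
]

def label_regime(year):
    """Hypothetical helper: return the label of the first matching rule."""
    for predicate, label in REGIME_RULES:
        if predicate(year):
            return label
    return None  # not reached while the catch-all rule is present

# Passing a function (not a dict) to Index.map calls it on every year
regime = date_series.year.map(label_regime)
print(regime.value_counts())
```

Adding or removing a group only means editing REGIME_RULES, but for large data the vectorized pd.cut or np.select versions remain the better choice.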