I have to create a categorical variable from a pandas DatetimeIndex and am looking for a Pythonic way to do it.
So far I have just looped through the whole index with a bunch of if-else statements. Taking inspiration from (Adding a new pandas column with mapped value from a dictionary), I tried using a dictionary of lambda if-else functions together with map to create the categorical variable, but it didn't work:
date_series = pd.date_range(start = '2010-12-31', end = '2018-12-31', freq = 'M')
regime_splitter = {lambda x : x < '2012' : 'before 2012' , lambda x : x>= '2012' and x < '2014': '2012 - 2014', lambda x : x>= '2014' : 'after 2014'}
date_series.map(regime_splitter)
Expected result:
   date        regime
0  2010-12-31  before 2012
1  2013-05-31  2012 - 2014
2  2018-12-31  after 2014
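Use pd.cut with DatetimeIndex.year; this solution is easy to extend if you need to add or remove groups. Passing right=False makes every bin include its left edge, so the bins are [-inf, 2012), [2012, 2014) and [2014, inf), matching the x < '2012' and x >= '2014' conditions in the question:
import numpy as np
import pandas as pd

# bin the year of each timestamp; right=False -> intervals closed on the left
a = pd.cut(date_series.year,
           bins=[-np.inf, 2012, 2014, np.inf],
           labels=['before 2012', '2012 - 2014', 'after 2014'],
           right=False)
print(a.value_counts())
before 2012    13
2012 - 2014    24
after 2014     60
dtype: int64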
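The right parameter of pd.cut decides which edge of each bin is closed, and therefore where the boundary years 2012 and 2014 land. The sketch below runs a handful of illustrative years through the same bins and labels, once with the default right=True and once with right=False:

```python
import numpy as np
import pandas as pd

# A few illustrative years around the cut points 2012 and 2014
years = pd.Index([2011, 2012, 2013, 2014, 2015])
bins = [-np.inf, 2012, 2014, np.inf]
labels = ['before 2012', '2012 - 2014', 'after 2014']

# Default right=True: intervals are (-inf, 2012], (2012, 2014], (2014, inf]
right_closed = pd.cut(years, bins=bins, labels=labels)
# right=False: intervals are [-inf, 2012), [2012, 2014), [2014, inf)
left_closed = pd.cut(years, bins=bins, labels=labels, right=False)

comparison = pd.DataFrame({'year': years,
                           'right=True': right_closed,
                           'right=False': left_closed})
print(comparison)
```

With the default, 2012 is labelled 'before 2012' and 2014 '2012 - 2014'; with right=False the labels agree with the x < '2012' and x >= '2014' conditions from the question.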
Another solution uses numpy.select; the conditions are checked in order and the default label covers everything that matches none of them:
x = date_series.year
a = np.select([x < 2012, x >= 2014], ['before 2012', 'after 2014'], '2012 - 2014')
print(pd.Series(a).value_counts())
after 2014     60
2012 - 2014    24
before 2012    13
dtype: int64
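For each element, np.select returns the choice of the first condition that is True and falls back to default otherwise. The sketch below relies on that ordering to express the three regimes with upper bounds only, and puts the labels next to the dates in the layout of the expected result. It builds the dates with pd.offsets.MonthEnd() instead of freq='M'; this is an assumption about your pandas version, since recent releases deprecate the 'M' alias in favour of 'ME'.

```python
import numpy as np
import pandas as pd

# Month-end dates as in the question; MonthEnd() sidesteps the 'M' vs 'ME' alias change
date_series = pd.date_range(start='2010-12-31', end='2018-12-31',
                            freq=pd.offsets.MonthEnd())
x = date_series.year

# x < 2014 is also True for 2010 and 2011, so the order of the conditions matters:
# np.select uses the first condition that is True for each element.
conditions = [x < 2012, x < 2014]
choices = ['before 2012', '2012 - 2014']
regime = np.select(conditions, choices, default='after 2014')

# Keep the dates alongside the labels, as in the expected result
df = pd.DataFrame({'date': date_series, 'regime': regime})
print(df.iloc[[0, 29, 96]])  # rows for 2010-12-31, 2013-05-31 and 2018-12-31
```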
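Your own attempt fails because map with a dictionary looks up each value as a key, and the keys of regime_splitter are lambda objects, so nothing matches and every value becomes NaN. It can be fixed by using a single lambda with a nested if-else instead, but on large data this is slow because the function is called once per element:
regime_splitter = (lambda x: 'before 2012' if x < 2012 else
                   ('2012 - 2014' if x < 2014 else 'after 2014'))
a = date_series.year.map(regime_splitter)
print(a.value_counts())
after 2014     60
2012 - 2014    24
before 2012    13
dtype: int64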
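If you want to keep the rule-table idea from the question, the conditions can live in an ordered list of (predicate, label) pairs that are actually called in turn, rather than in dictionary keys that map only compares for equality. A minimal sketch; RULES and classify are hypothetical names, and like the nested lambda it runs Python code once per element, so the same speed caveat applies:

```python
import pandas as pd

# Hypothetical rule table: (predicate, label) pairs checked top to bottom.
# Unlike dict keys, the predicates are called, not looked up by equality.
RULES = [
    (lambda year: year < 2012, 'before 2012'),
    (lambda year: year < 2014, '2012 - 2014'),
    (lambda year: True, 'after 2014'),  # catch-all, so next() always finds a match
]

def classify(year):
    """Return the label of the first rule whose predicate accepts `year`."""
    return next(label for predicate, label in RULES if predicate(year))

dates = pd.DatetimeIndex(['2010-12-31', '2013-05-31', '2018-12-31'])
regime = dates.year.map(classify)
print(pd.DataFrame({'date': dates, 'regime': regime}))
```

Adding or removing a regime then means editing a single entry in the list, as long as the rules stay ordered from the earliest boundary to the catch-all.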