Given a pandas series with years as indices (in ascending order with no missing years):
import pandas as pd
growth = pd.Series({1990: 6.99, 1991: 5.53, 1992: -9.02, 1993: 1.05, 1994: 9.24, 1995: 0.16, 1996: 10.36, 1997: 2.68, 1998: 2.89, 1999: -0.82, 2000: -3.06, 2001: 1.44, 2002: -8.89, 2003: -17.0, 2004: -5.81, 2005: -5.71, 2006: -3.46, 2007: -3.65, 2008: -17.67, 2009: 12.02, 2010: 19.68, 2011: 14.19, 2012: 16.67, 2013: 1.99, 2014: 2.38, 2015: 1.78, 2016: 0.76, 2017: 4.7, 2018: 3.5, 2019: -8.1, 2020: -8.0})
I need to identify periods (start and end year) during which growth is negative for at least min_duration consecutive years.
I can do this by iterating through the series:
def get_negative_periods(s, min_duration):
    previous = 1
    negative_periods = []
    for year, value in s.items():
        if value < 0:
            if previous < 0:
                # Still inside a negative run: extend the current period
                negative_periods[-1].append(year)
            else:
                # A new negative run starts here
                negative_periods.append([year])
        previous = value
    return [(period[0], period[-1]) for period in negative_periods
            if len(period) >= min_duration]
For example, get_negative_periods(growth, 3) returns [(2002, 2008)] because 2002-2008 is the only period where growth was negative for 3 or more consecutive years.
Is there a way to vectorize this instead of going row by row? (Returning a series or dataframe instead of tuples would be fine.)
Try creating groups based on where True and False alternate in the boolean mask s.lt(0), then keep only the True groups whose year range is greater than or equal to min_duration:
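def get_negative_periods(s, min_duration):
    # Boolean mask of negative years; after reset_index the years are in
    # column 'index' and the mask is in column 0 (the series is unnamed)
    s = s.lt(0).reset_index()
    # Label each run of equal values, keeping only labels of True runs
    g = s[0].ne(s[0].shift()).cumsum()[s[0].eq(True)]
    s = s.groupby(g)['index'].agg(['first', 'last'])
    return s[(s['last'] - s['first']) + 1 >= min_duration]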
res = get_negative_periods(growth, 3)
res:
     first  last
0
6.0   2002  2008
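To see why the shift/ne/cumsum combination labels consecutive runs, here is a minimal sketch of the intermediate steps. The series `toy` and the variable names are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical toy series, not the data from the question
toy = pd.Series({2000: 1.0, 2001: -2.0, 2002: -3.0, 2003: 4.0, 2004: -5.0})

neg = toy.lt(0)                # True where the value is negative
changed = neg.ne(neg.shift())  # True wherever the flag differs from the previous row
run_id = changed.cumsum()      # every run of equal flags gets its own integer label
neg_runs = run_id[neg]         # keep only the labels that belong to negative runs

print(run_id.tolist())         # [1, 2, 2, 3, 4]
print(neg_runs.to_dict())      # {2001: 2, 2002: 2, 2004: 4}
```

Grouping the years by `neg_runs` then yields one group per negative run, which is what the `groupby(g)` call in the function relies on.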
Or as a list of lists:
def get_negative_periods(s, min_duration):
    s = s.lt(0).reset_index()
    g = s[0].ne(s[0].shift()).cumsum()[s[0].eq(True)]
    s = s.groupby(g)['index'].agg(['first', 'last'])
    # Same as above, but converted to a list of [first, last] pairs
    return s[(s['last'] - s['first']) + 1 >= min_duration].values.tolist()
lst = get_negative_periods(growth, 3)
lst:
[[2002, 2008]]
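The filter above measures a run by last - first + 1, which relies on the question's guarantee that no years are missing. As a possible variant (a sketch only, with the hypothetical name `get_negative_periods_by_count`), the run length can instead be taken as the number of rows in each group, and the result returned as tuples like the original loop version:

```python
import pandas as pd

def get_negative_periods_by_count(s, min_duration):
    """Sketch: same grouping idea, but run length = number of rows in the run."""
    neg = s.lt(0)
    run_id = neg.ne(neg.shift()).cumsum()   # label runs of equal sign flags
    years = s.index.to_series()[neg]        # years with negative values
    runs = years.groupby(run_id[neg]).agg(['first', 'last', 'count'])
    keep = runs[runs['count'] >= min_duration]
    return list(zip(keep['first'].tolist(), keep['last'].tolist()))

# Hypothetical toy data for illustration
toy = pd.Series({2000: -1.0, 2001: -2.0, 2002: 3.0,
                 2003: -4.0, 2004: -5.0, 2005: -6.0})
print(get_negative_periods_by_count(toy, 3))  # [(2003, 2005)]
print(get_negative_periods_by_count(toy, 2))  # [(2000, 2001), (2003, 2005)]
```

On the growth series from the question this should give the same single period, 2002 to 2008, for min_duration 3.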