Identify periods in a pandas series where several consecutive values are negative

Given a pandas series with years as indices (in ascending order with no missing years):

import pandas as pd

growth = pd.Series({1990: 6.99, 1991: 5.53, 1992: -9.02, 1993: 1.05, 1994: 9.24, 1995: 0.16, 1996: 10.36, 1997: 2.68, 1998: 2.89, 1999: -0.82, 2000: -3.06, 2001: 1.44, 2002: -8.89, 2003: -17.0, 2004: -5.81, 2005: -5.71, 2006: -3.46, 2007: -3.65, 2008: -17.67, 2009: 12.02, 2010: 19.68, 2011: 14.19, 2012: 16.67, 2013: 1.99, 2014: 2.38, 2015: 1.78, 2016: 0.76, 2017: 4.7, 2018: 3.5, 2019: -8.1, 2020: -8.0})

I need to identify periods (start and end year) during which growth is negative for at least min_duration consecutive years.

I can do this by iterating through the series:

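def get_negative_periods(s, min_duration):
    previous = 1  # any non-negative value, so the first year can start a period
    negative_periods = []
    for year, value in s.items():
        if value < 0:
            if previous < 0:
                # still inside a negative run: extend the current period
                negative_periods[-1].append(year)
            else:
                # first negative year after a non-negative one: start a new period
                negative_periods.append([year])
        previous = value
    # keep only the periods that are long enough, as (start, end) tuples
    return [(period[0], period[-1]) for period in negative_periods
            if len(period) >= min_duration]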

e.g. get_negative_periods(growth, 3) returns [(2002, 2008)] because 2002-2008 is the only period where growth was negative for 3 or more consecutive years.

Is there a way to vectorize this instead of going row by row? (Returning a series or dataframe instead of tuples would be fine.)

Solution

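Try labelling each run of consecutive equal values in the boolean mask s < 0 (a new group starts wherever the value differs from the previous row), then keep only the True groups that span at least min_duration years. The span is computed as last - first + 1, which relies on the index having no missing years, as stated in the question: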

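def get_negative_periods(s, min_duration):
    # Boolean mask of negative years; reset_index() moves the years into an
    # 'index' column and leaves the mask in column 0 (the series has no name)
    s = s.lt(0).reset_index()
    # A new group id starts wherever the mask changes; keep ids of True rows only
    g = s[0].ne(s[0].shift()).cumsum()[s[0].eq(True)]
    # First and last year of each negative run
    s = s.groupby(g)['index'].agg(['first', 'last'])
    return s[(s['last'] - s['first']) + 1 >= min_duration]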


res = get_negative_periods(growth, 3)

res:

     first  last
0               
6.0   2002  2008
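
To see what the shift/cumsum step does, here is a minimal sketch that prints nothing but keeps each intermediate result in its own variable. The series demo and the names negative, run_start, run_id, years and runs are made up for illustration and are not part of the function above.

```python
import pandas as pd

# Small hypothetical series, chosen only to illustrate the run labelling.
demo = pd.Series({2000: 1.0, 2001: -2.0, 2002: -3.0, 2003: 4.0, 2004: -5.0})

negative = demo.lt(0)                      # True where the value is negative
run_start = negative.ne(negative.shift())  # True on the first row of each run
run_id = run_start.cumsum()                # 1, 2, 2, 3, 4: one label per run

# Keep only the rows inside negative runs, then summarise each run.
years = demo.index.to_series()
runs = years[negative].groupby(run_id[negative]).agg(['first', 'last', 'count'])
```

Here runs has one row per negative run (2001 to 2002 and 2004 to 2004). The count column gives the run length directly, so filtering on runs['count'] >= min_duration would be an alternative to last - first + 1 that does not depend on the years being gap-free.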

Or as a list of lists:

def get_negative_periods(s, min_duration):
    s = s.lt(0).reset_index()
    g = s[0].ne(s[0].shift()).cumsum()[s[0].eq(True)]
    s = s.groupby(g)['index'].agg(['first', 'last'])
    return s[(s['last'] - s['first']) + 1 >= min_duration].values.tolist()


lst = get_negative_periods(growth, 3)

lst:

[[2002, 2008]]
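
If you want the same return type as the loop in the question (a list of (start, end) tuples), something along these lines should work. This is a sketch rather than a drop-in replacement: the function name get_negative_periods_tuples and the sample series are hypothetical, and it filters on the number of rows in each run instead of the year difference.

```python
import pandas as pd

def get_negative_periods_tuples(s, min_duration):
    # Hypothetical helper: same run-labelling idea as the functions above.
    negative = s.lt(0)
    run_id = negative.ne(negative.shift()).cumsum()
    years = s.index.to_series()
    # One row per negative run: first year, last year, number of years.
    runs = years[negative].groupby(run_id[negative]).agg(['first', 'last', 'count'])
    long_runs = runs[runs['count'] >= min_duration]
    # name=None makes itertuples yield plain tuples instead of namedtuples.
    return list(long_runs[['first', 'last']].itertuples(index=False, name=None))

# Made-up sample data for illustration only.
sample = pd.Series({2000: 1.0, 2001: -1.0, 2002: -2.0, 2003: -3.0,
                    2004: 2.0, 2005: -1.0})
periods = get_negative_periods_tuples(sample, 3)
```

With this sample, only the 2001 to 2003 run is at least 3 years long, so periods holds a single tuple for it.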