Search code examples
pythonpandasdatetimeindex

Find gaps between pandas dateranges that overlap


I'm trying to find the gaps where a set of dateranges don't overlap a larger daterange. In the below example I'd like to find each gap where daterange_1 to daterange_3 don't overlap big_daterange.

At the moment I've gotten this far, but I'm stuck in extracting the gaps from diff, if that is even the best way to go about this.

import pandas as pd

daterange_1 = pd.date_range("2009-01-01", "2010-01-01")
daterange_2 = pd.date_range("2012-01-01", "2014-01-01")
daterange_3 = pd.date_range("2016-01-01", "2019-01-01")

big_daterange = pd.date_range("2001-01-01", "2021-01-01")

combined = daterange_1.union(daterange_2).union(daterange_3)

# Get the dates in big_daterange that aren't in combined
diff = big_daterange.difference(combined)

What I'm most interested in is how to extract the gaps (where a gap is a daterange of contiguous datetimes) and create a list of them like this:

gaps = [gap_daterange_1, gap_daterange_2, ...]

Solution

  • Try:

    s = diff.to_series()
    grp = s.diff().ne(pd.Timedelta(days=1)).cumsum()
    gaps = s.groupby(grp).agg(['min','max'])
    gaps
    

    Output:

             min        max
    1 2001-01-01 2008-12-31
    2 2010-01-02 2011-12-31
    3 2014-01-02 2015-12-31
    4 2019-01-02 2021-01-01
    

    Details:

    First, convert the DateTimeIndex to a pd.Series, using to_series.
    Next use diff to find the difference between two consective values. If diff is greater than one day, then mark this record as True. Use cumsum to create groups of records between True records. Lastly, groupby grp an use max and min to find the start and end of each group.