I'm trying to find the gaps where a set of dateranges don't overlap a larger daterange. In the below example I'd like to find each gap where daterange_1
to daterange_3
don't overlap big_daterange
.
At the moment I've gotten this far, but I'm stuck in extracting the gaps from diff
, if that is even the best way to go about this.
import pandas as pd
daterange_1 = pd.date_range("2009-01-01", "2010-01-01")
daterange_2 = pd.date_range("2012-01-01", "2014-01-01")
daterange_3 = pd.date_range("2016-01-01", "2019-01-01")
big_daterange = pd.date_range("2001-01-01", "2021-01-01")
combined = daterange_1.union(daterange_2).union(daterange_3)
# Get the dates in big_daterange that aren't in combined
diff = big_daterange.difference(combined)
What I'm most interested in is how to extract the gaps (where a gap is a daterange of contiguous datetimes) and create a list of them like this:
gaps = [gap_daterange_1, gap_daterange_2, ...]
Try:
s = diff.to_series()
grp = s.diff().ne(pd.Timedelta(days=1)).cumsum()
gaps = s.groupby(grp).agg(['min','max'])
gaps
Output:
min max
1 2001-01-01 2008-12-31
2 2010-01-02 2011-12-31
3 2014-01-02 2015-12-31
4 2019-01-02 2021-01-01
Details:
First, convert the DateTimeIndex to a pd.Series, using to_series
.
Next use diff
to find the difference between two consective values.
If diff is greater than one day, then mark this record as True. Use cumsum
to create groups of records between True records. Lastly, groupby
grp an use max and min to find the start and end of each group.