Search code examples
pythonpandasoverlapping-matches

How to use Pandas to keep most recent data from Series with overlapping times


I have multiple pandas series, and they have some overlapping times:

In [1]: import pandas as pd

In [2]: cycle_00z = pd.Series(data=[10, 10, 10, 10], 
           index=pd.date_range('2015-01-01 00', '2015-01-01 03', freq='H'))
In [3]: cycle_02z = pd.Series(data=[20, 20, 20, 20], 
           index=pd.date_range('2015-01-01 02', '2015-01-01 05', freq='H'))
In [4]: cycle_04z = pd.Series(data=[30, 30, 30, 30], 
           index=pd.date_range('2015-01-01 04', '2015-01-01 07', freq='H'))

In [5]: cycle_00z
Out[5]: 
2015-01-01 00:00:00    10
2015-01-01 01:00:00    10
2015-01-01 02:00:00    10
2015-01-01 03:00:00    10
Freq: H, dtype: int64

In [6]: cycle_02z
Out[6]: 
2015-01-01 02:00:00    20
2015-01-01 03:00:00    20
2015-01-01 04:00:00    20
2015-01-01 05:00:00    20
Freq: H, dtype: int64

In [7]: cycle_04z
Out[7]: 
2015-01-01 04:00:00    30
2015-01-01 05:00:00    30
2015-01-01 06:00:00    30
2015-01-01 07:00:00    30
Freq: H, dtype: int64

I would like to create another pandas series from these three, which will contain the unique times from these three cycles and the most recent data (when the times overlap). In this case it would look like this:

In [8]: continuous = pd.Series(data=[10, 10, 20, 20, 30, 30, 30, 30], 
             index=pd.date_range('2015-01-01 00', '2015-01-01 07', freq='H'))

In [9]: continuous
Out[21]: 
2015-01-01 00:00:00    10
2015-01-01 01:00:00    10
2015-01-01 02:00:00    20
2015-01-01 03:00:00    20
2015-01-01 04:00:00    30
2015-01-01 05:00:00    30
2015-01-01 06:00:00    30
2015-01-01 07:00:00    30
Freq: H, dtype: int64

Just wondering if there would be a neat way to achieve that using pandas please? I will actually need to implement the technique in xray DataArrays but I guess the idea would be the same. Essentially, always keep data from most recent cycles.

Thanks


Solution

  • One way is to use the combine_first method:

    In [39]: cycle_04z.combine_first(cycle_02z).combine_first(cycle_00z)
    Out[39]: 
    2015-01-01 00:00:00    10
    2015-01-01 01:00:00    10
    2015-01-01 02:00:00    20
    2015-01-01 03:00:00    20
    2015-01-01 04:00:00    30
    2015-01-01 05:00:00    30
    2015-01-01 06:00:00    30
    2015-01-01 07:00:00    30
    Freq: H, dtype: float64
    

    Or, if you are performing the updates in a loop, something like this would work:

    In [40]: result = cycle_00z
    
    In [41]: result = cycle_02z.combine_first(result)
    
    In [42]: result = cycle_04z.combine_first(result)
    
    In [43]: result
    Out[43]: 
    2015-01-01 00:00:00    10
    2015-01-01 01:00:00    10
    2015-01-01 02:00:00    20
    2015-01-01 03:00:00    20
    2015-01-01 04:00:00    30
    2015-01-01 05:00:00    30
    2015-01-01 06:00:00    30
    2015-01-01 07:00:00    30
    Freq: H, dtype: float64