python dataframe data-manipulation

How to get the union of several time intervals for each id and name?


I want to know how to get the union of several time intervals for each id and name.

import pandas as pd
id = [1,1,1,1,1,1,1,2,2,2]
name = ['A','A','A','A','A','B','B','C','C','C']
Start_time = ['2005-06-27','2005-07-07','2005-07-12','2006-11-15','2008-08-22','2009-03-03','2009-03-06','2007-10-26','2007-10-31','2007-11-06']
Final_time = ['2005-07-07','2005-07-12','2005-09-26','2008-08-22','2009-02-24','2009-03-06','2009-03-12','2007-10-31','2007-11-05','2007-11-09']
dataframe = pd.DataFrame({'id':id,'name':name,'Start_time':Start_time,'Final_time':Final_time})
dataframe['Start_time'] = pd.to_datetime(dataframe['Start_time'])
dataframe['Final_time'] = pd.to_datetime(dataframe['Final_time'])

The result should look like this: [image: result]

If the time intervals can be merged for each id and name, then the related result_S and result_F should be the same for all merged rows, just as the image shows.


Solution

  • You can accomplish that with a combination of the mask, forward fill and backward fill methods in pandas. Example code below:

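    # Pair every row (df_s) with the row directly above it (df_f).
    df_s = dataframe[1:]
    df_f = dataframe[:-1]

    # True where a row starts exactly when the previous row ends, with the same id and name.
    conds = (df_s.Start_time.values == df_f.Final_time.values) & (df_s.name.values == df_f.name.values) & (df_s.id.values == df_f.id.values)

    # conds_s marks rows that continue the previous row; conds_f marks rows continued by the next row.
    conds_s = [False] + list(conds)
    conds_f = list(conds) + [False]

    # Blank out the interior boundaries, then fill them from the first start and the last end of each run.
    dataframe['Result_S'] = dataframe['Start_time'].mask(conds_s).ffill()
    dataframe['Result_F'] = dataframe['Final_time'].mask(conds_f).bfill()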
    

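    Note that the solution above assumes the dataframe is already sorted by id, name and Start_time. It also merges only intervals that touch exactly, i.e. where a row's Start_time equals the previous row's Final_time.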
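
    If the rows might not be sorted, or if intervals can overlap rather than just touch, a different technique may be needed. The following is a minimal sketch of one possible approach, not the method used in the answer above: it sorts first, then compares each Start_time with the running maximum of the earlier Final_time values in the same (id, name) group. The helper name merge_intervals and the demo data are hypothetical; the column names follow the question.

```python
import pandas as pd


def merge_intervals(df):
    """Hypothetical helper: add Result_S / Result_F to a sorted copy of df."""
    out = df.sort_values(["id", "name", "Start_time"]).reset_index(drop=True)

    # Latest Final_time among the earlier rows of the same (id, name) group;
    # NaT for the first row of each group.
    prev_end = out.groupby(["id", "name"])["Final_time"].transform(
        lambda s: s.cummax().shift()
    )

    # A row opens a new block if it is the first of its group or if it starts
    # after every earlier interval in the group has ended.
    new_block = prev_end.isna() | (out["Start_time"] > prev_end)
    block = new_block.cumsum()

    # Each block spans from its earliest start to its latest end.
    out["Result_S"] = out.groupby(block)["Start_time"].transform("min")
    out["Result_F"] = out.groupby(block)["Final_time"].transform("max")
    return out


# Made-up demo data: unsorted, with overlapping intervals.
demo = pd.DataFrame({
    "id": [1, 1, 1, 1],
    "name": ["A", "A", "A", "A"],
    "Start_time": pd.to_datetime(["2020-01-08", "2020-01-01", "2020-01-05", "2020-02-01"]),
    "Final_time": pd.to_datetime(["2020-01-12", "2020-01-10", "2020-01-07", "2020-02-03"]),
})
result = merge_intervals(demo)

# One row per merged interval.
print(result[["id", "name", "Result_S", "Result_F"]].drop_duplicates())
```

    Unlike the mask-based version, this sketch treats touching and overlapping intervals alike, and it returns a sorted copy instead of modifying the original dataframe.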