Search code examples
pythonpandasdataframedownsampling

Down-sampling specific period on dataframe using Pandas


I have a long time serie that starts in 1963 and ends in 2013. However, from 1963 til 2007 it has an hourly sampling period while after 2007's sampling rate changes to 5 minutes. Is it possible to resample data just after 2007 in a way that the entire time serie has hourly data sampling? Data slice below.

yr, m, d, h, m, s, sl
2007, 11, 30, 19, 0, 0, 2180
2007, 11, 30, 20, 0, 0, 2310
2007, 11, 30, 21, 0, 0, 2400
2007, 11, 30, 22, 0, 0, 2400
2007, 11, 30, 23, 0, 0, 2270
2008, 1, 1, 0, 0, 0, 2210
2008, 1, 1, 0, 5, 0, 2210
2008, 1, 1, 0, 10, 0, 2210
2008, 1, 1, 0, 15, 0, 2200
2008, 1, 1, 0, 20, 0, 2200
2008, 1, 1, 0, 25, 0, 2200
2008, 1, 1, 0, 30, 0, 2200
2008, 1, 1, 0, 35, 0, 2200
2008, 1, 1, 0, 40, 0, 2200
2008, 1, 1, 0, 45, 0, 2200
2008, 1, 1, 0, 50, 0, 2200
2008, 1, 1, 0, 55, 0, 2200
2008, 1, 1, 1, 0, 0, 2190
2008, 1, 1, 1, 5, 0, 2190  

Thanks!


Solution

  • Give your dataframe proper column names

    df.columns = 'year month day hour minute second sl'.split()
    

    Solution

    df.groupby(['year', 'month', 'day', 'hour'], as_index=False).first()
    
       year  month  day  hour  minute  second    sl
    0  2007     11   30    19       0       0  2180
    1  2007     11   30    20       0       0  2310
    2  2007     11   30    21       0       0  2400
    3  2007     11   30    22       0       0  2400
    4  2007     11   30    23       0       0  2270
    5  2008      1    1     0       0       0  2210
    6  2008      1    1     1       0       0  2190
    

    Option 2
    Here is an option that builds off of the column renaming. We'll use pd.to_datetime to cleverly get at our dates, then use resample. However, you have time gaps and will have to address nulls and re-cast dtypes.

    df.set_index(
        pd.to_datetime(df.drop('sl', 1))
    ).resample('H').first().dropna().astype(df.dtypes)
    
                         year  month  day  hour  minute  second    sl
    2007-11-30 19:00:00  2007     11   30    19       0       0  2180
    2007-11-30 20:00:00  2007     11   30    20       0       0  2310
    2007-11-30 21:00:00  2007     11   30    21       0       0  2400
    2007-11-30 22:00:00  2007     11   30    22       0       0  2400
    2007-11-30 23:00:00  2007     11   30    23       0       0  2270
    2008-01-01 00:00:00  2008      1    1     0       0       0  2210
    2008-01-01 01:00:00  2008      1    1     1       0       0  2190