python, pandas, period

asfreq yields unexpected results with Period dtype


When upsampling a DataFrame, I would like the newly created rows to be left empty.

Consider the following code:

import pandas as pd

p5h = pd.period_range(start='2020-02-01 00:00', end='2020-03-04 00:00', freq='5h', name='p5h')

df = pd.DataFrame({'Values' : 1}, index=p5h)

I would like to upsample to '1H' frequency, leaving new rows filled with NaN values.

import numpy as np

df1h = df.asfreq('1H', method=None, how='start', fill_value=np.nan)

But here is what I get:

 df1h.head(7)

                   Values
 p5h                     
 2020-02-01 00:00       1
 2020-02-01 05:00       1
 2020-02-01 10:00       1
 2020-02-01 15:00       1
 2020-02-01 20:00       1
 2020-02-02 01:00       1
 2020-02-02 06:00       1

(I need this so that I can then merge/join/concat this DataFrame with another one that has a '1H' PeriodIndex. That operation is not possible unless the PeriodIndex of both DataFrames shares the same frequency.)

Thanks for any help! Bests


Solution

  • asfreq() is indeed a method for Period dtypes. Note that your index has dtype:

    df.index.dtype
    # period[5H]
    

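    However, with a PeriodIndex it behaves differently: it only converts each existing period to the new frequency and does not insert new rows. PeriodIndex.asfreq takes just these two parameters:

    • freq (str): the desired frequency.

    • how ({'E', 'S', 'end', 'start'}, default 'end'): whether to use the start or the end of each timespan.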
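A minimal sketch of what asfreq does on the question's PeriodIndex: each 5-hour period is relabelled as an hourly period, and the number of entries stays the same. The name `converted` is only illustrative, and the lowercase 'h' alias is used on the assumption that it is accepted by the installed pandas version.

```python
import pandas as pd

# Same index and frame as in the question.
p5h = pd.period_range(start='2020-02-01 00:00', end='2020-03-04 00:00',
                      freq='5h', name='p5h')
df = pd.DataFrame({'Values': 1}, index=p5h)

# PeriodIndex.asfreq converts every existing period to the new frequency.
# It does not create the intermediate hourly periods.
converted = df.index.asfreq('h', how='start')

print(len(df.index), len(converted))  # both lengths are identical
print(converted[:3])
```

This matches the output in the question: the labels are hourly, but they are still spaced five hours apart.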


    One way to handle the Period index dtype is to use resample and aggregate with first:

    df.resample('1H').first()
    
                       Values
    p5h                     
    2020-02-01 00:00     1.0
    2020-02-01 01:00     NaN
    2020-02-01 02:00     NaN
    2020-02-01 03:00     NaN
    2020-02-01 04:00     NaN
    ...                  ...
    2020-03-03 21:00     1.0
    2020-03-03 22:00     NaN
    2020-03-03 23:00     NaN
    2020-03-04 00:00     NaN
    2020-03-04 01:00     NaN
    

    If you instead define the index with pd.date_range, asfreq behaves as expected:

    p5h = pd.date_range(start='2020-02-01 00:00', end='2020-03-04 00:00', 
                        freq='5h', name='p5h')
    df = pd.DataFrame({'Values' : 1}, index=p5h)
    
    df.asfreq('1H')
    
                          Values
    p5h                        
    2020-02-01 00:00:00     1.0
    2020-02-01 01:00:00     NaN
    2020-02-01 02:00:00     NaN
    2020-02-01 03:00:00     NaN
    2020-02-01 04:00:00     NaN
    ...                     ...
    2020-03-03 17:00:00     NaN
    2020-03-03 18:00:00     NaN
    2020-03-03 19:00:00     NaN
    2020-03-03 20:00:00     NaN
    2020-03-03 21:00:00     1.0
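
If the result has to keep a PeriodIndex (for example, to join it with another hourly PeriodIndex, as the question describes), one possible route is to round-trip through timestamps. This is a sketch and not part of the original answer: it assumes that taking the start of each 5-hour period is the desired alignment, and the `other` frame is a hypothetical stand-in for the second DataFrame.

```python
import pandas as pd

p5h = pd.period_range(start='2020-02-01 00:00', end='2020-03-04 00:00',
                      freq='5h', name='p5h')
df = pd.DataFrame({'Values': 1}, index=p5h)

# 1. Convert the periods to timestamps at the start of each period.
ts = df.to_timestamp(how='start')

# 2. On a DatetimeIndex, asfreq inserts the missing hours as NaN rows.
hourly = ts.asfreq('h')

# 3. Convert back to an hourly PeriodIndex.
df1h = hourly.to_period('h')

# Hypothetical second frame with an hourly PeriodIndex, for illustration only.
other = pd.DataFrame(
    {'Other': 2},
    index=pd.period_range('2020-02-01 00:00', periods=3, freq='h'),
)
joined = other.join(df1h)
print(joined)
```

Unlike the resample approach, this stops at the start of the last 5-hour period and does not extend to its end.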