When upsampling a DataFrame, I would like the newly created rows to be left empty.
Consider the following code:
import pandas as pd
p5h = pd.period_range(start='2020-02-01 00:00', end='2020-03-04 00:00', freq='5h', name='p5h')
df = pd.DataFrame({'Values' : 1}, index=p5h)
I would like to upsample to '1H' frequency, leaving new rows filled with NaN values.
import numpy as np
df1h = df.asfreq('1H', method=None, how='start', fill_value=np.nan)
But here is what I get:
df1h.head(7)
Values
p5h
2020-02-01 00:00 1
2020-02-01 05:00 1
2020-02-01 10:00 1
2020-02-01 15:00 1
2020-02-01 20:00 1
2020-02-02 01:00 1
2020-02-02 06:00 1
(I need this so that I can then merge/join/concat this DataFrame with another one that has a '1H' PeriodIndex. That operation cannot be done unless the PeriodIndex of both DataFrames shares the same frequency.)
Thanks for any help! Bests
asfreq() is indeed a method for Period dtypes. Note that your index has dtype:
df.index.dtype
# period[5H]
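However, on a PeriodIndex it behaves differently: it converts each existing period to the new frequency instead of inserting rows, and the underlying PeriodIndex.asfreq takes only these two parameters: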
freq : str
    The desired frequency.
how : {'E', 'S', 'end', 'start'}, default 'end'
    Start or end of the timespan.
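To make the difference concrete, here is a minimal sketch of what asfreq does to a PeriodIndex. It uses a shortened three-period index (an assumption for brevity, not the range from the question) and the lowercase 'h' alias; the variable name converted is illustrative only.

```python
import pandas as pd

# Three 5-hour periods, shortened from the question's range for brevity.
p5h = pd.period_range(start='2020-02-01 00:00', periods=3, freq='5h', name='p5h')

# asfreq relabels each existing period at the new frequency (taking its start);
# it does not create the intermediate hourly periods.
converted = p5h.asfreq('h', how='start')

print(len(p5h), len(converted))      # same number of entries before and after
print([str(p) for p in converted])   # labels are still five hours apart
```

This is why the output in the question still shows rows five hours apart: only the frequency attached to each label has changed.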
One way to handle the Period index dtype is to use resample and aggregate with first:
df.resample('1H').first()
Values
p5h
2020-02-01 00:00 1.0
2020-02-01 01:00 NaN
2020-02-01 02:00 NaN
2020-02-01 03:00 NaN
2020-02-01 04:00 NaN
... ...
2020-03-03 21:00 1.0
2020-03-03 22:00 NaN
2020-03-03 23:00 NaN
2020-03-04 00:00 NaN
2020-03-04 01:00 NaN
However, if you instead define the index with pd.date_range, asfreq gives the expected result:
p5h = pd.date_range(start='2020-02-01 00:00', end='2020-03-04 00:00',
                    freq='5h', name='p5h')
df = pd.DataFrame({'Values' : 1}, index=p5h)
df.asfreq('1H')
Values
p5h
2020-02-01 00:00:00 1.0
2020-02-01 01:00:00 NaN
2020-02-01 02:00:00 NaN
2020-02-01 03:00:00 NaN
2020-02-01 04:00:00 NaN
... ...
2020-03-03 17:00:00 NaN
2020-03-03 18:00:00 NaN
2020-03-03 19:00:00 NaN
2020-03-03 20:00:00 NaN
2020-03-03 21:00:00 1.0
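If the index has to remain a PeriodIndex, for instance to join against another hourly PeriodIndex as the question mentions, one possible approach is to relabel the periods with asfreq and then reindex onto a complete hourly period_range. This is a sketch of an alternative, not part of the approaches above; it uses a shortened three-period index, and the names df_h, full, other and joined are hypothetical.

```python
import pandas as pd

# Shortened version of the question's data: three 5-hour periods.
p5h = pd.period_range(start='2020-02-01 00:00', periods=3, freq='5h', name='p5h')
df = pd.DataFrame({'Values': 1}, index=p5h)

# Step 1: relabel the existing rows as hourly periods (start of each 5h span).
df_h = df.copy()
df_h.index = df.index.asfreq('h', how='start')

# Step 2: build every hourly period between the first and last label,
# then reindex so that the missing hours appear as NaN rows.
full = pd.period_range(start=df_h.index[0], end=df_h.index[-1], freq='h', name='p5h')
df1h = df_h.reindex(full)

# Both frames now share an hourly PeriodIndex, so a join lines up row by row.
other = pd.DataFrame({'Other': range(len(full))}, index=full)
joined = df1h.join(other)
print(joined.head(7))
```

Because reindex introduces NaN, the Values column becomes float, just as in the resample and date_range outputs above.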