Problem: I have a data set of daily gold prices. I want to take the mean in each month and done this by resampling the data. Next, I want to change the date of each month to end of each business month, I have done this by resampling the new mean data and resampled again. The code is as below.
gold_df = gold_df.resample(rule='M').mean()
gold_df = gold_df.resample(rule='BM').last()
>>> gold_df
Date
1979-01-31 227.215217
1979-02-28 245.670000
1979-03-30 NaN
1979-04-30 238.664286
1979-05-31 257.782609
Issue: However, I get NaN in some of the months. Shouldn't resampling method just change the dates in this case, as there is only one value in each month (due to the first resample step). I solved this by doing the below, but am confused in understanding the intuition behind resample() and why I need to first resample in 'BMS' frequency and then 'BM' again.
gold_df = gold_df.resample(rule='M').mean()
gold_df = gold_df.resample(rule='BMS').last()
gold_df = gold_df.resample(rule='BM').last()
>>> gold_df
Date
1978-12-29 226.000000
1979-01-31 227.215217
1979-02-28 245.670000
1979-03-30 242.047727
1979-04-30 238.664286
This has nothing to do with missing values in your data. NaN
occurs here when the last day of the month is not a business day.
Look at the following example:
idx = pd.date_range('2020-01-01','2020-12-31')
s = pd.Series(pd.np.random.randint(20, 30, len(idx)), index=idx)
s.resample('M').mean().resample('BM').last().to_frame().assign(day=s.resample('M').mean().index.day_name())
# 0 day
#2020-01-31 24.612903 Friday
#2020-02-28 NaN Saturday
#2020-03-31 24.096774 Tuesday
#2020-04-30 24.433333 Thursday
#2020-05-29 NaN Sunday
#2020-06-30 23.800000 Tuesday
#2020-07-31 23.677419 Friday
#2020-08-31 23.870968 Monday
#2020-09-30 25.333333 Wednesday
#2020-10-30 NaN Saturday
#2020-11-30 24.266667 Monday
#2020-12-31 23.806452 Thursday
You can circumvent this by setting label
and closed
to left
for 'BM' (default is right):
s.resample('M').mean().resample('BM', label='left', closed='left').last()
#2020-01-31 24.612903
#2020-02-28 23.896552
#2020-03-31 24.096774
#2020-04-30 24.433333
#2020-05-29 24.064516
#2020-06-30 23.800000
#2020-07-31 23.677419
#2020-08-31 23.870968
#2020-09-30 25.333333
#2020-10-30 24.838710
#2020-11-30 24.266667
#2020-12-31 23.806452
#Freq: BM, dtype: float64
assert all(s.resample('M').mean().resample('BM', label='left', closed='left').last() == s.resample('M').mean().resample('BMS').last().resample('BM').last())