Search code examples
pythonpandasfinance

Pandas resample creates new dates not in index when converting daily data to monthly data


import yfinance as yf
import pandas as pd 

data = yf.download('SPY', start='2017-12-31', end='2019-12-31') 
df = data[['Adj Close']] 

df.resample('2Q',closed='left').mean() 

The output from resample shows quarter-end dates 30-06-2018, 31-12-2018, 30-06-2019 and 31-12-2019. But some of those dates are not within original data. The average is calculated correctly.

These should be the correct dates 29-06-2018, 31-12-2018, 28-06-2019, 30-12-2019.

How can I use resample but output the correct dates that are in the original data?

f = {'Date': 'last', 'Adj Close': 'mean'}
df.reset_index().groupby([[d.year for d in df.index],[d.month==6 for d in 
df.index]],as_index=False).agg(f).sort_values('Date')

I used groupby, which outputs the correct dates (i.e. the last date of the quarter that is within the original data) but the average is calculated wrongly. Why is that?


Solution

  • a) df.resample('2Q', closed='left').mean() gives

                 Adj Close
    Date                  
    2018-06-30  260.626654
    2018-12-31  268.594670
    2019-06-30  273.702913
    2019-12-31  297.780020
    

    which is equivalent to

    pd.DataFrame({'Date': ['2018-06-30', '2018-12-31', '2019-06-30', '2019-12-31'],
                  'Adj Close': [df[(df.index <= '2018-06-29')].mean().values[0],
                                df[(df.index >  '2018-06-29') & (df.index <  '2018-12-31')].mean().values[0],
                                df[(df.index >= '2018-12-31') & (df.index <= '2019-06-28')].mean().values[0],
                                df[(df.index >  '2019-06-28')].mean().values[0]]})
    
             Date   Adj Close
    0  2018-06-30  260.626654
    1  2018-12-31  268.594670
    2  2019-06-30  273.702913
    3  2019-12-31  297.780020
    

    where the 31st December 2018 has been assigned to Q1 2019.

    b) df.resample('2Q', kind='period').mean() gives

             Adj Close
    Date              
    2018Q1  260.626654
    2018Q3  268.398217
    2019Q1  273.943730
    2019Q3  297.780020
    

    which is equivalent to

    pd.DataFrame({'Date': ['2018Q1', '2018Q3', '2019Q1', '2019Q3'],
                  'Adj Close': [df[(df.index <= '2018-06-29')].mean().values[0],
                                df[(df.index >  '2018-06-29') & (df.index <= '2018-12-31')].mean().values[0],
                                df[(df.index >  '2018-12-31') & (df.index <= '2019-06-28')].mean().values[0],
                                df[(df.index >  '2019-06-28')].mean().values[0]]})
    
         Date   Adj Close
    0  2018Q1  260.626654
    1  2018Q3  268.398217
    2  2019Q1  273.943730
    3  2019Q3  297.780020
    

    as well as to

    f = {'Date': 'last', 'Adj Close': 'mean'}
    df.reset_index().groupby([[d.year for d in df.index], [d.quarter <= 2 for d in df.index]], as_index=False).agg(f).sort_values('Date')
    
            Date   Adj Close
    1 2018-06-29  260.626654
    0 2018-12-31  268.398217
    3 2019-06-28  273.943730
    2 2019-12-30  297.780020
    

    where the 31st December 2018 has been assigned to Q4 2018.