Search code examples
pythonpython-3.xpandasdatetimereindex

Pandas Resample Upsample last date / edge of data


I'm trying to upsample weekly data to daily data, however, I'm having difficulty upsampling the last edge. How can I go about this?

import pandas as pd
import datetime
df = pd.DataFrame({
    'wk start': ['2018-08-12', '2018-08-12', '2018-08-19'], 
    'car': ['tesla model 3', 'tesla model x', 'tesla model 3'],
    'sales': [38000, 98000, 40000]})
df['wk start'] = df['wk start'].apply(
    lambda x: datetime.datetime.strptime(x, '%Y-%m-%d'))
df.set_index('wk start').groupby('car').resample('D').pad()

This returns:

                             car            sales
car             wk start        
tesla model 3   2018-08-12  tesla model 3   38000
                2018-08-13  tesla model 3   38000
                2018-08-14  tesla model 3   38000
                2018-08-15  tesla model 3   38000
                2018-08-16  tesla model 3   38000
                2018-08-17  tesla model 3   38000
                2018-08-18  tesla model 3   38000
                2018-08-19  tesla model 3   40000

tesla model x   2018-08-12  tesla model x   98000

My desired output is:

                             car            sales
car             wk start        
tesla model 3   2018-08-12  tesla model 3   38000
                2018-08-13  tesla model 3   38000
                2018-08-14  tesla model 3   38000
                2018-08-15  tesla model 3   38000
                2018-08-16  tesla model 3   38000
                2018-08-17  tesla model 3   38000
                2018-08-18  tesla model 3   38000
                2018-08-19  tesla model 3   40000
                2018-08-20  tesla model 3   40000
                2018-08-21  tesla model 3   40000
                2018-08-22  tesla model 3   40000
                2018-08-23  tesla model 3   40000
                2018-08-24  tesla model 3   40000
                2018-08-25  tesla model 3   40000
tesla model x   2018-08-12  tesla model x   98000
                2018-08-13  tesla model x   98000
                2018-08-14  tesla model x   98000
                2018-08-15  tesla model x   98000
                2018-08-16  tesla model x   98000
                2018-08-17  tesla model x   98000
                2018-08-18  tesla model x   98000

I looked at this, but they're using periods and I'm looking at datetimes.


Solution

  • Yes, you are right, last edge data are excluded. Solution is add them to input DataFrame - my solution creates a helper Dataframe using drop_duplicates, adds 6 days and concat's to original df before using your solution:

    df1 = df.sort_values('wk start').drop_duplicates('car', keep='last').copy()
    df1['wk start'] = df1['wk start'] + pd.Timedelta(6, unit='d')
    
    df = pd.concat([df, df1], ignore_index=True)
    df = df.set_index('wk start').groupby('car').resample('D').pad()
    print (df)
                                        car  sales
    car           wk start                        
    tesla model 3 2018-08-12  tesla model 3  38000
                  2018-08-13  tesla model 3  38000
                  2018-08-14  tesla model 3  38000
                  2018-08-15  tesla model 3  38000
                  2018-08-16  tesla model 3  38000
                  2018-08-17  tesla model 3  38000
                  2018-08-18  tesla model 3  38000
                  2018-08-19  tesla model 3  40000
                  2018-08-20  tesla model 3  40000
                  2018-08-21  tesla model 3  40000
                  2018-08-22  tesla model 3  40000
                  2018-08-23  tesla model 3  40000
                  2018-08-24  tesla model 3  40000
                  2018-08-25  tesla model 3  40000
    tesla model x 2018-08-12  tesla model x  98000
                  2018-08-13  tesla model x  98000
                  2018-08-14  tesla model x  98000
                  2018-08-15  tesla model x  98000
                  2018-08-16  tesla model x  98000
                  2018-08-17  tesla model x  98000
                  2018-08-18  tesla model x  98000