python, pandas, resampling

Resampling data to show only data as of the last date of the month


I have data similar to:

Date A B
1/5/22 4 5
1/6/22 8 3
1/8/22 1 5
2/10/22 4 8
2/28/22 6 4
3/15/22 6 0
3/21/22 0 0
3/25/22 5 7

I want to keep only the row with the last available date in each month, like this:

Date A B
1/8/22 1 5
2/28/22 6 4
3/25/22 5 7

I tried using the following code:

df.resample("M").last()

However, I get the following error: 'DataFrame' object has no attribute 'to_datetime'

However, my index is the Date column, which I converted with pd.to_datetime as shown below:

df['Date'] = df['Date'].apply(lambda x: str(x))
df['Date'] = pd.to_datetime(df['Date']).dt.date
df.set_index('Date', inplace=True)

Solution

  • You can group the dates by monthly period (convert with pd.to_datetime, then dt.to_period('M')) and use groupby.idxmax to get the index label of the latest date in each month:

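    import pandas as pd
    
    # convert the Date column to datetime (month/day/year order)
    date = pd.to_datetime(df['Date'], dayfirst=False)
    
    # for each monthly period, get the index label of the latest date, then select those rows
    out = df.loc[date.groupby(date.dt.to_period('M')).idxmax()]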
    

    Output:

          Date  A  B
    2   1/8/22  1  5
    4  2/28/22  6  4
    7  3/25/22  5  7
    

    NB. If several rows can share the last date of a month and you want to keep all of them, use:

    out = df.loc[date.isin(date.groupby(date.dt.to_period('M')).max())]
    

    If Date is the index rather than a column:

    date = pd.to_datetime(df.index.to_series(), dayfirst=False)
    
    out = df.loc[date.groupby(date.dt.to_period('M')).idxmax()]
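
To try the snippets above end to end, here is a minimal, self-contained sketch built on the sample data from the question. The variable names (`out_column`, `df_indexed`, `out_index`, `df_ties`, `out_ties`) are illustrative, the explicit `format="%m/%d/%y"` is an assumption about the date layout, and the duplicated 3/25/22 row in the third case is hypothetical, added only to demonstrate the tie-handling variant.

```python
import pandas as pd

# Sample data copied from the question (dates assumed to be month/day/year).
df = pd.DataFrame({
    "Date": ["1/5/22", "1/6/22", "1/8/22", "2/10/22",
             "2/28/22", "3/15/22", "3/21/22", "3/25/22"],
    "A": [4, 8, 1, 4, 6, 6, 0, 5],
    "B": [5, 3, 5, 8, 4, 0, 0, 7],
})

# Case 1: Date is a regular column.
date = pd.to_datetime(df["Date"], format="%m/%d/%y")
out_column = df.loc[date.groupby(date.dt.to_period("M")).idxmax()]

# Case 2: Date is the index, stored as datetime.date objects
# (the result of .dt.date followed by set_index in the question).
df_indexed = df.copy()
df_indexed["Date"] = pd.to_datetime(df_indexed["Date"], format="%m/%d/%y").dt.date
df_indexed = df_indexed.set_index("Date")
idx_dates = pd.to_datetime(df_indexed.index.to_series())
out_index = df_indexed.loc[idx_dates.groupby(idx_dates.dt.to_period("M")).idxmax()]

# Case 3: ties. A hypothetical duplicate of the 3/25/22 row is appended,
# and every row whose date equals its month's maximum is kept.
df_ties = pd.concat([df, df.iloc[[7]]], ignore_index=True)
date_ties = pd.to_datetime(df_ties["Date"], format="%m/%d/%y")
month_max = date_ties.groupby(date_ties.dt.to_period("M")).max()
out_ties = df_ties.loc[date_ties.isin(month_max)]

print(out_column)
print(out_index)
print(out_ties)
```

Because the second case converts the index on the fly with `pd.to_datetime(df.index.to_series())`, the original index (and its `datetime.date` values) is left unchanged in the result.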