How can I find only the files that contains date for last day of the month, using Python?

I have a folder with hundreds of .xlsx files which are daily data from years ago.

The file name pattern is 'ABC DEF 20150501.xlsx'

How can I pick files from the last day of each months for all these years.

Sadly I have no clue how to do it.

import pandas as pd
import re

files = [f for f in os.listdir(path) if re.match(r'ABC DEF [0-9]+.xlsx', f)]
df = pd.DataFrame(files)
df.columns = ['file_name']
df['port_date'] = df['file_name'].apply(lambda x : x.split(' ')[3].split('.')[0])

I got only the date in one column, and run out of idea!

Any help will be appreciated.

Solution

If need last row for each month and year extract datetimes, sorting and gruping by years and months with GroupBy.last:

df = pd.DataFrame({'file_name':['ABC DEF 20150501.xlsx',
                                'ABC DEF 20150701.xlsx',
                                'ABC DEF 20150711.xlsx']})

print (df)
               file_name
0  ABC DEF 20150501.xlsx
1  ABC DEF 20150701.xlsx
2  ABC DEF 20150711.xlsx

df['port_date'] = pd.to_datetime(df['file_name'].str.extract('(\d+)\.', expand=False))
df = df.sort_values('port_date')

df = (df.groupby([df['port_date'].dt.year, df['port_date'].dt.month])
        .last()
        .reset_index(drop=True))
print (df)
               file_name  port_date
0  ABC DEF 20150501.xlsx 2015-05-01
1  ABC DEF 20150711.xlsx 2015-07-11