python sorting parsing datetime filenames

Sort files and parse filenames in Python


I have a folder of CSV files whose names indicate the date and hour at which a boy comes home each day during the summer holidays. For instance, andrew201507011700.csv tells me that he came home on the first of July at 17:00. My goal is to sort the files in the folder and then extract the timestamps indicated in the filenames.

For example, for these files in the folder:

andrew201509030515.csv
andrew201507011700.csv
andrew201506021930.csv
andrew201508110000.csv

I'd like to sort them based on these timestamps:

andrew201506021930.csv
andrew201507011700.csv
andrew201508110000.csv
andrew201509030515.csv

Then, iterating over this sorted list of files, I'd like to extract the timestamp as a column for each file's dataframe. For example, for the file andrew201506021930.csv I want a column in some basic native Python datetime format:

datetime
2015:06:02:19:30

I tried the following: first split the filename and sort on the numerical value, and then take the last 12 characters of its basename:

path_sort=sorted(os.listdir(path),key=lambda x: int(x.split('w')[0]))
for i in path_sort:
    fi=os.path.join(path_sort, i)
    return os.path.basename(fi)[-12:]

This seems wrong to me: I don't use any datetime method to sort the files. Moreover, it already throws an error on the line fi=os.path.join(path_sort, i):

AttributeError: 'list' object has no attribute 'endswith'


Solution

  • Try this (you may need to tighten the regex if you're not sure all of your filenames share the same format):

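    from os import listdir
    from os.path import isfile, join
    import re

    # Compile once; matches the first run of digits in a filename.
    DIGITS = re.compile(r'(\d+)')

    def extract_number(string):
        # e.g. 'andrew201507011700.csv' -> 201507011700
        return int(DIGITS.findall(string)[0])

    MyDir = 'exampls/'
    # Keep regular files only, then sort by the number embedded in each name.
    onlyfiles = [f for f in listdir(MyDir) if isfile(join(MyDir, f))]
    sortedFiles = sorted(onlyfiles, key=extract_number)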
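    The snippet above sorts the files but does not yet produce a datetime. A minimal sketch of that second step follows. It assumes every relevant filename ends in exactly 12 digits (YYYYMMDDHHMM) followed by .csv. The names TIMESTAMP_RE, parse_timestamp, sorted_by_timestamp and list_sorted_csv are hypothetical helpers, not part of the original answer:

```python
import re
from datetime import datetime
from pathlib import Path

# Assumed filename layout: a name prefix, 12 digits (YYYYMMDDHHMM), then ".csv".
TIMESTAMP_RE = re.compile(r"(\d{12})\.csv$")


def parse_timestamp(filename):
    """Return the datetime encoded in the filename, or None if it does not match."""
    match = TIMESTAMP_RE.search(filename)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d%H%M")


def sorted_by_timestamp(filenames):
    """Return (datetime, filename) pairs in chronological order, skipping non-matching names."""
    pairs = [(parse_timestamp(name), name) for name in filenames]
    return sorted(pair for pair in pairs if pair[0] is not None)


def list_sorted_csv(directory):
    """Hypothetical helper: list the files in `directory` and sort them by timestamp."""
    names = [p.name for p in Path(directory).iterdir() if p.is_file()]
    return sorted_by_timestamp(names)


# The four filenames from the question.
names = [
    "andrew201509030515.csv",
    "andrew201507011700.csv",
    "andrew201506021930.csv",
    "andrew201508110000.csv",
]
for stamp, name in sorted_by_timestamp(names):
    # Format the datetime the way the question shows it.
    print(name, stamp.strftime("%Y:%m:%d:%H:%M"))
```

    As for the error in the question: os.path.join(path_sort, i) passes the sorted list as the first argument, where the directory path is expected. Joining the directory with each filename, as in os.path.join(path, i), avoids it. If each file is read into a pandas dataframe, the parsed value could then be assigned as a column, for example df["datetime"] = stamp (assuming pandas is in use, as the question suggests).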