I have a folder of CSV files whose names encode the date and hour at which a boy comes home each day during the summer holidays. For instance, andrew201507011700.csv
tells me that he came home on the first of July at 17:00. My goal is to sort the files in the folder and then extract the timestamps encoded in the filenames.
For example, given these files in the folder:
andrew201509030515.csv
andrew201507011700.csv
andrew201506021930.csv
andrew201508110000.csv
I'd like to sort them based on these timestamps:
andrew201506021930.csv
andrew201507011700.csv
andrew201508110000.csv
andrew201509030515.csv
and then, iterating over this sorted list of files, extract the timestamp as a column for each file's dataframe. For example, for the file andrew201506021930.csv
I would obtain a column in some basic native Python datetime format:
datetime
2015:06:02:19:30
I tried the following: first split the filename and sort on the numerical value, then take the last 12 characters of the basename:
path_sort=sorted(os.listdir(path),key=lambda x: int(x.split('w')[0]))
for i in path_sort:
    fi=os.path.join(path_sort, i)
    return os.path.basename(fi)[-12:]
This seems wrong to me: I don't use any datetime method to sort the files. Moreover, it already throws an error on the line fi=os.path.join(path_sort, i):
AttributeError: 'list' object has no attribute 'endswith'
Try this (you may want to tighten the regex if you're not sure all your filenames have the same format):
from os import listdir
from os.path import isfile, join
import re
def extract_number(string):
    r = re.compile(r'(\d+)')
    return int(r.findall(string)[0])
MyDir = 'exampls/'
onlyfiles = [f for f in listdir(MyDir) if isfile(join(MyDir, f))]
sortedFiles = sorted(onlyfiles ,key=lambda x: extract_number(x) )
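The snippet above sorts the names but does not yet produce the datetime values the question asks for. A minimal sketch of that step follows, assuming every filename is letters followed by exactly 12 digits (YYYYMMDDHHMM) and then ".csv". The names STAMP_RE, parse_timestamp and sort_by_timestamp are made up for illustration, and the stricter regex simply skips names that do not fit the pattern.

```python
import re
from datetime import datetime

# Assumed filename layout: letters, 12 digits (YYYYMMDDHHMM), then ".csv".
STAMP_RE = re.compile(r'^[A-Za-z]+(\d{12})\.csv$')

def parse_timestamp(filename):
    """Return the datetime encoded in the filename, or None if it does not match."""
    match = STAMP_RE.match(filename)
    if match is None:
        return None
    return datetime.strptime(match.group(1), '%Y%m%d%H%M')

def sort_by_timestamp(filenames):
    """Return (datetime, filename) pairs for matching names, oldest first."""
    stamped = [(parse_timestamp(f), f) for f in filenames]
    return sorted(pair for pair in stamped if pair[0] is not None)

# File names taken from the question.
files = [
    'andrew201509030515.csv',
    'andrew201507011700.csv',
    'andrew201506021930.csv',
    'andrew201508110000.csv',
]

for stamp, name in sort_by_timestamp(files):
    # Print each name with its timestamp in the colon-separated form from the question.
    print(name, stamp.strftime('%Y:%m:%d:%H:%M'))
```

When opening each file, build the full path from the directory string and the name, e.g. os.path.join(MyDir, name). The AttributeError in the question comes from passing the list path_sort to os.path.join instead of the directory path. If the files are read with pandas, the stamp value can then be assigned to a new column of each dataframe.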