
Python files reading and writing - script


I am writing a Python script that navigates to a folder on my desktop, reads files (matched with a glob pattern, since I will be adding files every day or so), and copies their contents into a single separate .txt file.

I wrote the script below:

#!/usr/bin/env python3
import glob

with open('../python_diary.txt', 'w') as outfile:
    for filename in glob.glob('../Desktop/diary/*-2020.txt'):
        with open(filename) as infile:
            for line in infile:
                outfile.write(line)

The script generally works, but my files are named in dd-mm-yyyy format, and when I run the script their contents appear in the destination file in the following order (as of today): 19-06-2020, 17-06-2020, 16-06-2020, 18-06-2020.

Any idea how I can make these concatenated files appear from oldest to newest?

Thanks,


Solution

  • You can sort the glob results with a small key function that extracts the date from each filename. Assuming every filename ends in a date with a zero-padded day and month and a 4-digit year, this will work for you:

    import os
    from glob import glob
    
    # Grab the filenames matching this glob
    filenames = glob('../Desktop/diary/*-2020.txt')
    # Sort the filenames by ascending date
    def filename_to_isodate(filename):
        date = os.path.basename(filename).rsplit('.', 1)[0][-10:]
        return date[-4:] + date[3:5] + date[:2]
    
    filenames = sorted(filenames, key=filename_to_isodate)
    for filename in filenames:
        ...  # Your stuff here...
    

    Explanation: os.path.basename gives us the name of the file without its directory, e.g., '../Desktop/diary/01-01-2020.txt' becomes '01-01-2020.txt'.

    rsplit('.', 1)[0][-10:] splits the basename at the last period and keeps only what comes before it, effectively stripping the extension. The [-10:] then keeps only the last 10 characters, which make up the date: 4 for the year + 2 for the month + 2 for the day + 2 dashes = 10 characters.

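    Last, filename_to_isodate rearranges those pieces into yyyymmdd, and we pass it to sorted as the key to sort by ISO date (year, month, day). Strings in that order compare the same way the dates themselves do, so the result runs from oldest to newest.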
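
As a quick check of the key function, here is a minimal sketch that applies it to the four filenames from the question, plus two made-up filenames (30-06-2020 and 01-07-2020, not from the question) to show why a plain alphabetical sort is not enough across a month boundary. It sorts a hard-coded list instead of calling glob so that it runs without the diary folder.

```python
import os

def filename_to_isodate(filename):
    """Turn '.../dd-mm-yyyy.txt' into the sortable string 'yyyymmdd'."""
    date = os.path.basename(filename).rsplit('.', 1)[0][-10:]
    return date[-4:] + date[3:5] + date[:2]

# The four files in the order the question reports them.
question_files = [
    '../Desktop/diary/19-06-2020.txt',
    '../Desktop/diary/17-06-2020.txt',
    '../Desktop/diary/16-06-2020.txt',
    '../Desktop/diary/18-06-2020.txt',
]
ordered = sorted(question_files, key=filename_to_isodate)

# Hypothetical filenames that straddle a month boundary.
boundary_files = [
    '../Desktop/diary/30-06-2020.txt',
    '../Desktop/diary/01-07-2020.txt',
]
plain_sorted = sorted(boundary_files)                        # alphabetical
date_sorted = sorted(boundary_files, key=filename_to_isodate)  # chronological

for name in ordered:
    print(os.path.basename(name))
```

A plain sorted(filenames) would happen to work while all files fall in the same month, because the day comes first in the name, but it puts 01-07-2020 before 30-06-2020. The key function avoids that.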


    Edit: following input from @Daniel F, strptime from the datetime module was replaced by the date in ISO string format as the sort key, for speed. Below is the original method used in this answer.

    The built-in datetime module can parse a date written in a given format, in this case %d-%m-%Y. strptime returns a datetime object, which can be compared with other datetime objects and thus sorted. The key passed two arguments to strptime: os.path.basename(s).rsplit('.', 1)[0][-10:] and '%d-%m-%Y'.
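
The strptime-based key described above might look roughly like the sketch below. The function name filename_to_datetime and the sample filenames are assumptions for illustration, not part of the original answer.

```python
import os
from datetime import datetime

def filename_to_datetime(filename):
    """Parse the trailing 'dd-mm-yyyy' of a filename into a datetime."""
    stem = os.path.basename(filename).rsplit('.', 1)[0][-10:]
    # Raises ValueError if the last 10 characters are not a valid date.
    return datetime.strptime(stem, '%d-%m-%Y')

# Hypothetical filenames, used only to exercise the key.
sample = [
    '../Desktop/diary/19-06-2020.txt',
    '../Desktop/diary/01-07-2020.txt',
    '../Desktop/diary/16-06-2020.txt',
]
by_date = sorted(sample, key=filename_to_datetime)
```

This version is slower than building the yyyymmdd string, but it fails loudly with a ValueError when a filename does not end in a valid date, which may be preferable if the folder can contain stray files.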