python perl shell openbsd file mtime

shell: save and restore mtime based on sha1 hash


I have a set of several thousand files that are automatically regenerated every 24 hours (e.g. ports-readmes on OpenBSD).

Most of the time, the content of these files doesn't change, but since they are re-created, the mtime does change.

Without modifying the original app, which regenerates the files in place, how can I cache the mtime for each filename/sha1 pair and restore it after the regeneration if the sha1 stays the same? Python is preferred, but any UNIX solution is welcome.

(I require this for a sitemap, since the sitemap spec only has lastmod for versioning.)


Solution

  • It isn't clear precisely what help you require. Here are some places to start:

    • You can use os.walk, os.listdir or glob.glob to generate a list of files.
    • You can use os.stat to determine the last modified time.
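    • You can use hashlib.md5(open(fname, 'rb').read()).hexdigest() to get the md5 (the file must be opened in binary mode; hashlib.sha1 works the same way if you want the sha1).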
    • You can use os.utime to set the modified time of a file.
    • You can use json.dump and json.load to persist the mtimes from one run to the next (there are other alternatives).
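
As a rough illustration of how the calls listed above could fit together, here is a minimal sketch, assuming Python 3.3 or later. The function names (sha1_of, save_mtimes, restore_mtimes) and the idea of a separate JSON cache file are made up for this example, and it hashes with SHA-1 as the question asks rather than MD5. It also assumes the cache file lives outside the directory being scanned.

```python
import hashlib
import json
import os


def sha1_of(path):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def save_mtimes(root, cache_path):
    """Record {relative path: [sha1, mtime in ns]} for every file under root.

    Hypothetical helper: meant to run BEFORE the files are regenerated.
    """
    cache = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            cache[rel] = [sha1_of(path), os.stat(path).st_mtime_ns]
    with open(cache_path, "w") as fh:
        json.dump(cache, fh)
    return cache


def restore_mtimes(root, cache_path):
    """Put back the saved mtime on files whose content hash is unchanged.

    Hypothetical helper: meant to run AFTER the files are regenerated.
    Returns the relative paths whose mtime was restored.
    """
    with open(cache_path) as fh:
        cache = json.load(fh)
    restored = []
    for rel, (old_sha1, old_mtime_ns) in cache.items():
        path = os.path.join(root, rel)
        if not os.path.isfile(path):
            continue  # file disappeared during regeneration
        if sha1_of(path) != old_sha1:
            continue  # content really changed, keep the new mtime
        st = os.stat(path)
        if st.st_mtime_ns != old_mtime_ns:
            # Leave atime alone, reset only mtime.
            os.utime(path, ns=(st.st_atime_ns, old_mtime_ns))
            restored.append(rel)
    return restored
```

One way to use it would be to call save_mtimes(root, cache_path) just before the nightly job and restore_mtimes(root, cache_path) right after it. Files that are new or whose content changed keep their fresh mtime, so they get picked up the next time the cache is saved.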