I have a set of several thousand files that are automatically regenerated every 24 hours (e.g. ports-readmes on OpenBSD).
Most of the time, the content of these files doesn't change, but since they are re-created, the mtime does change.
Without modifying the original app, which regenerates the files in place, how would I cache the mtime keyed on filename/sha1 pairs and restore it after the regeneration if the sha1 stays the same? Python is preferred, but any UNIX solution is welcome.
(I require this for a sitemap, since the sitemap spec only has lastmod for versioning.)
It isn't clear precisely what help you require. Here are some places to start:
- os.walk, os.listdir or glob.glob to generate a list of files.
- os.stat to determine the last modified time.
- hashlib.md5(open(fname, 'rb').read()).hexdigest() to get the md5 (hashlib.sha1 works the same way).
- os.utime to set the modified time of a file.
- json.dump and json.load to persist the mtimes from one run to the next (there are other alternatives).
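
Putting those pieces together, here is a minimal sketch of one way this could work; it is not a tested production script. It uses sha1, as in the question, and nanosecond mtimes so the values survive the JSON round trip exactly. The names file_sha1, snapshot, restore_mtimes and cache_path are made up for illustration, and the sketch assumes the cache file lives outside the tree being scanned.

```python
import hashlib
import json
import os


def file_sha1(path, chunk_size=65536):
    """Return the hex SHA-1 of a file, read in binary chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root):
    """Map each file path (relative to root) to its sha1 and mtime in ns."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            state[rel] = {
                "sha1": file_sha1(path),
                "mtime_ns": os.stat(path).st_mtime_ns,
            }
    return state


def restore_mtimes(root, cache_path):
    """Restore cached mtimes for files whose content is unchanged.

    Returns the relative paths whose mtime was put back, then rewrites
    the cache so the next run compares against the current state.
    """
    try:
        with open(cache_path) as fh:
            old = json.load(fh)
    except FileNotFoundError:
        old = {}  # first run: nothing to restore yet

    new = snapshot(root)
    restored = []
    for rel, entry in new.items():
        prev = old.get(rel)
        if (prev is not None
                and prev["sha1"] == entry["sha1"]
                and prev["mtime_ns"] != entry["mtime_ns"]):
            path = os.path.join(root, rel)
            # Keep the current atime, put back the cached mtime.
            os.utime(path, ns=(os.stat(path).st_atime_ns, prev["mtime_ns"]))
            entry["mtime_ns"] = prev["mtime_ns"]
            restored.append(rel)

    with open(cache_path, "w") as fh:
        json.dump(new, fh)
    return restored
```

You would run restore_mtimes(root, cache_path) once after each regeneration (from cron, say). The first run only records the current state. Later runs put the old mtime back on every file whose sha1 matches the cached one, and leave changed or new files with their fresh mtime.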