In a custom-developed NodeJS web server (running on Linux) that can dynamically generate thumbnail images, I want to cache these thumbnails on the filesystem and keep track of when they are actually used. If they haven't been used for a certain period of time (say, one year), I'd consider them "orphans" and delete them.
To this end, I considered touching them each time a client requests them, so that I can use the modification time to check when they were last used.
I assume this would incur a significant performance hit on the web server in high-load situations, as it is an "unnecessary" filesystem write, while, apart from logging, most requests will only consist of reads.
Has anyone performed any benchmarks on how big an impact this might have and if it's worthwhile?
It's probably not great, and probably worth avoiding updating every time you open a file. That's the reason the relatime / noatime mount options were invented: to prevent the existing Unix access-time timestamp (atime) from being updated every time a file is accessed.
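Is your filesystem mounted with relatime? With that option, reading a file updates atime only if the previous atime is more than a day old, or is older than the file's mtime or ctime, so a file that is only ever read gets at most one atime update per day. The other mount option that's common on Linux is noatime: never update atime.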
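To check which of these options is in effect, one approach is to read the kernel's mount table. The following is a minimal sketch, assuming a Linux system that exposes /proc/self/mounts; the function names atimeOptions and readAtimeOptions are hypothetical helpers made up for this example.

```javascript
const fs = require('node:fs');

// Hypothetical helper: parse /proc/mounts-style text and return, for each
// mount point, the mount options that end in "atime"
// (relatime, noatime, nodiratime, strictatime).
function atimeOptions(mountsText) {
  const result = {};
  for (const line of mountsText.split('\n')) {
    // Fields: device, mount point, fs type, options, dump, pass.
    const fields = line.split(' ');
    if (fields.length < 4) continue; // skip blank or malformed lines
    const mountPoint = fields[1];
    const options = fields[3].split(',');
    result[mountPoint] = options.filter((opt) => /atime$/.test(opt));
  }
  return result;
}

// Hypothetical convenience wrapper; only works where /proc/self/mounts exists.
function readAtimeOptions() {
  return atimeOptions(fs.readFileSync('/proc/self/mounts', 'utf8'));
}
```

Calling readAtimeOptions() and looking up the mount point that holds the thumbnail cache shows whether relatime or noatime applies there. An empty list for a mount point means none of these options is listed for it.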
If you can't let the kernel do this for you without needing extra system calls, you might be better off making an fstat system call after opening the file, and touching it to update the mod time only if the mod time is older than a day or a week. (You're concerned about intervals of a year, so a week is fine.) That is, manually implement the relatime logic, but for mod time.
Frequently accessed files will not need updates (and you're making only one extra system call for them, the fstat, plus a date compare). Rarely accessed files will need another system call and a metadata write. If most of the accesses in your access pattern are to a smallish set of files repeatedly, this should be excellent.
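The fstat / check / update idea could look roughly like this in Node. This is a sketch under assumptions, not a benchmarked implementation: the names openThumbnail, isStale and REFRESH_AFTER_MS are hypothetical, the one-week threshold is just the value suggested above, and the server process is assumed to own the thumbnail files (setting explicit timestamps requires ownership on Linux).

```javascript
const fs = require('node:fs/promises');

// Hypothetical threshold: refresh mtime only if it is more than a week old.
const REFRESH_AFTER_MS = 7 * 24 * 60 * 60 * 1000;

// Pure date compare, kept separate so it is easy to reason about.
function isStale(mtimeMs, nowMs, maxAgeMs = REFRESH_AFTER_MS) {
  return nowMs - mtimeMs > maxAgeMs;
}

// Open a thumbnail for reading and "touch" it only when its mtime is stale.
// The caller is responsible for closing the returned FileHandle.
async function openThumbnail(path, now = new Date()) {
  const handle = await fs.open(path, 'r');
  try {
    const stats = await handle.stat(); // fstat on the already-open descriptor
    if (isStale(stats.mtimeMs, now.getTime())) {
      // Rare path: one more system call and a metadata write.
      // Keeps the existing atime (at millisecond precision), bumps mtime.
      await handle.utimes(stats.atime, now);
    }
    return handle;
  } catch (err) {
    await handle.close();
    throw err;
  }
}
```

A request handler would call openThumbnail(path), stream or read from the returned handle, and close it afterwards. For hot thumbnails only the stat call is added to each request.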
Possible reasons for not being able to use atime could include:
Of course, the other option is to not update timestamps on use at all, and simply let a thumbnail be regenerated once a year after your weekly cron job deletes it. That might be ok depending on your workload.
If you manually touch some of the "hottest" thumbnails so you stagger their deletion, instead of having a big load spike this time next year, you could be ok. And/or have your deleter walk your filesystem very slowly, again so you don't have a big batch of frequently-needed thumbnails deleted at once.
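A slow deleter along these lines could be sketched as follows. Everything here is an assumption for illustration: the name removeOrphans, the one-year cutoff, and the pause after each deletion (pauseMs) are hypothetical choices, and a real cache with very large directories might prefer a streaming directory walk.

```javascript
const fs = require('node:fs/promises');
const path = require('node:path');
const { setTimeout: sleep } = require('node:timers/promises');

const ONE_YEAR_MS = 365 * 24 * 60 * 60 * 1000;

// Hypothetical cleanup: recursively delete regular files whose mtime is older
// than maxAgeMs, pausing after each deletion so that regeneration load is
// spread out instead of arriving in one spike. Returns the removed paths.
async function removeOrphans(dir, options = {}) {
  const { maxAgeMs = ONE_YEAR_MS, pauseMs = 50, now = Date.now() } = options;
  const removed = [];
  const entries = await fs.readdir(dir, { withFileTypes: true });
  for (const entry of entries) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      removed.push(...(await removeOrphans(full, { maxAgeMs, pauseMs, now })));
    } else if (entry.isFile()) {
      const stats = await fs.stat(full);
      if (now - stats.mtimeMs > maxAgeMs) {
        await fs.unlink(full);
        removed.push(full);
        await sleep(pauseMs); // walk slowly on purpose
      }
    }
  }
  return removed;
}
```

Run from a weekly job, this removes only thumbnails whose mod time has not been refreshed for a year, and a larger pauseMs staggers the deletions further.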
You could come up with schemes like enabling mod-time updates in the week before the bi-annual cleanup, so thumbnails that should stay hot in cache get their modtimes updated. But probably better to just fstat / check / update all the time since that shouldn't be too much extra load.