Cache directory structure


I'm in the process of implementing caching for my project. After looking at cache directory structures, I've seen many examples like:

cache
cache/a
cache/a/a/
cache/a/...
cache/a/z
cache/...
cache/z
...

You get the idea. Another example, for storing files: if our file is named IMG_PARTY.JPG, a common approach is to store it at:

files/i/m/IMG_PARTY.JPG
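
For concreteness, here is a minimal sketch of that mapping in Python. It assumes the shard directories are just the first two characters of the file name, lowercased; the function name `sharded_path` and the `depth` parameter are made up for illustration.

```python
from pathlib import PurePosixPath


def sharded_path(root: str, filename: str, depth: int = 2) -> str:
    """Place `filename` under one directory level per leading character.

    Hypothetical helper: assumes the name has at least `depth` characters
    and that those characters are safe to use as directory names.
    """
    if len(filename) < depth:
        raise ValueError(f"{filename!r} is shorter than depth={depth}")
    # One directory level per leading character, lowercased.
    levels = [ch.lower() for ch in filename[:depth]]
    return str(PurePosixPath(root, *levels, filename))


print(sharded_path("files", "IMG_PARTY.JPG"))  # files/i/m/IMG_PARTY.JPG
```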

Some thoughts come to mind, but I'd like to know the real reasons for this.

  • Filesystems that do linear lookups find files faster when there are fewer of them in a directory. Such a structure spreads the files thin.

  • To avoid tripping up *nix utilities like rm: the command line they receive has a finite length, so deleting a large number of files at once tends to be hacky (having to pass them through find, etc.).
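
On the second point: rm itself has no fixed argument count. The limit is on the total size of the argument list the kernel will hand to a new process, which a shell glob such as `rm cache/*` can exceed. A rough sketch of sidestepping that from a script, assuming Python is available and that only regular files directly inside the directory should go (`remove_files` is a hypothetical name):

```python
import os


def remove_files(directory: str) -> int:
    """Delete every regular file directly inside `directory`; return the count."""
    removed = 0
    with os.scandir(directory) as entries:
        # Snapshot the entries first so deletions cannot disturb the iteration.
        snapshot = list(entries)
    for entry in snapshot:
        # Skip subdirectories and symlinks; only plain files are removed.
        if entry.is_file(follow_symlinks=False):
            os.unlink(entry.path)
            removed += 1
    return removed
```

No command line is ever built here, so the number of files does not matter to the kernel's argument limit.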

What's the real reason? What is a "good" cache directory structure and why?


Solution

  • Every time I've done it, it has been to avoid slow linear searches in filesystems. Luckily, at least on Linux, this is becoming a thing of the past.

    However, even today, with B-tree-based directories, a very large directory is still hard to deal with: just getting a listing of all the files takes forever and a day, never mind finding the right file.
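
One caveat with sharding on the first letters of the name: files such as IMG_PARTY.JPG and every other IMG_ file end up in the same files/i/m directory. A common variant, which is not something the answer above prescribes, is to shard on a hash of the cache key instead. A minimal sketch, assuming SHA-256, two levels of two hex characters each, and entries stored under their digest; `cache_path`, `cache_put` and `cache_get` are illustrative names only:

```python
import hashlib
from pathlib import Path
from typing import Optional


def cache_path(root: Path, key: str, levels: int = 2, width: int = 2) -> Path:
    """Map a cache key to root/<d0d1>/<d2d3>/<full digest> (assumed layout)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    # Consecutive slices of the digest become the nested shard directories.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return root.joinpath(*parts, digest)


def cache_put(root: Path, key: str, data: bytes) -> Path:
    """Write `data` for `key`, creating the shard directories on demand."""
    path = cache_path(root, key)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path


def cache_get(root: Path, key: str) -> Optional[bytes]:
    """Return the cached bytes for `key`, or None on a cache miss."""
    try:
        return cache_path(root, key).read_bytes()
    except FileNotFoundError:
        return None
```

With these defaults there are 256 × 256 = 65,536 leaf directories, so a million entries average roughly 15 files per directory, which keeps both lookups and listings of any single directory cheap.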