Why sort automatically-generated files based on hash?


It's a pattern I've seen before on websites that let users upload content such as images.

For example, why http://upload.wikimedia.org/wikipedia/commons/7/70/Example.png instead of just something like http://upload.wikimedia.org/wikipedia/commons/Example.png?

Is there a practical reason for this, or is it just cargo-cult?


Solution

  • Many filesystems don't perform very well when there are hundreds of thousands of files in the same directory - it takes a long time to look in the directory for a file.

    To avoid this problem, the files are distributed into a folder hierarchy. To get an even distribution, you hash the filename or the contents (something that identifies the file) and use parts of that hash to decide which folder the file goes in. That's where the 7/70 comes from: it is derived from the prefix of the hash in two steps, creating a two-level hierarchy. The first level is the first hex digit of the hash (16 possible folders), and the second level is the first two hex digits (16 subfolders under each). Files are therefore spread over 256 different leaf folders, so each folder holds far fewer files, which in turn gives better filesystem performance.
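
    The scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it hashes the filename with MD5 and uses hex-digit prefixes as folder names, and the function name `sharded_path` and its `levels` parameter are made up for this example. A real site may hash the file contents instead, or use a different hash function.

    ```python
    import hashlib


    def sharded_path(filename: str, levels: int = 2) -> str:
        """Return a relative path like 'a/ab/<filename>' derived from a hash.

        Assumption: the hash is MD5 over the UTF-8 filename. Hashing the file
        contents would work equally well for spreading files across folders.
        """
        digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
        # Level 1 is the first hex digit, level 2 the first two digits, etc.
        folders = [digest[:i] for i in range(1, levels + 1)]
        return "/".join(folders + [filename])


    if __name__ == "__main__":
        # Prints a path of the form '<x>/<xy>/Example.png'.
        print(sharded_path("Example.png"))
    ```

    Because the path is computed from the name alone, the server can locate a file without any lookup table: it recomputes the hash and goes straight to the right folder.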