Tags: python, directory, lookup, filepath, disk

Is file reading faster in nested folders, or does it not matter?


My question concerns purely file-path lookup and disk drives... I think.

I have Python code that needs to open one specific file whose exact path I already know. I have a choice: either store that file in one large folder alongside thousands of other files, or segment everything into sub-folders. Which choice would give faster reads?

My concern, born of a lack of knowledge, is that when code enters a big folder holding thousands of other files, that is much more of a struggle than entering a folder with a few sub-folders. Or am I wrong, and is it all effectively instant when I supply the exact file path?

Again, I don't have to scan the files or folders, since I know the exact file path, but I don't know what happens at the lower level with the disk drive.

EDIT: Which of the two would be faster, given a standard HDD on Windows 7?

C:/Folder_with_millions_of_files/myfile.txt

or

C:/small_folder/small_folder254/small_folder323/myfile.txt

NOTE: What I need this for is not to scan thousands of files but to pull up just that one file as quickly as possible. It is essentially a lookup table, I think.
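For reference, a minimal timing sketch I could run to compare the two layouts myself; the paths are the hypothetical examples above and would need to point at real test files:

    import time

    # Hypothetical test paths -- substitute files you have actually created.
    FLAT_PATH = "C:/Folder_with_millions_of_files/myfile.txt"
    NESTED_PATH = "C:/small_folder/small_folder254/small_folder323/myfile.txt"

    def time_open(path, repeats=1000):
        """Average time to open a file by its exact path and read one byte."""
        start = time.perf_counter()
        for _ in range(repeats):
            with open(path, "rb") as f:
                f.read(1)  # touch the file so the open actually resolves the path
        return (time.perf_counter() - start) / repeats

    print("flat:  ", time_open(FLAT_PATH))
    print("nested:", time_open(NESTED_PATH))

Bear in mind that the OS caches directory entries after the first access, so repeated opens mostly measure the warm-cache case; a cold-cache comparison would need a reboot or cache flush between runs.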


Solution

  • Doing some reading, it appears that for maximum scalability the best practice is to split the folder into subfolders, although nesting folders many levels deep is not recommended; it is better to use a few larger folders than thousands of tiny ones:

    Rather than shoveling all of those files into a single filesystem, why not spread them out across a series of smaller filesystems? The problems with that approach are that (1) it limits the kernel's ability to optimize head seeks and such, reducing performance, and (2) it forces developers (or administrators) to deal with the hassles involved in actually distributing the files. Inevitably things will get out of balance, forcing things to be redistributed in the future.

    From looking at these articles I would draw the following conclusion:
    < 65,534 files: one folder should suffice
    > 65,534 files: split into folders

    To allow for future scalability it would be advisable to split the data across folders, creating a new folder per 65,534 items, or per day, per category, etc., depending on the file system and the performance you observe; a sketch of the per-N-items scheme is below.
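    As an illustration of the folder-per-N-items idea, here is a minimal sketch that maps a numeric file ID to a bucketed path; the bucket size, folder naming, and C:/data root are assumptions for the example, not anything the file system requires:

        import os

        BUCKET_SIZE = 65_534  # cap per folder, taken from the FAT32-derived figure above

        def bucketed_path(root, file_id, ext=".txt"):
            """Map a numeric file ID to root/bucket_<n>/<id><ext>,
            so no single folder ever holds more than BUCKET_SIZE files."""
            bucket = file_id // BUCKET_SIZE
            return os.path.join(root, f"bucket_{bucket}", f"{file_id}{ext}")

        path = bucketed_path("C:/data", 1_234_567)
        os.makedirs(os.path.dirname(path), exist_ok=True)  # create the bucket folder on first use
        print(path)  # e.g. C:/data/bucket_18/1234567.txt (separator varies by OS)

    Because the bucket is derived arithmetically from the ID, the exact path can always be reconstructed without scanning any directory, which is precisely the lookup-table access pattern the question describes.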

    Based on:
    "single folder or many folders for storing 8 million images of hundreds of stores?"
    https://lwn.net/Articles/400629/
    https://superuser.com/questions/446282/max-files-per-directory-on-ntfs-vol-vs-fat32