Tags: windows, performance, filesystems, ntfs

Does the length of a file name impact huge NTFS folder indexes?


I have NTFS folders that may grow to hold 100,000 to 1,000,000 files, the upper limit discussed in this answer on NTFS performance.

My files have the following characteristics:

1) They have long file names (typically 64 to 100 characters).

2) Many of the file names share an identical prefix of 20 to 40 characters.

Do long file names impact NTFS folder index performance in any of these areas: looking up a file's record from its name, fragmentation of the index, or growth of the index?

NTFS folder indexes are (reportedly) B-trees. I've tested my software to 50,000 files, but I'm running a 'happy path' test, with little file system churn. Testing to 1,000,000 will take weeks of running my software non-stop.

I've considered writing a simulator, but before I do that, does anyone have real-world experience with this?


Solution

  • NTFS directories are B-trees with data in both the interior and leaf nodes. Since there isn't any "key prefix compression", every entry stores the full text of the filename in its node.

    Searching such a directory when the filenames share lots of identical prefix characters simply wastes time: each comparison within a "page" of the directory steps through the same shared characters before it reaches the distinguishing ones. If you can make the leftmost characters in the name the most variable, that'd be a big help.

    But, in the end, no filesystem is a good database and no database is a good filesystem. You need to consider the sizes of your files and expected usage characteristics.
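
To get a rough feel for the prefix effect before committing to a weeks-long test, here is a minimal Python sketch. It is not NTFS itself: it models a directory lookup as a binary search over a sorted list of names and counts how many characters each comparison has to examine. It also ignores NTFS's case-insensitive collation, page boundaries, and disk I/O. The helper names (`chars_compared`, `lookup_cost`, `hashed_name`, `average_cost`) and the sample file name pattern are made up for illustration, and the short SHA-1 prefix is just one assumed way of making the leftmost characters variable.

```python
import hashlib


def chars_compared(a: str, b: str) -> int:
    """Count the character positions examined to order a against b."""
    examined = 0
    for x, y in zip(a, b):
        examined += 1
        if x != y:
            break
    return examined


def lookup_cost(sorted_names: list[str], target: str) -> tuple[bool, int]:
    """Binary-search sorted_names for target.

    Returns (found, total characters examined across all comparisons).
    This is a stand-in for walking a B-tree, not a model of NTFS pages.
    """
    lo, hi = 0, len(sorted_names)
    cost = 0
    while lo < hi:
        mid = (lo + hi) // 2
        candidate = sorted_names[mid]
        cost += chars_compared(candidate, target)
        if candidate == target:
            return True, cost
        if candidate < target:
            lo = mid + 1
        else:
            hi = mid
    return False, cost


def hashed_name(name: str) -> str:
    """Prepend 8 hex digits of a SHA-1 digest so names differ from the first character.

    Hypothetical naming scheme: the original name is kept after the underscore.
    """
    prefix = hashlib.sha1(name.encode("utf-8")).hexdigest()[:8]
    return f"{prefix}_{name}"


def average_cost(sorted_names: list[str], targets: list[str]) -> float:
    """Average characters examined per lookup over all targets."""
    total = sum(lookup_cost(sorted_names, t)[1] for t in targets)
    return total / len(targets)


if __name__ == "__main__":
    # Assumed sample data: every name shares a 30-character prefix.
    names = [f"customer_invoice_archive_2024_{i:07d}.dat" for i in range(20000)]
    plain = sorted(names)
    hashed = sorted(hashed_name(n) for n in names)
    print("shared prefix, chars per lookup:", average_cost(plain, plain))
    print("hashed prefix, chars per lookup:", average_cost(hashed, hashed))
```

The trade-off is that the hashed prefix makes every name 9 characters longer, so each index entry grows a little, and a directory listing no longer sorts in the original order. Whether the saved comparisons matter next to disk I/O at 1,000,000 files is something only a test on a real volume can show.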