I have a directory containing subdirectories, each of which holds a large number of files. Each such directory contains at least 100k files in total across its subdirectories, with each subdirectory directly containing about 150 files. I want to show the total count of files within the directory, including all of its subdirectories. While I can do this using the .NET APIs in System.IO, is there a better way?
Using just .NET's standard library, the fastest option is:
int count = Directory.EnumerateFiles(path, "*", SearchOption.AllDirectories).Count();
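As a self-contained sketch of that call (the directory layout and file counts here are made up purely for the demo):

```csharp
using System;
using System.IO;
using System.Linq;

class CountDemo
{
    public static int Run()
    {
        // Hypothetical demo layout: 3 subdirectories with 5 files each.
        string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        for (int d = 0; d < 3; d++)
        {
            string sub = Path.Combine(root, $"sub{d}");
            Directory.CreateDirectory(sub);
            for (int f = 0; f < 5; f++)
                File.WriteAllText(Path.Combine(sub, $"file{f}.txt"), "");
        }

        // EnumerateFiles streams entries lazily, so Count() never buffers
        // the full list of paths in memory (unlike Directory.GetFiles).
        int count = Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories).Count();

        Directory.Delete(root, recursive: true);
        return count;
    }

    static void Main() => Console.WriteLine(CountDemo.Run()); // prints 15
}
```

Note the lazy enumeration: the paths are still allocated as strings one by one, which is where the allocation overhead measured below comes from.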
However, there are other posts asking about listing directories and subdirectories quickly, and through research I came up with a faster implementation, published in this repository (NuGet).
DISCLAIMER: I am the author of the linked repository. As also mentioned in the README, the code was taken from this post, and the package is released under the CPOL license, as was the original code by the post's author, wilsone8. I have only tested this on local drives. The author of the original post mentioned using their approach for files over a network, and the final code uses the same APIs for enumerating files, so there is a good chance it works for network shares too.
The package provides a method to get the file count, which you can use like so:
// ensure the using:
using FastDiskIO;
var count = FastDirectoryEnumeration.GetFileCount(
    "dir/path/",
    searchOption: SearchOption.AllDirectories);
Example benchmark for 300k files spread across 300 subdirectories:

| Method | Mean | Ratio | Allocated | Alloc Ratio |
|---|---|---|---|---|
| GetFileCount | 111.1 ms | 0.95 | 99.25 KB | 0.004 |
| Directory_EnumerateFiles | 116.6 ms | 1.00 | 25741.13 KB | 1.000 |
Allocations are avoided by reusing the same struct when invoking the Windows APIs, allocating strings only for the paths of the subdirectories to iterate next.

For the example above, it seems we have hit the API/IO bottleneck, so the runtime can barely be improved further. The major impact is the allocation reduction, which matters most for denser, more heavily packed directory trees.
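To illustrate the struct-reuse idea, here is my own minimal, Windows-only sketch of the general technique (not the package's actual code): a recursive count via the Win32 `FindFirstFileW`/`FindNextFileW` APIs, where one `WIN32_FIND_DATA` struct serves every entry in a directory and strings are built only for subdirectory paths:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Windows-only sketch: counts files recursively with the Win32 find APIs,
// reusing one WIN32_FIND_DATA per directory instead of creating
// FileInfo objects for every entry.
static class Win32FileCounter
{
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    struct WIN32_FIND_DATA
    {
        public uint dwFileAttributes;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        // NOTE: marshaling cFileName as a string still allocates per entry;
        // a fully allocation-free variant would use a fixed char buffer.
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern IntPtr FindFirstFileW(string lpFileName, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode)]
    static extern bool FindNextFileW(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll")]
    static extern bool FindClose(IntPtr hFindFile);

    const uint FILE_ATTRIBUTE_DIRECTORY = 0x10;
    static readonly IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);

    public static long CountFiles(string root)
    {
        long count = 0;
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string dir = pending.Pop();
            // One find-data struct is refilled for every entry in this directory.
            IntPtr handle = FindFirstFileW(dir + @"\*", out WIN32_FIND_DATA data);
            if (handle == INVALID_HANDLE_VALUE) continue;
            try
            {
                do
                {
                    if ((data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0)
                    {
                        // Only subdirectory paths are turned into new strings.
                        if (data.cFileName != "." && data.cFileName != "..")
                            pending.Push(dir + "\\" + data.cFileName);
                    }
                    else
                    {
                        count++;
                    }
                } while (FindNextFileW(handle, out data));
            }
            finally
            {
                FindClose(handle);
            }
        }
        return count;
    }
}
```

The explicit stack replaces recursion so deep directory trees cannot overflow the call stack; on non-Windows platforms this sketch does not apply, since it P/Invokes kernel32 directly.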