
Is glob (using a unique prefix) faster than readdir?


There is a directory with many files, and I need to open the file whose name matches 00343dde41bac11ef7020935ee3d*. I expect there to be exactly one such file.

I know that opening a single file with fopen(3) is faster than reading the entire directory with readdir(3).

Can I assume that glob(3) will be significantly faster (that is, require fewer disk accesses) than simply calling readdir(3) and testing each file name myself? And a stronger claim: since only one file matches and my pattern is a plain prefix, may I assume that glob(3) is as fast as fstat(2)?
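For concreteness, the glob(3) lookup the question has in mind could be sketched as follows. This is a minimal sketch, not the asker's code: the helper name find_by_prefix_glob is hypothetical, and it assumes the prefix contains no glob metacharacters. It returns the full path only when exactly one entry matches.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <glob.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the path of the single entry in `dir`
 * whose name starts with `prefix`, or NULL if there is not exactly one.
 * The result is malloc'd; the caller frees it.
 * Assumes `prefix` contains no glob metacharacters (*, ?, [). */
char *find_by_prefix_glob(const char *dir, const char *prefix)
{
    assert(dir != NULL && prefix != NULL);

    /* Build the pattern "<dir>/<prefix>*". */
    size_t len = strlen(dir) + strlen(prefix) + 3; /* '/', '*', '\0' */
    char *pattern = malloc(len);
    if (pattern == NULL)
        return NULL;
    snprintf(pattern, len, "%s/%s*", dir, prefix);

    glob_t g = {0};
    /* GLOB_NOSORT only skips sorting the matches; glob() still has to
     * read the whole directory and test every entry against the pattern. */
    int rc = glob(pattern, GLOB_NOSORT, NULL, &g);
    free(pattern);

    char *result = NULL;
    if (rc == 0 && g.gl_pathc == 1)
        result = strdup(g.gl_pathv[0]);

    /* Free glob()'s result list (gl_pathc/gl_pathv stay valid even when
     * glob() reports GLOB_NOMATCH). */
    globfree(&g);
    return result;
}
```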

Research:


Solution

  • On *NIX systems, readdir() is the only interface for reading directory contents (on Linux, the C library implements it on top of the getdents(2) system call). The kernel does not support globs, patterns, or even prefix matches: it only offers a plain linear listing of the directory's entries.

    glob() (and likewise wordexp()) is a library function implemented on top of readdir(). In addition, it must match every entry against the glob pattern and, unless GLOB_NOSORT is given, sort the results. It therefore cannot be faster than a hand-coded readdir() loop.

    P.S. One level lower, in the file systems themselves, directory entries are generally not stored sorted by name; indexed directories such as ext4's htree are ordered by a hash of the full name. Either way, a partial (prefix) file name look-up cannot be optimized. (Additionally, most file systems are oblivious to the character set used to encode file names.)
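For comparison, a hand-coded readdir() loop might look like the minimal sketch below. The helper name find_by_prefix_readdir is hypothetical. It assumes, as the question expects, that the match is unique, so it stops at the first hit. glob() cannot do that because it collects every match, but both approaches remain linear scans of the directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: scan `dir` with readdir(3) and return the name of
 * the first entry that starts with `prefix` (malloc'd; the caller frees
 * it), or NULL if none does. Assumes the match is unique, as the question
 * expects, so the scan stops at the first hit. */
char *find_by_prefix_readdir(const char *dir, const char *prefix)
{
    assert(dir != NULL && prefix != NULL);

    DIR *d = opendir(dir);
    if (d == NULL)
        return NULL;

    size_t plen = strlen(prefix);
    char *found = NULL;
    struct dirent *e;
    /* Still a linear scan: in the worst case every entry is read. */
    while ((e = readdir(d)) != NULL) {
        if (strncmp(e->d_name, prefix, plen) == 0) {
            found = strdup(e->d_name);
            break; /* early exit; glob() always collects every match */
        }
    }
    closedir(d);
    return found;
}
```

readdir() yields bare names rather than paths, so the caller joins the result with the directory name before passing it to fopen().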