
How to list first level directories only in C?


In a terminal I can call ls -d */. Now I want a program to do that for me, like this:

#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <unistd.h>

int main( void )
{
    int status;

    char *args[] = { "/bin/ls", "-l", NULL };

    if ( fork() == 0 )
        execv( args[0], args );
    else
        wait( &status ); 

    return 0;
}

This runs ls -l on everything. However, when I try:

char *args[] = { "/bin/ls", "-d", "*/",  NULL };

I get a runtime error:

ls: */: No such file or directory


Solution

  • Unfortunately, all solutions based on shell expansion are limited by the maximum command line length, which varies (run true | xargs --show-limits to find out); on my system, it is about two megabytes. Yes, many will argue that it suffices, as Bill Gates supposedly once did about 640 kilobytes.

    (When running certain parallel simulations on non-shared filesystems, I do occasionally have tens of thousands of files in the same directory, during the collection phase. Yes, I could do that differently, but that happens to be the easiest and most robust way to collect the data. Very few POSIX utilities are actually silly enough to assume "X is sufficient for everybody".)

    Fortunately, there are several solutions. One is to use find instead:

    system("/usr/bin/find . -mindepth 1 -maxdepth 1 -type d");
    

    You can also format the output as you wish, not depending on locale:

    system("/usr/bin/find . -mindepth 1 -maxdepth 1 -type d -printf '%p\n'");
    

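    If you want to sort the output, use \0 as the separator (since filenames are allowed to contain newlines), and pass -z to (GNU) sort so that it uses \0 as the line separator, too. tr will then convert the separators to newlines for you. Note that the backslashes must be doubled inside the C string literal; a bare \0 there would terminate the command string early:

    system("/usr/bin/find . -mindepth 1 -maxdepth 1 -type d -printf '%p\\0' | sort -z | tr -s '\\0' '\\n'");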
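
    If the program should consume the names itself rather than let find write straight to standard output, one option is popen() combined with NUL-separated output. The following is only a minimal sketch, assuming a find that supports -mindepth, -maxdepth, and -print0 (GNU and BSD find do) and a find binary on PATH; print_first_level_dirs is a hypothetical helper name, not something from the question:

    ```c
    #define _POSIX_C_SOURCE 200809L
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>

    /* Hypothetical helper: prints each first-level directory of the current
       working directory, one per line, and returns how many were found,
       or -1 on error. */
    long print_first_level_dirs(void)
    {
        /* Assumption: find supports -mindepth, -maxdepth and -print0. */
        FILE *fp = popen("find . -mindepth 1 -maxdepth 1 -type d -print0", "r");
        if (fp == NULL)
            return -1;

        char *name = NULL;   /* buffer managed by getdelim() */
        size_t cap = 0;
        long count = 0;

        /* Each record ends in '\0', so names containing newlines stay intact. */
        while (getdelim(&name, &cap, '\0', fp) > 0) {
            printf("%s/\n", name);
            count++;
        }

        free(name);

        /* A nonzero status means find (or the shell) reported a failure. */
        if (pclose(fp) != 0)
            return -1;

        return count;
    }
    ```

    Because the names are read one at a time, this approach is not affected by the command line length limit.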
    

    If you want the names in an array, use the glob() function instead.
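
    As a rough sketch of the glob() approach: a pattern ending in a slash matches only directories (and symlinks to directories), which mirrors what the shell does for ls -d */. The helper name print_dirs_with_glob is hypothetical, and the sketch simply prints the array instead of handing it to the caller:

    ```c
    #include <glob.h>
    #include <stdio.h>

    /* Hypothetical helper: prints the non-hidden first-level directories of
       the current working directory and returns how many matched,
       or -1 on error. */
    long print_dirs_with_glob(void)
    {
        glob_t g = {0};
        long count = 0;

        /* The trailing '/' restricts matches to directories. Results are
           sorted by default and stored in g.gl_pathv[0 .. g.gl_pathc-1]. */
        int rc = glob("*/", 0, NULL, &g);

        if (rc == 0) {
            for (size_t i = 0; i < g.gl_pathc; i++)
                puts(g.gl_pathv[i]);      /* each name ends in '/' */
            count = (long)g.gl_pathc;
        } else if (rc != GLOB_NOMATCH) {
            count = -1;                   /* GLOB_NOSPACE or GLOB_ABORTED */
        }

        globfree(&g);
        return count;
    }
    ```

    Since the expansion happens inside the process, the names never pass through an exec call, so the command line length limit does not apply; the whole list does have to fit in memory, though.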

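    Finally, as I like to harp every now and then, one can use the POSIX nftw() function to implement this internally. (The FTW_ACTIONRETVAL flag and the FTW_SKIP_SUBTREE return value used here are glibc extensions, which is why _GNU_SOURCE is defined.)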

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <ftw.h>
    
    #define NUM_FDS 17
    
    int myfunc(const char *path,
               const struct stat *fileinfo,
               int typeflag,
               struct FTW *ftwinfo)
    {
        const char *file = path + ftwinfo->base;
        const int depth = ftwinfo->level;
    
        /* We are only interested in first-level directories.
           Note that depth==0 is the directory itself specified as a parameter.
        */
        if (depth != 1 || (typeflag != FTW_D && typeflag != FTW_DNR))
            return 0;
    
        /* Don't list names starting with a . */
        if (file[0] != '.')
            printf("%s/\n", path);
    
        /* Do not recurse. */
        return FTW_SKIP_SUBTREE;
    }
    

    The nftw() call that uses the above callback looks something like

    if (nftw(".", myfunc, NUM_FDS, FTW_ACTIONRETVAL)) {
        /* An error occurred. */
    }
    

    The only "issue" in using nftw() is to choose a good number of file descriptors the function may use (NUM_FDS). POSIX says a process must always be able to have at least 20 open file descriptors. If we subtract the standard ones (input, output, and error), that leaves 17. The above is unlikely to use more than 3, though.

    You can find the actual limit using sysconf(_SC_OPEN_MAX), and subtracting the number of descriptors your process may use at the same time. In current Linux systems, it is typically limited to 1024 per process.

    The good thing is, as long as that number is at least 4 or 5 or so, it only affects the performance: it just determines how deep nftw() can go in the directory tree structure, before it has to use workarounds.
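
    One way to pick the number at run time, instead of hard-coding NUM_FDS, is sketched below. The helper name nftw_fd_budget, the floor of 4, and the cap of 1024 are my own assumptions for illustration, not anything nftw() requires:

    ```c
    #include <unistd.h>

    /* Hypothetical helper: suggests how many descriptors nftw() may use,
       leaving `reserved` descriptors for the rest of the program. */
    int nftw_fd_budget(int reserved)
    {
        long limit = sysconf(_SC_OPEN_MAX);

        /* sysconf() returns -1 on error or when the limit is indeterminate;
           fall back to the POSIX minimum of 20 in that case. */
        if (limit < 0)
            limit = 20;

        long budget = limit - reserved;

        /* Assumed floor: keep a few descriptors so nftw() can still work. */
        if (budget < 4)
            budget = 4;

        /* Assumed cap: more than this rarely helps and keeps the value small. */
        if (budget > 1024)
            budget = 1024;

        return (int)budget;
    }
    ```

    The result could then be passed as the third argument, for example nftw(".", myfunc, nftw_fd_budget(3), FTW_ACTIONRETVAL), where 3 accounts for standard input, output, and error.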

    If you want to create a test directory with lots of subdirectories, use something like the following Bash:

    mkdir lots-of-subdirs
    cd lots-of-subdirs
    for ((i=0; i<100000; i++)); do mkdir directory-$i-has-a-long-name-since-command-line-length-is-limited ; done
    

    On my system, running

    ls -d */
    

    in that directory fails with the error bash: /bin/ls: Argument list too long, while the find command and the nftw()-based program both run just fine.

    You also cannot remove the directories using rmdir directory-*/ for the same reason. Use

    find . -name 'directory-*' -type d -print0 | xargs -r0 rmdir
    

    instead. Or just remove the entire directory and subdirectories,

    cd ..
    rm -rf lots-of-subdirs