Search code examples
clinuxbashsortingunix

How does 'ls' sort filenames?


I'm trying to write a function that mimics the output of the ls command in Unix. I was originally trying to perform this using scandir and alphasort, and this did indeed print the files in the directory, and it did sort them, but for some reason, this sorted list does not seem to match the same "sorted list" of filenames that ls gives.

For example, if I have a directory that contains file.c, FILE.c, and ls.c.

ls displays them in the order: file.c FILE.c ls.c. But when I sort it using alphasort/scandir, it sorts them as: FILE.c file.c ls.c

How does ls sort the files in the directory such that it gives such a differently ordered result?


Solution

  • To emulate default ls -1 behaviour, make your program locale-aware by calling

    setlocale(LC_ALL, "");
    

    near the beginning of your main(), and use

    count = scandir(dir, &array, my_filter, alphasort);
    

    where my_filter() is a function that returns 0 for names that begin with a dot ., and 1 for all others. alphasort() is a POSIX function that uses the locale collation order, same order as strcoll().

    The basic implementation is something along the lines of

    #define  _POSIX_C_SOURCE 200809L
    #define  _ATFILE_SOURCE
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <locale.h>
    #include <string.h>
    #include <dirent.h>
    #include <stdio.h>
    #include <errno.h>
    
    static void my_print(const char *name, const struct stat *info)
    {
        /* TODO: Better output; use info too, for 'ls -l' -style output? */
        printf("%s\n", name);
    }
    
    static int my_filter(const struct dirent *ent)
    {
        /* Skip entries that begin with '.' */
        if (ent->d_name[0] == '.')
            return 0;
    
        /* Include all others */
        return 1;
    }
    
    static int my_ls(const char *dir)
    {
        struct dirent **list = NULL;
        struct stat     info;
        DIR            *dirhandle;
        int             size, i, fd;
    
        size = scandir(dir, &list, my_filter, alphasort);
        if (size == -1) {
            const int cause = errno;
    
            /* Is dir not a directory, but a single entry perhaps? */
            if (cause == ENOTDIR && lstat(dir, &info) == 0) {
                my_print(dir, &info);
                return 0;
            }
    
            /* Print out the original error and fail. */
            fprintf(stderr, "%s: %s.\n", dir, strerror(cause));
            return -1;
        }
    
        /* We need the directory handle for fstatat(). */
        dirhandle = opendir(dir);
        if (!dirhandle) {
            /* Print a warning, but continue. */
            fprintf(stderr, "%s: %s\n", dir, strerror(errno));
            fd = AT_FDCWD;
        } else {
            fd = dirfd(dirhandle);
        }
    
        for (i = 0; i < size; i++) {
            struct dirent *ent = list[i];
    
            /* Try to get information on ent. If fails, clear the structure. */
            if (fstatat(fd, ent->d_name, &info, AT_SYMLINK_NOFOLLOW) == -1) {
                /* Print a warning about it. */
                fprintf(stderr, "%s: %s.\n", ent->d_name, strerror(errno));
                memset(&info, 0, sizeof info);
            }
    
            /* Describe 'ent'. */
            my_print(ent->d_name, &info);
        }
    
        /* Release the directory handle. */
        if (dirhandle)
            closedir(dirhandle);
    
        /* Discard list. */
        for (i = 0; i < size; i++)
            free(list[i]);
        free(list);
    
        return 0;
    }
    
    int main(int argc, char *argv[])
    {
        int arg;
    
        setlocale(LC_ALL, "");
    
        if (argc > 1) {
            for (arg = 1; arg < argc; arg++) {
                if (my_ls(argv[arg])) {
                    return EXIT_FAILURE;
                }
            }
        } else {
            if (my_ls(".")) {
                return EXIT_FAILURE;
            }
        }
    
        return EXIT_SUCCESS;
    }
    

    Note that I deliberately made this more complex than strictly needed for your purposes, because I did not want you to just copy and paste the code. It will be easier for you to compile, run, and investigate this program, then port the needed changes -- possibly just the one setlocale("", LC_ALL); line! -- to your own program, than try and explain to your teacher/lecturer/TA why the code looks like it was copied verbatim from somewhere else.

    The above code works even for files specified on the command line (the cause == ENOTDIR part). It also uses a single function, my_print(const char *name, const struct stat *info) to print each directory entry; and to do that, it does call stat for each entry.

    Instead of constructing a path to the directory entry and calling lstat(), my_ls() opens a directory handle, and uses fstatat(descriptor, name, struct stat *, AT_SYMLINK_NOFOLLOW) to gather the information in basically the same manner as lstat() would, but name being a relative path starting at the directory specified by descriptor (dirfd(handle), if handle is an open DIR *).

    It is true that calling one of the stat functions for each directory entry is "slow" (especially if you do /bin/ls -1 style output). However, the output of ls is intended for human consumption; and very often piped through more or less to let the human view it at leisure. This is why I would personally do not think the "extra" stat() call (even when not really needed) is a problem here. Most human users I know of tend to use ls -l or (my favourite) ls -laF --color=auto anyway. (auto meaning ANSI colors are used only if standard output is a terminal; i.e. when isatty(fileno(stdout)) == 1.)

    In other words, now that you have the ls -1 order, I would suggest you modify the output to be similar to ls -l (dash ell, not dash one). You only need to modify my_print() for that.