Search code examples
javagradlegroovyjvm

Why Does Groovy eachDir() Give Me The Same Sort Every Time?


I am creating a file containing a list of sub-directories

task createNotes {
  doLast {
    def myFile = new File("my-notes.txt")
    def file = new File("src/test/")
    println myFile.exists()
    myFile.delete()
    println myFile.exists()
    println myFile.absolutePath
    println file.absolutePath
    myFile.withWriter {
      out ->
        file.eachDir { dir ->
          out.println dir.getName()
        }
    }
  }
}

Apparently the sort order can't be guaranteed, but every time I run it I get the same sort order:

soft
java
calc
conc
caab
pres

If I change the "soft" dir to "sofp" the output is:

java
sofp
calc
conc
caab
pres

When I change the name back, it goes to the original order.

It doesn't appear to be sorted in any particular order - which matches the docs saying the order can't be guaranteed, but if so, why is it always giving me the same sort every time?


Solution

  • Let's break it down and have a look at eachDir Groovy extension method implementation first:

    public static void eachDir(File self, @ClosureParams(value = SimpleType.class, options = "java.io.File") Closure closure) throws FileNotFoundException, IllegalArgumentException {
        eachFile(self, FileType.DIRECTORIES, closure);
    }
    

    What does eachFile do?

    public static void eachFile(final File self, final FileType fileType, @ClosureParams(value = SimpleType.class, options = "java.io.File") final Closure closure)
            throws FileNotFoundException, IllegalArgumentException {
        checkDir(self);
        final File[] files = self.listFiles();
        // null check because of https://bugs.java.com/bugdatabase/view_bug?bug_id=4803836
        if (files == null) return;
        for (File file : files) {
            if (fileType == FileType.ANY ||
                    (fileType != FileType.FILES && file.isDirectory()) ||
                    (fileType != FileType.DIRECTORIES && file.isFile())) {
                closure.call(file);
            }
        }
    }
    

    OK, so Groovy just calls Java's File#listFiles method under the covers and then iterates the result without interfering with the existing order.

    Moving to the OpenJDK implementation we can see that Files#listFiles uses FileSystem#list via the normalizedList method.

    FileSystem#list is abstract. Continuing to the two most popular implementations it turns out that both UnixFileSystem#list and Win32FileSystem#list have a native implementation:

    @Override
    public native String[] list(File f);
    

    Windows

    Digging into the Windows implementation:

    JNIEXPORT jobjectArray JNICALL
    Java_java_io_WinNTFileSystem_list(JNIEnv *env, jobject this, jobject file)
    {
        WCHAR *search_path;
        HANDLE handle;
        WIN32_FIND_DATAW find_data;
        int len, maxlen;
        jobjectArray rv, old;
        DWORD fattr;
        jstring name;
        jclass str_class;
        WCHAR *pathbuf;
        DWORD err;
    
        str_class = JNU_ClassString(env);
        CHECK_NULL_RETURN(str_class, NULL);
    
        pathbuf = fileToNTPath(env, file, ids.path);
        if (pathbuf == NULL)
            return NULL;
        search_path = (WCHAR*)malloc(2*wcslen(pathbuf) + 6);
        if (search_path == 0) {
            free (pathbuf);
            errno = ENOMEM;
            JNU_ThrowOutOfMemoryError(env, "native memory allocation failed");
            return NULL;
        }
        wcscpy(search_path, pathbuf);
        free(pathbuf);
        fattr = GetFileAttributesW(search_path);
        if (fattr == INVALID_FILE_ATTRIBUTES) {
            free(search_path);
            return NULL;
        } else if ((fattr & FILE_ATTRIBUTE_DIRECTORY) == 0) {
            free(search_path);
            return NULL;
        }
    
        /* Remove trailing space chars from directory name */
        len = (int)wcslen(search_path);
        while (search_path[len-1] == L' ') {
            len--;
        }
        search_path[len] = 0;
    
        /* Append "*", or possibly "\\*", to path */
        if ((search_path[0] == L'\\' && search_path[1] == L'\0') ||
            (search_path[1] == L':'
            && (search_path[2] == L'\0'
            || (search_path[2] == L'\\' && search_path[3] == L'\0')))) {
            /* No '\\' needed for cases like "\" or "Z:" or "Z:\" */
            wcscat(search_path, L"*");
        } else {
            wcscat(search_path, L"\\*");
        }
    
        /* Open handle to the first file */
        handle = FindFirstFileW(search_path, &find_data);
        free(search_path);
        if (handle == INVALID_HANDLE_VALUE) {
            if (GetLastError() != ERROR_FILE_NOT_FOUND) {
                // error
                return NULL;
            } else {
                // No files found - return an empty array
                rv = (*env)->NewObjectArray(env, 0, str_class, NULL);
                return rv;
            }
        }
    
        /* Allocate an initial String array */
        len = 0;
        maxlen = 16;
        rv = (*env)->NewObjectArray(env, maxlen, str_class, NULL);
        if (rv == NULL) { // Couldn't allocate an array
            FindClose(handle);
            return NULL;
        }
        /* Scan the directory */
        do {
            if (!wcscmp(find_data.cFileName, L".")
                                    || !wcscmp(find_data.cFileName, L".."))
               continue;
            name = (*env)->NewString(env, find_data.cFileName,
                                     (jsize)wcslen(find_data.cFileName));
            if (name == NULL) {
                FindClose(handle);
                return NULL; // error
            }
            if (len == maxlen) {
                old = rv;
                rv = (*env)->NewObjectArray(env, maxlen <<= 1, str_class, NULL);
                if (rv == NULL || JNU_CopyObjectArray(env, rv, old, len) < 0) {
                    FindClose(handle);
                    return NULL; // error
                }
                (*env)->DeleteLocalRef(env, old);
            }
            (*env)->SetObjectArrayElement(env, rv, len++, name);
            (*env)->DeleteLocalRef(env, name);
    
        } while (FindNextFileW(handle, &find_data));
    
        err = GetLastError();
        FindClose(handle);
        if (err != ERROR_NO_MORE_FILES) {
            return NULL; // error
        }
    
        if (len < maxlen) {
            /* Copy the final results into an appropriately-sized array */
            old = rv;
            rv = (*env)->NewObjectArray(env, len, str_class, NULL);
            if (rv == NULL)
                return NULL; /* error */
            if (JNU_CopyObjectArray(env, rv, old, len) < 0)
                return NULL; /* error */
        }
        return rv;
    }
    

    We can see a combination of FindFirstFileW, FindNextFileW and FindClose WinAPI functions used to iterate over the files. Excerpt about ordering from the documentation of FindNextFileW:

    The order in which the search returns the files, such as alphabetical order, is not guaranteed, and is dependent on the file system.

    (...)

    The order in which this function returns the file names is dependent on the file system type. With the NTFS file system and CDFS file systems, the names are usually returned in alphabetical order. With FAT file systems, the names are usually returned in the order the files were written to the disk, which may or may not be in alphabetical order. However, as stated previously, these behaviors are not guaranteed.

    So the implementation lists the files in a way that tries to be most optimal given the OS and file-system type constraints. No guarantees about a particular order.

    *nix

    What about *nix systems? Here's the code:

    JNIEXPORT jobjectArray JNICALL
    Java_java_io_UnixFileSystem_list(JNIEnv *env, jobject this,
                                     jobject file)
    {
        DIR *dir = NULL;
        struct dirent *ptr;
        int len, maxlen;
        jobjectArray rv, old;
        jclass str_class;
    
        str_class = JNU_ClassString(env);
        CHECK_NULL_RETURN(str_class, NULL);
    
        WITH_FIELD_PLATFORM_STRING(env, file, ids.path, path) {
            dir = opendir(path);
        } END_PLATFORM_STRING(env, path);
        if (dir == NULL) return NULL;
    
        /* Allocate an initial String array */
        len = 0;
        maxlen = 16;
        rv = (*env)->NewObjectArray(env, maxlen, str_class, NULL);
        if (rv == NULL) goto error;
    
        /* Scan the directory */
        while ((ptr = readdir(dir)) != NULL) {
            jstring name;
            if (!strcmp(ptr->d_name, ".") || !strcmp(ptr->d_name, ".."))
                continue;
            if (len == maxlen) {
                old = rv;
                rv = (*env)->NewObjectArray(env, maxlen <<= 1, str_class, NULL);
                if (rv == NULL) goto error;
                if (JNU_CopyObjectArray(env, rv, old, len) < 0) goto error;
                (*env)->DeleteLocalRef(env, old);
            }
    #ifdef MACOSX
            name = newStringPlatform(env, ptr->d_name);
    #else
            name = JNU_NewStringPlatform(env, ptr->d_name);
    #endif
            if (name == NULL) goto error;
            (*env)->SetObjectArrayElement(env, rv, len++, name);
            (*env)->DeleteLocalRef(env, name);
        }
        closedir(dir);
    
        /* Copy the final results into an appropriately-sized array */
        if (len < maxlen) {
            old = rv;
            rv = (*env)->NewObjectArray(env, len, str_class, NULL);
            if (rv == NULL) {
                return NULL;
            }
            if (JNU_CopyObjectArray(env, rv, old, len) < 0) {
                return NULL;
            }
        }
        return rv;
    
     error:
        closedir(dir);
        return NULL;
    }
    

    This time iteration is supported by the opendir/readdir/closedir trio. The POSIX documentation of readdir only mentions this about ordering:

    The type DIR, which is defined in the header <dirent.h>, represents a directory stream, which is an ordered sequence of all the directory entries in a particular directory.

    Linux documentation has a bit more to say:

    The order in which filenames are read by successive calls to readdir() depends on the filesystem implementation; it is unlikely that the names will be sorted in any fashion.

    Close enough to Windows there is no order-guarantees apart from that there is some order.

    Conclusion

    'Is not guaranteed' means that a particular feature is an implementation detail and that you shouldn't rely on it. Contrary to 'guaranteed' features, which due to backward compatibility promise to stay invariant for some (usually longer) period of time. Behavior different than documented in case of these features is considered a bug and you can expect it to be fixed after you file a bug report. Changes to such aspects are called breaking changes and are usually well documented in release notes and migration guides (see Vue 3 migration guide for example). They also get announced long before they become finalized and released - see deprecation (often leaving some time to the developers to adjust their code to the new even after they go live).

    The behaviour of 'unguaranteed' features on the other hand may differ between given product/library versions, environments (e.g. JVM implementations, operating systems, product versions) or even particular invocations. They sometimes show some predictable characteristics, but they should not be relied upon. You need to make sure to take care of the guarantees you expect for your piece of code to to have and implement them on your own.

    Wrapping-up your issue: if you expect any specific order of files just sort them first - even if it means that the order will stay the same.