
Files.walk(), calculate total size


I'm trying to calculate the total size of the files on my disk. In Java 7 this could be done using Files.walkFileTree, as shown in my answer here.

However, when I try to do this with Java 8 streams, it works for some folders but not for all.

public static void main(String[] args) throws IOException {
    long size = Files.walk(Paths.get("c:/")).mapToLong(MyMain::count).sum();
    System.out.println("size=" + size);
}

static long count(Path path) {
    try {
        return Files.size(path);
    } catch (IOException | UncheckedIOException e) {
        return 0;
    }
}

The above code works for the path a:/files/, but for c:/ it throws the following exception:

Exception in thread "main" java.io.UncheckedIOException: java.nio.file.AccessDeniedException: c:\$Recycle.Bin\S-1-5-20
at java.nio.file.FileTreeIterator.fetchNextIfNeeded(Unknown Source)
at java.nio.file.FileTreeIterator.hasNext(Unknown Source)
at java.util.Iterator.forEachRemaining(Unknown Source)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Unknown Source)
at java.util.stream.AbstractPipeline.copyInto(Unknown Source)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(Unknown Source)
at java.util.stream.AbstractPipeline.evaluate(Unknown Source)
at java.util.stream.LongPipeline.reduce(Unknown Source)
at java.util.stream.LongPipeline.sum(Unknown Source)
at MyMain.main(MyMain.java:16)

I understand where it is coming from and how to avoid it using the Files.walkFileTree API.

But how can this exception be avoided using the Files.walk() API?


Solution

  • No, this exception cannot be avoided.

    The exception occurs inside the lazy fetch of Files.walk(), which is why you do not see it earlier and why there is no way to circumvent it. Consider the following code:

    // count is the static helper from the question, so it is referenced via the class
    long size = Files.walk(Paths.get("C://"))
            .peek(System.out::println)
            .mapToLong(MyMain::count)
            .sum();
    

    On my system this prints:

    C:\
    C:\$Recycle.Bin
    Exception in thread "main" java.io.UncheckedIOException: java.nio.file.AccessDeniedException: C:\$Recycle.Bin\S-1-5-18
    

    Because the exception is thrown on the (main) thread while fetching the third entry, all further execution on that thread stops.

    I believe this is a design failure: as it stands, Files.walk is practically unusable, because you can never guarantee that walking a directory will produce no errors.

    One important point: the stack trace includes the sum() and reduce() operations. This is because the paths are loaded lazily. At the point of reduce(), the bulk of the stream machinery gets called (visible in the stack trace), and only then is the next path fetched, at which point the UncheckedIOException occurs.
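
    The laziness described above can be observed without provoking an access error. The following is a minimal sketch, not part of the original code: the class name LazyWalkDemo and the method visitedBeforeAndAfter are made up for illustration. It counts how many paths have passed through peek() before and after the terminal operation runs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Hypothetical demo class; the names are illustrative only.
class LazyWalkDemo {

    /** Returns {paths seen before the terminal operation, paths seen after it}. */
    static int[] visitedBeforeAndAfter(Path root) throws IOException {
        AtomicInteger visited = new AtomicInteger();
        // try-with-resources releases the directory handles held by the stream
        try (Stream<Path> paths = Files.walk(root)) {
            Stream<Path> counted = paths.peek(p -> visited.incrementAndGet());
            int before = visited.get();   // pipeline is built, nothing has flowed through it yet
            counted.forEach(p -> { });    // the terminal operation drives the traversal
            return new int[] { before, visited.get() };
        }
    }

    public static void main(String[] args) throws IOException {
        // Small throwaway tree: root, root/sub, root/sub/file.txt
        Path root = Files.createTempDirectory("lazy-walk");
        Files.createFile(Files.createDirectory(root.resolve("sub")).resolve("file.txt"));
        int[] counts = visitedBeforeAndAfter(root);
        System.out.println("before=" + counts[0] + ", after=" + counts[1]);
    }
}
```

    Since entries are only pulled while the terminal operation is running, an error raised while fetching a later entry has nowhere to surface except from inside that terminal operation, which matches the stack trace in the question.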

    It could possibly be circumvented by letting every walking operation execute on its own thread, but that is not something you would want to do anyway.

    Also, checking beforehand whether a file is accessible does not solve the problem (though it can be useful to some extent), because you cannot guarantee that the file is still readable even 1 ms later.

    Future extension

    I believe it can still be fixed, though I do not know exactly how FileVisitOption values work.
    Currently there is a FileVisitOption.FOLLOW_LINKS. If it operates on a per-file basis, then I suspect that a (hypothetical) FileVisitOption.IGNORE_ON_IOEXCEPTION could also be added. However, we cannot inject that functionality ourselves.
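
    For comparison, the Files.walkFileTree API mentioned in the question already allows skipping entries that fail. The following is a minimal sketch under some assumptions: the class name TolerantSize and the method totalSize are invented for illustration, and, unlike the question's code, only regular files are counted (directory entries contribute nothing).

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper class; the names are illustrative only.
class TolerantSize {

    /** Sums the sizes of all regular files under root, skipping entries that cannot be read. */
    static long totalSize(Path root) throws IOException {
        AtomicLong total = new AtomicLong();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    total.addAndGet(attrs.size());
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // Invoked when an entry could not be visited, e.g. a directory
                // that cannot be opened; skip it and keep walking.
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) {
                // The default implementation rethrows exc; ignore it here so an
                // error while iterating one directory does not abort the walk.
                return FileVisitResult.CONTINUE;
            }
        });
        return total.get();
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("size=" + totalSize(root));
    }
}
```

    Silently continuing in visitFileFailed means the reported total can be lower than the real one; logging the failed path there is an easy addition if that matters.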