java scala performance java-stream akka

How to get Streams of File Attributes from the FileSystem?


I am writing a web server and want it to be as efficient as possible by minimizing file system calls. The problem is that the methods that return Streams, such as java.nio.file.Files.list, return a Stream of Paths, whereas I would like a Stream of BasicFileAttributes, so that I can return the creation time and last-modified time for each Path (for example, when returning results for an LDP Container).

Of course, a simple solution would be to map each element of the Stream with a function that takes the path and returns its attributes, (p: Path) => Files.getAttributeView..., but that sounds like it would make a call to the FS for each Path. That seems wasteful, because to get the file information the JDK can't have been far from the attribute info.
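
For reference, the per-path mapping described above could look roughly like this in Java. This is only a sketch: it uses Files.readAttributes rather than getAttributeView, it assumes Java 16+ for records, and the NaiveAttributes class and PathAttrs record are hypothetical names introduced for illustration. Each element triggers its own attribute lookup, which is the cost in question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class NaiveAttributes {
    // Hypothetical holder for a path and its attributes (not a JDK type).
    record PathAttrs(Path path, BasicFileAttributes attrs) {}

    // Lists the direct children of dir and reads the attributes one path at a time.
    static List<PathAttrs> listWithAttributes(Path dir) throws IOException {
        try (Stream<Path> paths = Files.list(dir)) {
            return paths.map(p -> {
                try {
                    // A separate attribute lookup for every path in the stream.
                    return new PathAttrs(p, Files.readAttributes(p, BasicFileAttributes.class));
                } catch (IOException e) {
                    // Lambdas in map() cannot throw checked exceptions.
                    throw new UncheckedIOException(e);
                }
            }).collect(Collectors.toList());
        }
    }
}
```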

I actually came across this mail from the 2009 OpenJDK mailing list, which states that they had discussed adding an API that would return a pair of a Path and its attributes...

I found a non-public class in the JDK, java.nio.file.FileTreeWalker, which has an API that would allow one to fetch the attributes: FileTreeWalker.Event. That actually makes use of sun.nio.fs.BasicFileAttributesHolder, which allows a Path to keep a cache of its attributes. But it is not public, and it is not clear where it works.

There is of course also the whole FileVisitor API, and that has methods that return both a Path and BasicFileAttributes as shown here:

public FileVisitResult visitFile(Path file, BasicFileAttributes attr) {...}

So I am looking for a way to turn that into a Stream that respects the principle of back pressure from the Reactive Manifesto, as pushed by Akka, without hogging too many resources. I checked the open source Alpakka File project, but that also streams the Files methods that return Paths ...
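
To make the FileVisitor option concrete, here is a minimal sketch of how visitFile can hand over each Path together with its BasicFileAttributes. The VisitorCollect class and its collect method are hypothetical names. Note that this version gathers everything eagerly into a list, so it is not a back-pressured Stream; it only shows that the attributes arrive alongside the path. Directories are reported through preVisitDirectory, so only non-directory entries are collected here.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class VisitorCollect {
    // Walks the tree under dir and pairs every visited file with its attributes.
    static List<Map.Entry<Path, BasicFileAttributes>> collect(Path dir) throws IOException {
        List<Map.Entry<Path, BasicFileAttributes>> found = new ArrayList<>();
        Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attr) {
                // The walker supplies the attributes together with the path.
                found.add(Map.entry(file, attr));
                return FileVisitResult.CONTINUE;
            }
        });
        return found;
    }
}
```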


Solution

  • You can access file attributes together with their path by using Files.find, which accepts a BiPredicate<Path, BasicFileAttributes>, and storing the attributes as it tests each path.

    The side-effect action inside the BiPredicate makes both objects available without touching the file system again for each item. With your own condition yourPred, the side-effecting predicate below collects the attributes so that you can retrieve them during stream processing:

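    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.attribute.BasicFileAttributes;
    import java.util.HashMap;
    import java.util.function.BiPredicate;

    public static void main(String[] args) throws IOException {
        Path dir = Path.of(args[0]);

        // Use `ConcurrentHashMap` if using `stream.parallel()`
        HashMap<Path, BasicFileAttributes> attrs = new HashMap<>();

        // Your own filter condition; this one accepts every path
        BiPredicate<Path, BasicFileAttributes> yourPred = (p, a) -> true;

        // Side effect: remember the attributes of each path that passes the filter
        BiPredicate<Path, BasicFileAttributes> predicate = (p, a) -> {
            return yourPred.test(p, a)
                    // && p.getNameCount() == dir.getNameCount()+1 // Simulates Files.list
                    && attrs.put(p, a) == null;
        };
        try (var stream = Files.find(dir, Integer.MAX_VALUE, predicate)) {
            stream.forEach(p -> System.out.println(p.toString() + " => " + attrs.get(p)));
            // Or: if you put all your handling code in the predicate, use stream.count();
        }
    }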
    

    To simulate the effect of Files.list, use a one-level find scanner:

     BiPredicate<Path, BasicFileAttributes> yourPred = (p,a) -> p.getNameCount() == dir.getNameCount()+1;
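
As a variation on the one-level scan, the sketch below passes a maxDepth of 1 to Files.find so that the walk does not descend into subdirectories, and excludes the start directory with an equality check instead of the name-count comparison. The OneLevelFind class and listWithAttributes method are hypothetical names, and the approach assumes a sequential stream because it uses a plain HashMap.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.stream.Stream;

class OneLevelFind {
    // Returns the direct children of dir mapped to their attributes.
    static Map<Path, BasicFileAttributes> listWithAttributes(Path dir) throws IOException {
        Map<Path, BasicFileAttributes> attrs = new HashMap<>();
        // Depth 0 is dir itself, so skip it; record the attributes of everything else.
        BiPredicate<Path, BasicFileAttributes> predicate =
                (p, a) -> !p.equals(dir) && attrs.put(p, a) == null;
        // maxDepth = 1 visits dir and its direct children only.
        try (Stream<Path> stream = Files.find(dir, 1, predicate)) {
            // Drain the stream so that the predicate runs for every entry.
            stream.forEach(p -> { });
        }
        return attrs;
    }
}
```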
    

    For a large folder scan, you should clean up the attrs map as you go by calling attrs.remove(p) after consuming each path.
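
A minimal sketch of that clean-up, assuming a sequential stream: each path's attributes are taken out of the map with remove at the moment the path is consumed, so entries do not accumulate over a large scan. The FindAndRelease class, the scan method, and its handler parameter are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.BiPredicate;
import java.util.stream.Stream;

class FindAndRelease {
    // Passes every path under dir to handler with its attributes; returns how many were handled.
    static long scan(Path dir, BiConsumer<Path, BasicFileAttributes> handler) throws IOException {
        Map<Path, BasicFileAttributes> attrs = new HashMap<>();
        BiPredicate<Path, BasicFileAttributes> predicate = (p, a) -> attrs.put(p, a) == null;
        long[] handled = {0};
        try (Stream<Path> stream = Files.find(dir, Integer.MAX_VALUE, predicate)) {
            stream.forEach(p -> {
                // remove() hands back the stored attributes and frees the map entry.
                handler.accept(p, attrs.remove(p));
                handled[0]++;
            });
        }
        return handled[0];
    }
}
```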

    Edit

    The answer above can be refactored into a three-line call that returns a stream of Map.Entry<Path, BasicFileAttributes>. It is also easy to add a class or record to hold the Path/BasicFileAttributes pair and return Stream<PathInfo> instead:

    /**
     * Call Files.find() returning a stream with both Path+BasicFileAttributes
     * as type Map.Entry<Path, BasicFileAttributes>
     * <p>Could declare a specific record to replace Map.Entry as:
     *    record PathInfo(Path path, BasicFileAttributes attr) { };
     */
    public static Stream<Map.Entry<Path, BasicFileAttributes>>
    find(Path dir, int maxDepth, BiPredicate<Path, BasicFileAttributes> matcher, FileVisitOption... options) throws IOException {
    
        HashMap <Path,BasicFileAttributes> attrs = new HashMap<>();
        BiPredicate<Path, BasicFileAttributes> predicate = (p,a) -> (matcher == null || matcher.test(p, a)) && attrs.put(p, a) == null;
    
        return Files.find(dir, maxDepth, predicate, options).map(p -> Map.entry(p, attrs.remove(p)));
    }
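
The record-based variant mentioned in the Javadoc could be sketched as follows, together with one possible caller. It assumes Java 16+ for records and a sequential stream; PathInfoFinder and regularFileNames are hypothetical names, while PathInfo follows the record suggested in the comment. As with Files.find itself, the returned stream holds open directories, so the caller should close it, for example with try-with-resources.

```java
import java.io.IOException;
import java.nio.file.FileVisitOption;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PathInfoFinder {
    // Pair of a path and the attributes the tree walker already read for it.
    record PathInfo(Path path, BasicFileAttributes attr) {}

    // Same idea as the Map.Entry version, but returning Stream<PathInfo>.
    static Stream<PathInfo> find(Path dir, int maxDepth,
            BiPredicate<Path, BasicFileAttributes> matcher,
            FileVisitOption... options) throws IOException {
        Map<Path, BasicFileAttributes> attrs = new HashMap<>();
        BiPredicate<Path, BasicFileAttributes> predicate =
                (p, a) -> (matcher == null || matcher.test(p, a)) && attrs.put(p, a) == null;
        // remove() keeps the map from growing as elements are consumed.
        return Files.find(dir, maxDepth, predicate, options)
                .map(p -> new PathInfo(p, attrs.remove(p)));
    }

    // Example caller: sorted names of all regular files under dir.
    static List<String> regularFileNames(Path dir) throws IOException {
        try (Stream<PathInfo> infos = find(dir, Integer.MAX_VALUE, (p, a) -> a.isRegularFile())) {
            return infos.map(i -> i.path().getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```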