WatchService for a large number of directories (recursive)


I want to detect changes inside a directory, so I implemented a watcher using WatchService:

public class DirWatcher implements Runnable {

    private Path path;
    private ExecutorService exe;

    public DirWatcher(Path path, ExecutorService exe) {
        this.path = path;
        this.exe = exe;
    }

    public void start() throws Exception {
        WatchService watchService = FileSystems.getDefault().newWatchService();
        path.register(watchService, StandardWatchEventKinds.ENTRY_CREATE, StandardWatchEventKinds.ENTRY_DELETE, StandardWatchEventKinds.ENTRY_MODIFY);
        WatchKey key;
        while ((key = watchService.take()) != null) {
            for (WatchEvent<?> event : key.pollEvents()) {
                if(isFileEvent(event)) {
                    // do stuff with file
                } else if(isNewDirCreated(event)) {
                    Path dir = getPath(event, path);
                    DirWatcher newWatcher = new DirWatcher(dir, exe);
                    exe.execute(newWatcher);
                }
            }
            key.reset();
        }

        watchService.close();
    }

    public void run() {
        try {
            start();
        } catch(Exception e) {
        }
    }

    //Other methods
}

Here is the main method:

public class DirectoryWatcherExample {

    public static void main(String[] args) throws Exception {
        Path root = getRootPath();
        // How big should this pool be? The number of directories is large (> 50000).
        ExecutorService exe = Executors.newFixedThreadPool(/* pool size? */);
        DirWatcher watcher = new DirWatcher(root, exe);
        exe.execute(watcher);
        List<Path> paths = listRecursive(root);
        paths.stream().map(p -> new DirWatcher(p, exe)).forEach(exe::execute);
    }
}

My question is: how should I initialize the thread pool, given that the number of tasks is huge (> 50000)? Will it affect the server (64 GB of RAM)?

Are ForkJoinPool and RecursiveTask useful in this case? If yes, could you provide pseudocode? If not, is there a better-optimized solution? Thank you.


Solution

  • You only need a new WatchService per filesystem, not per directory, and only one polling loop / thread to handle each WatchService.

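    As written, you set up a WatchService and a polling loop per folder, which would be very difficult to scale to 50,000 folders (at least without Project Loom virtual threads). A fixed-size thread pool is unsuitable: each DirWatcher task blocks in take() indefinitely, so a pool of N threads can serve only N folders while the remaining tasks stay queued.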

    Instead, keep a mapping from each FileSystem to its WatchService, and register every new folder with the single WatchService of the filesystem it belongs to.

    Start one polling thread for each new WatchService; that thread handles however many folders you register from the same filesystem.

    For many applications a single WatchService+polling thread pair can handle all folders of the same filesystem. The events polled from each WatchService cover changes to many folders: the WatchKey identifies the folder (key.watchable()), and each event's context() gives the name of the changed entry relative to it.

    For simplicity you might dedicate extra WatchService+thread pairs to particular subtrees, but never as many as one WatchService+thread per folder, as that means thousands of threads.

    Note that however many WatchServices and threads you decide to set up, the polling loop receives a lot of events (a single change can produce several), so you should always collate the actions and perform them later. Treat it like a JavaFX/Swing/AWT UI event handler: record the task in the loop, and do the actual work outside the polling loop. See this example of collating watch events.
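The approach above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class name TreeWatcher and the methods registerTree, pollOnce and drainPending are hypothetical names chosen for this sketch, and it assumes every watched folder lives on one filesystem and that all calls are made from the single polling thread.

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

import static java.nio.file.StandardWatchEventKinds.*;

// Hypothetical helper: one WatchService shared by every folder of one filesystem.
final class TreeWatcher implements AutoCloseable {
    private final WatchService watchService;
    // Which folder each WatchKey belongs to.
    private final Map<WatchKey, Path> dirsByKey = new HashMap<>();
    // Collated changes: only the latest event kind is kept per path.
    private final Map<Path, WatchEvent.Kind<?>> pending = new LinkedHashMap<>();

    TreeWatcher(FileSystem fs) throws IOException {
        this.watchService = fs.newWatchService();
    }

    /** Registers root and all its subfolders; returns the number of folders watched so far. */
    public int registerTree(Path root) throws IOException {
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException {
                WatchKey key = dir.register(watchService, ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY);
                dirsByKey.put(key, dir);
                return FileVisitResult.CONTINUE;
            }
        });
        return dirsByKey.size();
    }

    /** Waits up to the timeout for one key and records its events; false if nothing arrived. */
    public boolean pollOnce(long timeout, TimeUnit unit) throws InterruptedException, IOException {
        WatchKey key = watchService.poll(timeout, unit);
        if (key == null) {
            return false;
        }
        Path dir = dirsByKey.get(key);
        for (WatchEvent<?> event : key.pollEvents()) {
            WatchEvent.Kind<?> kind = event.kind();
            if (kind == OVERFLOW || dir == null) {
                continue; // events were lost or the folder is unknown; nothing to resolve
            }
            Path child = dir.resolve((Path) event.context());
            pending.put(child, kind); // record only; the real work happens elsewhere
            if (kind == ENTRY_CREATE && Files.isDirectory(child, LinkOption.NOFOLLOW_LINKS)) {
                try {
                    registerTree(child); // new folder: same WatchService, no new thread
                } catch (NoSuchFileException e) {
                    // the folder vanished before it could be registered; ignore it
                }
            }
        }
        if (!key.reset()) {
            dirsByKey.remove(key); // folder is no longer accessible
        }
        return true;
    }

    /** Hands the collated changes to the caller and clears the buffer. */
    public Map<Path, WatchEvent.Kind<?>> drainPending() {
        Map<Path, WatchEvent.Kind<?>> copy = new LinkedHashMap<>(pending);
        pending.clear();
        return copy;
    }

    @Override
    public void close() throws IOException {
        watchService.close();
    }
}
```

A single thread would call registerTree(root) once and then loop on pollOnce; whenever pollOnce returns false (a quiet period), it can pass the result of drainPending() to whatever does the actual file processing. No thread pool sized to the folder count is involved. With several filesystems, one such instance per FileSystem would match the mapping described above.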