How can I find all unique file extensions in a folder hierarchy in Java?


What's the most effective way of walking a folder hierarchy and obtaining a list of unique extensions?

This is very similar to this question, except that I'd like to do it from within Java.

There's an obvious recursive solution: check File.isDirectory(), iterate over all children, check the extension and isDirectory() on each, and keep a unique collection (such as a Set). I'm trying to see if there's something a bit more efficient.


Solution

  • A custom FilenameFilter:

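    import java.io.File;
    import java.io.FilenameFilter;
    import java.util.HashSet;
    import java.util.Set;

    public class FileExtensionFilter implements FilenameFilter {
        // Extensions recorded so far; names ending in one of these are rejected.
        private final Set<String> filteredExtensions = new HashSet<String>();

        @Override
        public boolean accept(File dir, String name) {
            // Note: this test is applied to directory names as well, so a directory
            // whose name ends in a recorded extension is skipped too.
            for (String filteredExtension : filteredExtensions) {
                if (name.endsWith(filteredExtension)) {
                    return false;
                }
            }
            return true;
        }

        public void addFilteredExtension(String extension) {
            filteredExtensions.add(extension);
        }
    }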
    

    Recursive method solution:

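    // Shared filter; it learns each extension the first time a file with it is visited.
    private final FileExtensionFilter fileExtensionFilter = new FileExtensionFilter();

    public Set<String> checkForExtensions(File file) {
        Set<String> extensions = new HashSet<String>();
        if (file.isDirectory()) {
            // listFiles returns null if the directory cannot be read
            File[] children = file.listFiles(fileExtensionFilter);
            if (children != null) {
                for (File f : children) {
                    extensions.addAll(checkForExtensions(f));
                }
            }
        } else {
            //NOTE: if you don't want the '.' in the extension you'll need to add a '+1' to the substring call
            // A name without a '.' is recorded whole, because lastIndexOf returns -1.
            String name = file.getName();
            String extension = name.substring(Math.max(name.lastIndexOf('.'), 0));
            extensions.add(extension);
            fileExtensionFilter.addFilteredExtension(extension);
        }
        return extensions;
    }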
    

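    Originally I had the same solution without the FileExtensionFilter, but noticed I could improve the efficiency by adding to the filter dynamically: once an extension has been recorded, listFiles no longer returns other files with that extension, so they are never visited. The savings were substantial. I went from 47 seconds down to 700 milliseconds.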

    You could also reduce memory usage a bit more by eliminating the Set altogether, since the FileExtensionFilter already holds a copy of every extension in the Set.
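
As a rough sketch of that last idea, the filter can double as the result collection so that no separate Set is returned from the recursion. The class name CollectingExtensionFilter and its methods extensionOf, collect and extensions are hypothetical names chosen for illustration, not part of the code above. It also departs from the answer in two ways: it compares the extracted extension with a hash lookup instead of looping over endsWith, and it lets directories through even when their name ends in a recorded extension (at the cost of one isDirectory() call for each name that matches).

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical class: a filter that is also the collection of results.
final class CollectingExtensionFilter implements FilenameFilter {
    private final Set<String> seen = new HashSet<>();

    // Same rule as the answer: text from the last '.' onward,
    // or the whole name when there is no '.'.
    static String extensionOf(String name) {
        return name.substring(Math.max(name.lastIndexOf('.'), 0));
    }

    @Override
    public boolean accept(File dir, String name) {
        // Names with an unseen extension always pass. Names with a known
        // extension pass only if they are directories, so the walk still
        // descends into something like "conf.d".
        return !seen.contains(extensionOf(name)) || new File(dir, name).isDirectory();
    }

    // Walks the tree rooted at file and records each extension in the filter itself.
    public void collect(File file) {
        if (file.isDirectory()) {
            File[] children = file.listFiles(this);
            if (children == null) {
                return; // unreadable directory or I/O error
            }
            for (File child : children) {
                collect(child);
            }
        } else {
            seen.add(extensionOf(file.getName()));
        }
    }

    // Read-only view of everything recorded so far.
    public Set<String> extensions() {
        return Collections.unmodifiableSet(seen);
    }
}
```

Usage would be to create one CollectingExtensionFilter, call collect(new File(path)) on the root folder, and then read extensions(). I have not timed this variant, so the figures quoted above apply only to the original code.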