java algorithm recursion java-8 nio

Find direct and indirect subclasses by scanning filesystem


I'm having trouble writing an algorithm that scans a file system and finds all subclasses of a certain class.

Details:

I have an app that scans an external application using NIO's Files.walk(). While reading each file, I check for "extends SuperClass"; if the phrase exists, I add the class name to my list as follows:

List<String> subclasses = new ArrayList<>();
// Compile the pattern once instead of once per file.
Pattern pattern = Pattern.compile("\\bextends SuperClass\\b");
Files.walk(appPath)
     .filter(p -> Files.isRegularFile(p) && p.toString().endsWith(".java"))
     .forEach(path -> {
        try {
            List<String> lines = Files.readAllLines(path);
            // Join the lines so the phrase is matched against the whole file.
            Matcher matcher = pattern.matcher(String.join(" ", lines));
            if (matcher.find()) subclasses.add(path.getFileName().toString());
        } catch (IOException e) {
            // handle IOException
        }
     });

The problem with the above is that it only finds direct subclasses of SuperClass, but I need to retrieve all direct and indirect subclasses. I thought about recursion, since I have no idea how many subclasses of SuperClass there are, but I couldn't come up with a reasonable implementation.

NOTES:

  • I'm scanning more than 600 thousand files
  • I have no idea how many direct/indirect subclasses of SuperClass there are
  • The application that I'm scanning is external and I can't modify its code, so I can only access it by reading files and checking where extends appears
  • If there is a non-recursive solution to the problem, that would be great, but if there's no other way, I'll be more than happy to accept a recursive one, since I care about the solution more than performance.

Edit:

I use the following regex to compare both name and import to make sure even in case of same name different packages the output is correct:

Pattern pattern = Pattern.compile("("+superClass.getPackage()+")[\\s\\S]*(\\bextends "+superClass.getName()+"\\b)[\\s\\S]");

I also tried:

Pattern pattern = Pattern.compile("\\bextends "+superClass.getName()+"\\b");

But some subclasses are still missing. I believe that the code below skips some checks and doesn't fully work:

public static List<SuperClass> getAllSubClasses(Path path, SuperClass parentClass) throws IOException{
classesToDo.add(baseClass);
while(classesToDo.size() > 0) {
    SuperClass superClass = classesToDo.remove(0);
    List<SuperClass> subclasses = getDirectSubClasses(parentPath,parentClass);
    if(subclasses.size() > 0)
        classes.addAll(subclasses);
    classesToDo.addAll(subclasses);
}
return classes;

}

Any help is truly appreciated!

Edit 2: I also noticed another problem. When I detect a subclass, I take the file name from currentPath.getFileName(), which might or might not be the subclass name, as the subclass may be a nested or non-public class in the same file.


Solution

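  • I strongly recommend parsing compiled class files instead of source code. Since class files are already optimized for being processed by machines, much of the complexity and many of the corner cases of source file processing are eliminated. Each class, including nested and non-public ones, is also compiled to its own class file that records the class's actual name and superclass, so the file-name problem from the question's second edit does not arise.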

    So a solution to build a complete class hierarchy tree using the ASM library would look like this:

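    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.AbstractMap;
    import java.util.Map;
    import java.util.Set;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;
    import org.objectweb.asm.ClassReader;
    import org.objectweb.asm.ClassVisitor;
    import org.objectweb.asm.Opcodes;
    import org.objectweb.asm.Type;

    // Maps each superclass name to the set of its direct subclasses.
    public static Map<String, Set<String>> getClassHierarchy(Path root) throws IOException {
        // Files.walk keeps directories open, so close the stream when done.
        try (Stream<Path> files = Files.walk(root)) {
            return files
                 .filter(p->Files.isRegularFile(p) && isClass(p.getFileName().toString()))
                 .map(p -> getClassAndSuper(p))
                 .collect(Collectors.groupingBy(Map.Entry::getValue,
                        Collectors.mapping(Map.Entry::getKey, Collectors.toSet())));
        }
    }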
    private static boolean isClass(String fName) {
        // skip package-info and module-info
        return fName.endsWith(".class") && !fName.endsWith("-info.class");
    }
    private static Map.Entry<String,String> getClassAndSuper(Path p) {
        final class CV extends ClassVisitor {
            Map.Entry<String,String> result;
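            public CV() {
                // ASM5 is enough for Java 8 class files; class files that use newer
                // attributes (nest mates, records, sealed classes) need a higher
                // API level, e.g. Opcodes.ASM9 with a recent ASM release.
                super(Opcodes.ASM5);
            }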
            @Override
            public void visit(int version, int access,
                    String name, String signature, String superName, String[] interfaces) {
                result = new AbstractMap.SimpleImmutableEntry<>(
                    Type.getObjectType(name).getClassName(),
                    superName!=null? Type.getObjectType(superName).getClassName(): "");
            }
        }
        try {
            final CV visitor = new CV();
            new ClassReader(Files.readAllBytes(p)).accept(visitor, ClassReader.SKIP_CODE);
            return visitor.result;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
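
If adding ASM as a dependency is not an option, the two names can also be read straight from the class file header with the JDK alone. The following is only a rough sketch of that idea, not part of the ASM-based solution: the class name `ClassHeaderReader` and the method `readClassAndSuper` are made up for illustration, and it assumes a well-formed class file using the constant pool tags defined up to recent Java versions.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ClassHeaderReader {
    /**
     * Returns {className, superClassName} in dotted form.
     * The super name is "" when the class file declares none (java.lang.Object).
     */
    static String[] readClassAndSuper(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
        data.readUnsignedShort(); // minor version
        data.readUnsignedShort(); // major version
        int count = data.readUnsignedShort();
        String[] utf8 = new String[count];      // Utf8 constants by pool index
        int[] classNameIndex = new int[count];  // Class constants -> index of their Utf8 name
        for (int i = 1; i < count; i++) {
            int tag = data.readUnsignedByte();
            switch (tag) {
                case 1: utf8[i] = data.readUTF(); break;                  // Utf8
                case 7: classNameIndex[i] = data.readUnsignedShort(); break; // Class
                case 5: case 6: data.readLong(); i++; break;              // Long/Double use two slots
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                    data.readInt(); break;                                // four-byte entries
                case 8: case 16: case 19: case 20:
                    data.readUnsignedShort(); break;                      // two-byte entries
                case 15: data.readUnsignedByte(); data.readUnsignedShort(); break; // MethodHandle
                default: throw new IOException("unknown constant pool tag " + tag);
            }
        }
        data.readUnsignedShort(); // access flags
        int thisClass = data.readUnsignedShort();
        int superClass = data.readUnsignedShort();
        String name = utf8[classNameIndex[thisClass]].replace('/', '.');
        String superName = superClass == 0
                ? "" : utf8[classNameIndex[superClass]].replace('/', '.');
        return new String[] { name, superName };
    }

    public static void main(String[] args) throws IOException {
        // Reads a JDK class through the class loader purely as a demonstration.
        try (InputStream in = Object.class.getResourceAsStream("/java/lang/Integer.class")) {
            String[] result = readClassAndSuper(in);
            System.out.println(result[0] + " extends " + result[1]);
        }
    }
}
```

Only the constant pool and the three header fields after it are read; fields, methods and code are never touched, which is roughly what `ClassReader.SKIP_CODE` achieves in the ASM version.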
    

    As a bonus, and as a way to create some test cases, the following method builds the hierarchy for the location a runtime class was loaded from:

    public static Map<String, Set<String>> getClassHierarchy(Class<?> context)
                                            throws IOException, URISyntaxException {
        Path p;
        URI clURI = context.getResource(context.getSimpleName()+".class").toURI();
        if(clURI.getScheme().equals("jrt")) p = Paths.get(URI.create("jrt:/modules"));
        else {
            if(!clURI.getScheme().equals("file")) try {
                FileSystems.getFileSystem(clURI);
            } catch(FileSystemNotFoundException ex) {
                FileSystems.newFileSystem(clURI, Collections.emptyMap());
            }
            String qn = context.getName();
            p = Paths.get(clURI).getParent();
            for(int ix = qn.indexOf('.'); ix>0; ix = qn.indexOf('.', ix+1)) p = p.getParent();
        }
        return getClassHierarchy(p);
    }
    

    Then, you can do

    Map<String, Set<String>> hierarchy = getClassHierarchy(Number.class);
    System.out.println("Direct subclasses of "+Number.class);
    hierarchy.getOrDefault("java.lang.Number", Collections.emptySet())
             .forEach(System.out::println);
    

    and get

    Direct subclasses of class java.lang.Number
    java.lang.Float
    java.math.BigDecimal
    java.util.concurrent.atomic.AtomicLong
    java.lang.Double
    java.lang.Long
    java.util.concurrent.atomic.AtomicInteger
    java.lang.Short
    java.math.BigInteger
    java.lang.Byte
    java.util.concurrent.atomic.Striped64
    java.lang.Integer
    

    or

    Map<String, Set<String>> hierarchy = getClassHierarchy(Number.class);
    System.out.println("All subclasses of "+Number.class);
    printAllClasses(hierarchy, "java.lang.Number", "  ");
    
    private static void printAllClasses(
            Map<String, Set<String>> hierarchy, String parent, String i) {
        hierarchy.getOrDefault(parent, Collections.emptySet())
            .forEach(x -> {
                System.out.println(i+x);
                printAllClasses(hierarchy, x, i+"  ");
        });
    }
    

    to get

    All subclasses of class java.lang.Number
      java.lang.Float
      java.math.BigDecimal
      java.util.concurrent.atomic.AtomicLong
      java.lang.Double
      java.lang.Long
      java.util.concurrent.atomic.AtomicInteger
      java.lang.Short
      java.math.BigInteger
      java.lang.Byte
      java.util.concurrent.atomic.Striped64
        java.util.concurrent.atomic.LongAdder
        java.util.concurrent.atomic.LongAccumulator
        java.util.concurrent.atomic.DoubleAdder
        java.util.concurrent.atomic.DoubleAccumulator
      java.lang.Integer
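
Since the question asks for a non-recursive option, the same map can also be walked with an explicit queue. This is a minimal sketch under the assumption that the map has the shape returned by getClassHierarchy (superclass name to direct subclass names); `SubclassCollector` and `allSubclasses` are illustrative names, and the class names in `main` are made-up sample data. Unlike the loop in the question, each iteration looks up the children of the entry it has just taken off the queue.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class SubclassCollector {
    /** Collects all direct and indirect subclasses of root, breadth-first. */
    static Set<String> allSubclasses(Map<String, Set<String>> hierarchy, String root) {
        Set<String> found = new LinkedHashSet<>();
        Deque<String> toDo = new ArrayDeque<>();
        toDo.add(root);
        while (!toDo.isEmpty()) {
            String current = toDo.remove();
            // Look up the children of the class just removed from the queue.
            for (String child : hierarchy.getOrDefault(current, Collections.emptySet())) {
                // Only enqueue names not seen before.
                if (found.add(child)) toDo.add(child);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical sample hierarchy: Base <- Mid, Other; Mid <- Leaf
        Map<String, Set<String>> hierarchy = new HashMap<>();
        hierarchy.put("app.Base", new LinkedHashSet<>(Arrays.asList("app.Mid", "app.Other")));
        hierarchy.put("app.Mid", new LinkedHashSet<>(Arrays.asList("app.Leaf")));
        allSubclasses(hierarchy, "app.Base").forEach(System.out::println);
    }
}
```

With the real data, a call such as `allSubclasses(getClassHierarchy(root), "java.lang.Number")` would return the flat set of names instead of printing an indented tree.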