Search code examples
javabytecodeaccess-modifiersunreachable-code

Reduce visibility of classes and methods


TL;DR: Given bytecode, how can I find out what classes and what methods get used in a given method?


In my code, I'd like to programmatically find all classes and methods having too generous access qualifiers. This should be done based on an analysis of inheritance, static usage and also hints I provide (e.g., using some home-brew annotation like @KeepPublic). As a special case, unused classes and methods will get found.

I just did something similar though much simpler, namely adding the final keyword to all classes where it makes sense (i.e., it's allowed and the class won't get proxied by e.g., Hibernate). I did it in the form of a test, which knows about classes to be ignored (e.g., entities) and complains about all needlessly non-final classes.

For all classes of mine, I want to find all methods and classes it uses. Concerning classes, there's this answer using ASM's Remapper. Concerning methods, I've found an answer proposing instrumentation, which isn't what I want just now. I'm also not looking for a tool like ucdetector which works with Eclipse AST. How can I inspect method bodies based on bytecode? I'd like to do it myself so I can programmatically eliminate unwanted warnings (which are plentiful with ucdetector when using Lombok).


Solution

  • Looking at the usage on a per-method basis, i.e. by analyzing all instructions, has some pitfalls. Besides method invocations, there might be method references, which will be encoded using an invokedynamic instruction, having a handle to the target method in its bsm arguments. If the byte code hasn’t been generated from ordinary Java code (or stems from a future version), you have to be prepared to possibly encounter ldc instructions pointing to a handle which would yield a MethodHandle at runtime.

    Since you already mentioned “analysis of inheritance”, I just want to point out the corner cases, i.e. for

    package foo;
    
    class A {
        public void method() {}
    }
    class B implements bar.If {
    }
    
    package bar;
    
    public interface If {
        void method();
    }
    

    it’s easy to overlook that A.method() has to stay public.

    If you stay conservative, i.e. when you can’t find out whether B instances will ever end up as targets of the If.method() invocations at other places in your application, you have to assume that it is possible, you won’t find much to optimize. I think that you need at least inlining of bridge methods and the synthetic inner/outer class accessors to identify unused members across inheritance relationships.

    When it comes class references, there are indeed even more possibilities, to make a per-instruction analysis error prone. They may not only occur as owner of member access instructions, but also for new, checkcast, instanceof and array specific instructions, annotations, exception handlers and, even worse, within signatures which may occur at member references, annotations, local variable debugging hints, etc. The ldc instruction may refer to classes, producing a Class instance, which is actually used in ordinary Java code, e.g. for class literals, but as said, there’s also the theoretical possibility to produce MethodHandles which may refer to an owner class, but also have a signature bearing parameter types and a return type, or to produce a MethodType representing a signature.

    You are better off analyzing the constant pool, however, that’s not offered by ASM. To be precise, a ClassReader has methods to access the pool, but they are actually not intended to be used by client code (as their documentation states). Even there, you have to be aware of pitfalls. Basically, the contents of a CONSTANT_Utf8_info bears a class or signature reference if a CONSTANT_Class_info resp. the descriptor index of a CONSTANT_NameAndType_info or a CONSTANT_MethodType_info points to it. However, declared members of a class have direct references to CONSTANT_Utf8_info pool entries to describe their signatures, see Methods and Fields. Likewise, annotations don’t follow the pattern and have direct references to CONSTANT_Utf8_info entries of the pool assigning a type or signature semantic to it, see enum_const_value and class_info_index