java bytecode bcel

Missing classes in a class file's constant pool


I am using bytecode analysis (with BCEL) to get all imported classes of a class file. When I read the constant pool, not all imported classes appear as CONSTANT_Class entries (see spec); some appear only as CONSTANT_Utf8. My question: can I not rely solely on the CONSTANT_Class entries in the constant pool to find the imported classes? Do I really have to look at every entry and guess whether it is a class name? That does not seem correct in every situation either. Or do I have to read through the whole bytecode? Regards


Solution

  • No, it is not correct to use CONSTANT_Class_info entries alone to discover dependencies on other classes/interfaces. If you're parsing input files you trust, or you can tolerate incorrect information, you can get away with parsing only the constant pool, except for one corner case. To get precise information on arbitrary input, you need to parse the whole class file. (I assume by "dependencies" you mean those classes or interfaces without which loading or linking a class may result in exceptions, as described in JVMS chapter 5. This doesn't include classes obtained via Class.forName or other reflective means.)

    Consider the following class.

    public class Main {
        public static void main(String[] args) {
            identity(null);
        }
        public static Object identity(Foo x) {
            return x;
        }
    }
    

    javap -p -v Main.class prints:

    Classfile /C:/Users/jbosboom/Documents/stackoverflow/build/classes/Main.class
      Last modified Jul 2, 2014; size 346 bytes
      MD5 checksum 2237cda2a15a58382b0fb98d6afacc7e
      Compiled from "Main.java"
    public class Main
      SourceFile: "Main.java"
      minor version: 0
      major version: 52
      flags: ACC_PUBLIC, ACC_SUPER
    Constant pool:
       #1 = Methodref          #3.#17         //  java/lang/Object."<init>":()V
       #2 = Class              #18            //  Main
       #3 = Class              #19            //  java/lang/Object
       #4 = Utf8               <init>
       #5 = Utf8               ()V
       #6 = Utf8               Code
       #7 = Utf8               LineNumberTable
       #8 = Utf8               LocalVariableTable
       #9 = Utf8               this
      #10 = Utf8               LMain;
      #11 = Utf8               identity
      #12 = Utf8               (LFoo;)Ljava/lang/Object;
      #13 = Utf8               x
      #14 = Utf8               LAAA;
      #15 = Utf8               SourceFile
      #16 = Utf8               Main.java
      #17 = NameAndType        #4:#5          //  "<init>":()V
      #18 = Utf8               Main
      #19 = Utf8               java/lang/Object
      #20 = Utf8               java/lang/Thread
      #21 = Class              #20            //  java/lang/Thread
      #21 = Utf8               (LBar;)LFakename;
    {
      public Main();
        descriptor: ()V
        flags: ACC_PUBLIC
        Code:
          stack=1, locals=1, args_size=1
             0: aload_0
             1: invokespecial #1                  // Method java/lang/Object."<init>":()V
             4: return
          LineNumberTable:
            line 6: 0
          LocalVariableTable:
            Start  Length  Slot  Name   Signature
                0       5     0  this   LMain;
    
      public static java.lang.Object identity(Foo);
        descriptor: (LFoo;)Ljava/lang/Object;
        flags: ACC_PUBLIC, ACC_STATIC
        Code:
          stack=1, locals=1, args_size=1
             0: aload_0
             1: areturn
          LineNumberTable:
            line 11: 0
          LocalVariableTable:
            Start  Length  Slot  Name   Signature
                0       2     0     x   LAAA;
    }
    

    The class Foo, referenced as a parameter to the method identity, does not appear in the constant pool as a CONSTANT_Class_info entry. It does appear in the method descriptor for identity (entry #12). Field descriptors may also reference classes not appearing as CONSTANT_Class_info entries. Thus to find all the dependencies from the constant pool alone, you need to look at all UTF8 entries.
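
    The descriptor scan described above can be sketched in plain Java without BCEL. This is only an illustration: the class name DescriptorClasses and the method classesIn are made up here, and the parser simply follows the field and method descriptor grammar from JVMS 4.3. A string that is not a well-formed descriptor yields an empty set, so names such as "Main.java" or "LineNumberTable" are not mistaken for class references.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of BCEL or the JDK.
final class DescriptorClasses {

    /** Internal class names (e.g. java/lang/Object) in a descriptor; empty if malformed. */
    static Set<String> classesIn(String s) {
        Set<String> names = new LinkedHashSet<>();
        int i;
        if (s.startsWith("(")) {
            // Method descriptor: '(' parameter types ')' return type.
            i = 1;
            while (i >= 0 && i < s.length() && s.charAt(i) != ')') {
                i = parseType(s, i, names, false);
            }
            if (i < 0 || i >= s.length()) return Collections.emptySet();
            i = parseType(s, i + 1, names, true); // return type, 'V' allowed
        } else {
            // Field descriptor: exactly one type.
            i = parseType(s, 0, names, false);
        }
        return i == s.length() ? names : Collections.emptySet();
    }

    /** Parses one type starting at pos; returns the index after it, or -1 if malformed. */
    private static int parseType(String s, int pos, Set<String> names, boolean allowVoid) {
        int i = pos;
        while (i < s.length() && s.charAt(i) == '[') i++; // array dimensions
        if (i >= s.length()) return -1;
        char c = s.charAt(i);
        if (c == 'L') {
            int end = s.indexOf(';', i);
            if (end <= i + 1) return -1; // missing ';' or empty class name
            names.add(s.substring(i + 1, end));
            return end + 1;
        }
        if ("BCDFIJSZ".indexOf(c) >= 0) return i + 1;     // primitive types
        if (c == 'V' && allowVoid && i == pos) return i + 1; // void, never as array element
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(classesIn("(LFoo;)Ljava/lang/Object;")); // [Foo, java/lang/Object]
        System.out.println(classesIn("Main.java"));                 // []
    }
}
```

    Applied to every Utf8 entry of the pool shown above, this would report Foo from entry #12, but also AAA, Bar and Fakename, which is exactly the overapproximation discussed further down.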

    The corner case: Some UTF8 entries may exist to be referenced by CONSTANT_String_info entries. Duplicate UTF8 entries will be merged, so one UTF8 entry might be a method descriptor, a string literal, or both. If you're only parsing the constant pool, you must live with this ambiguity (probably by overapproximating and treating it as a dependency).
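
    One way to at least detect the ambiguous entries is to record which Utf8 indices are referenced by CONSTANT_String_info entries. The following is a rough sketch that reads the raw constant pool with DataInputStream instead of BCEL; the class and method names are invented for the example, and it stops reading after the constant pool. The tag values and entry sizes are taken from JVMS 4.4.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only.
final class StringLiteralUtf8 {

    /** Indices of Utf8 entries that some CONSTANT_String_info entry points at. */
    static Set<Integer> utf8UsedByStrings(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
            in.readInt();                       // minor_version + major_version
            int count = in.readUnsignedShort(); // constant_pool_count
            Set<Integer> result = new TreeSet<>();
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: in.readUTF(); break;                        // Utf8
                    case 8: result.add(in.readUnsignedShort()); break;  // String -> Utf8 index
                    case 7: case 16: case 19: case 20:                  // Class, MethodType, Module, Package
                        in.readUnsignedShort(); break;
                    case 3: case 4:                                     // Integer, Float
                    case 9: case 10: case 11: case 12: case 17: case 18: // refs, NameAndType, (Invoke)Dynamic
                        in.readInt(); break;
                    case 5: case 6: in.readLong(); i++; break;          // Long, Double take two slots
                    case 15: in.readUnsignedByte(); in.readUnsignedShort(); break; // MethodHandle
                    default: throw new IOException("unknown constant pool tag " + tag);
                }
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java StringLiteralUtf8 <file.class>");
            return;
        }
        System.out.println(utf8UsedByStrings(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

    A Utf8 entry whose index is in this set and which also parses as a descriptor may be a string literal, a descriptor, or both; the constant pool alone cannot tell you which.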

    If you trust the input to have been produced by a well-behaved Java compiler under your control, you can parse all UTF8 entries, mindful of the string corner case, and stop reading here. If you need to defend against an attacker feeding your tool handcrafted class files (e.g., you're writing a decompiler and the attacker wants to prevent decompilation), you need to parse the entire class file. Here are a few examples of the potential problems.

    • The CONSTANT_Class_info entry for java/lang/Thread (its name is the Utf8 entry #20) names a class not used by Main. The JVM may or may not try to resolve this reference (JVMS 5.4 permits both lazy and eager resolution). Because the class exists, no error is raised either way, so this extra entry is harmless, but it will fool tools that look only at the constant pool into treating Thread as a dependency.
    • The Utf8 entry (LBar;)LFakename; is an unused method descriptor referring to two fictitious classes. Because nothing references this descriptor, no error is raised, but tools that trust the constant pool will still parse it.
    • Entry #14 is a field descriptor referring to a fictitious class. This entry is actually used, by the LocalVariableTable attribute, but the JVM does not check this debugging information, so the reference is harmless but may fool tools.
    • I don't have an example for this one, but the InnerClasses attribute refers to CONSTANT_Class_info entries, and is not checked for consistency with other class files (per JVMS 4.7.6, albeit in a non-normative note). These references won't prevent loading or linking, but would confuse a tool examining the constant pool.

    That's just what I came up with off the top of my head. A clever attacker going through the JVMS with a fine-tooth comb could probably find more places to add entries to the constant pool that look used but aren't. If you need precise information even in the face of an attacker, you need to parse the whole class file and understand how a JVM will use it.
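
    As a first step in that direction, the sketch below reads past the constant pool and collects only the descriptors that field_info and method_info structures actually point at, so stray Utf8 entries like the ones above are ignored. It is deliberately incomplete: it skips all attributes (including Code, so no bytecode is inspected), does not look at this_class, super_class or the interfaces table, and the names UsedDescriptors and fieldAndMethodDescriptors are invented for the example. The layout follows JVMS 4.1, 4.5 and 4.6.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; a real tool must also scan attributes and bytecode.
final class UsedDescriptors {

    /** Descriptors referenced by the class file's fields and methods, in file order. */
    static List<String> fieldAndMethodDescriptors(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
            in.readInt();                              // minor_version + major_version
            int count = in.readUnsignedShort();        // constant_pool_count
            String[] utf8 = new String[count];         // only Utf8 slots are filled
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: utf8[i] = in.readUTF(); break;
                    case 7: case 8: case 16: case 19: case 20: in.readUnsignedShort(); break;
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        in.readInt(); break;
                    case 5: case 6: in.readLong(); i++; break; // two slots
                    case 15: in.readUnsignedByte(); in.readUnsignedShort(); break;
                    default: throw new IOException("unknown constant pool tag " + tag);
                }
            }
            in.readUnsignedShort();                    // access_flags
            in.readUnsignedShort();                    // this_class
            in.readUnsignedShort();                    // super_class
            int interfaces = in.readUnsignedShort();
            for (int i = 0; i < interfaces; i++) in.readUnsignedShort();
            List<String> out = new ArrayList<>();
            readMembers(in, utf8, out);                // fields
            readMembers(in, utf8, out);                // methods
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // field_info and method_info share the same layout.
    private static void readMembers(DataInputStream in, String[] utf8, List<String> out)
            throws IOException {
        int n = in.readUnsignedShort();
        for (int i = 0; i < n; i++) {
            in.readUnsignedShort();                    // access_flags
            in.readUnsignedShort();                    // name_index
            int desc = in.readUnsignedShort();         // descriptor_index
            if (desc >= utf8.length || utf8[desc] == null) {
                throw new IOException("descriptor_index " + desc + " is not a Utf8 entry");
            }
            out.add(utf8[desc]);
            int attrs = in.readUnsignedShort();
            for (int a = 0; a < attrs; a++) {
                in.readUnsignedShort();                // attribute_name_index
                int len = in.readInt();                // attribute_length (u4)
                if (len < 0) throw new IOException("attribute too large for this sketch");
                in.readFully(new byte[len]);           // skip attribute body
            }
        }
    }
}
```

    For the Main class above, this approach would pick up (LFoo;)Ljava/lang/Object; from the identity method while never touching LAAA; or (LBar;)LFakename;, because no field or method declares them as its descriptor.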