Java String Constant Pool Confusion


I'm learning about the string constant pool, using JDK 1.8.
I have studied the basics, but the results of the tests below confuse me.
I have read some blogs, but none of them explains the reason in detail.

// Test1
String sa = new String("lmn");
String sa1 = new String("opq");
StringBuilder stringBuilder = new StringBuilder().append(sa).append(sa1);
String sa2 = stringBuilder.toString();
String sa3 = sa2.intern();
String sa4 = "lmnopq";
System.out.println("sa2 == sa3 ? " + (sa2 == sa3)); // true  why?
System.out.println("sa3 == sa4 ? " + (sa3 == sa4)); // true

// Test2
String sss = new String("ghi") + new String("hjk");
String sss1 = sss.intern();
System.out.println("sss == sss1 ? " + (sss == sss1)); // true why?

// Test3
String sc = new String("rszt") + new String("zuvw");
String sc1 = "rsztzuvw";
String sc2 = sc.intern();
System.out.println("sc == sc1 ? " + (sc == sc1)); // false why?

I have three questions:

  1. Why does sa2 == sa3? When is lmnopq stored in the string constant pool?
  2. I looked at the bytecode and found that new String("ghi") + new String("hjk") is compiled to use StringBuilder.append(). Is the reason for sss == sss1 the same as for sa2 == sa3?
  3. Test2 is similar to Test3; the only difference is that I added String sc1 = "rsztzuvw"; before String sc2 = sc.intern();. Why does sc != sc1 now?

Solution

  • (The important part of this answer is at the bottom!)

    Why does sa2 == sa3?

    Let's assume that sa2 is not in the string pool to start with.

    • In Java 7 and later, sa2.intern() will make the sa2 object part of the string pool, and return it as the result. In this case sa2 and sa3 will refer to the same object.

    • Prior to Java 7, sa2.intern() would create a new string in the PermGen heap, make it part of the string pool and return it. In this case sa2 and sa3 will refer to different objects.
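    The two cases above can be observed with a small sketch. The class name InternIdentityDemo, its helper methods, and the random suffix are assumptions made for illustration; the suffix only serves to make sure no equal string is in the pool before intern() is called. The first printed value depends on the JVM (the Java 7+ behaviour described above would print true), so no expected result is given for it.

```java
import java.util.UUID;

public class InternIdentityDemo {

    // Builds a String at run time, so the result is a fresh heap object
    // rather than a compile-time constant.
    static String build(String left, String right) {
        return new StringBuilder().append(left).append(right).toString();
    }

    // True when intern() hands back the very object it was called on.
    // This is JVM-dependent behaviour, not something the API contract promises.
    static boolean internAdoptsReceiver(String fresh) {
        return fresh.intern() == fresh;
    }

    // Part of the String.intern() contract: strings with equal contents
    // always map to one canonical object.
    static boolean internIsCanonical(String a, String b) {
        return a.equals(b) && a.intern() == b.intern();
    }

    public static void main(String[] args) {
        // Hypothetical random suffix: keeps the value out of the pool
        // until intern() is called explicitly.
        String suffix = UUID.randomUUID().toString();

        String sa2 = build("lmn", suffix);
        System.out.println("sa2.intern() == sa2 ? " + internAdoptsReceiver(sa2));

        String copy = build("lmn", suffix); // equal contents, different object
        System.out.println("copy == sa2 ? " + (copy == sa2));                      // false
        System.out.println("same canonical object ? " + internIsCanonical(copy, sa2)); // true
    }
}
```

    Only the last two results are fixed by the language and API contracts; the first one is exactly the version-dependent detail discussed above.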

    When is lmnopq stored in the string constant pool?

    It depends on two things:

    1. The behavior of intern(); see above.
    2. Whether the string literal in String sa4 = "lmnopq"; is interned when the class is loaded or initialized, or when the string literal is first used. This also depends on the version of Java.

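    I looked at the bytecode and found that new String("ghi") + new String("hjk") is compiled to use StringBuilder.append(). Is the reason for sss == sss1 the same as for sa2 == sa3?

    No, the compilation to StringBuilder is not the reason. Test2 gives the same result as Test1 because the concatenation produces a new string that is not yet in the pool, so intern() behaves as described above.

    Incidentally, javac from Java 9 onward no longer emits that StringBuilder code; it compiles string concatenation to an invokedynamic call instead (JEP 280).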

    Test2 is similar to Test3; the only difference is that I added String sc1 = "rsztzuvw"; before String sc2 = sc.intern();. Why does sc != sc1 now?

    This relates to what I mentioned above about when string literals are interned. You are evidently using a JVM where a string literal is interned on its first use. Moving the line that uses the literal ahead of the explicit intern() call means the literal is already in the pool when intern() runs, so intern() returns that pooled object and sc remains a separate object.
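    A minimal sketch of the Test3 ordering, with sc2 included in the comparison. The class name LiteralFirstDemo and the helper methods are assumptions for illustration; the variable names follow the question.

```java
public class LiteralFirstDemo {

    // Concatenates at run time; the result is a new heap object.
    static String concat(String left, String right) {
        return new StringBuilder().append(left).append(right).toString();
    }

    // Mirrors Test3: the literal is used before intern() is called.
    // Returns { sc == sc1, sc2 == sc1, sc.equals(sc1) }.
    static boolean[] literalBeforeIntern() {
        String sc = concat("rszt", "zuvw");
        String sc1 = "rsztzuvw";   // the literal refers to the pooled object
        String sc2 = sc.intern();  // an equal string is pooled already, so that one is returned
        return new boolean[] { sc == sc1, sc2 == sc1, sc.equals(sc1) };
    }

    public static void main(String[] args) {
        boolean[] r = literalBeforeIntern();
        System.out.println("sc == sc1 ? " + r[0]);      // false
        System.out.println("sc2 == sc1 ? " + r[1]);     // true
        System.out.println("sc.equals(sc1) ? " + r[2]); // true
    }
}
```

    Here sc is never the pooled object: the literal got there first, so intern() has nothing to add and simply returns the literal's object.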

    (It is complicated. Think it through carefully.)


    As you can see, there is a whole lot of version-dependent complexity in determining the actual answers to these questions. It is complexity that you don't need to know about.

    This is useless knowledge. It won't help you as a programmer to know precisely how your version of Java implements the string pool, explicit interning via intern() calls, and implicit interning of string literals.

    All you really need to know is three golden rules.

    1. Always compare strings using the equals method. Using == will often give unexpected results.
    2. Don't call String.intern() explicitly. There is no need. If you really need to save space occupied by duplicate string objects, ensure that the GC string de-duplication feature is enabled (with the G1 collector in HotSpot, that is the -XX:+UseStringDeduplication option).
    3. Don't call new String on a String argument. It is unnecessary and inefficient unless the JIT compiler can optimize away the new.

    If you follow these rules, the tricky details of interning are irrelevant, because code that follows the rules gives the correct answer irrespective of those details.
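    As a short illustration of the first rule, here is a sketch of a content comparison that does not depend on where either string came from. The class name StringEqualityDemo and the helper sameText are hypothetical; using java.util.Objects.equals is a choice made here so that null arguments are handled too, and a plain a.equals(b) works just as well when a is known to be non-null.

```java
import java.util.Objects;

public class StringEqualityDemo {

    // Compares contents, never identity; Objects.equals also copes with null.
    static boolean sameText(String a, String b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        String literal = "ghihjk";
        // Built at run time, so it is a different object from the literal.
        String built = new StringBuilder("ghi").append("hjk").toString();

        System.out.println(sameText(literal, built)); // true
        System.out.println(sameText(built, "ghi"));   // false
        System.out.println(sameText(null, null));     // true
    }
}
```

    The results are the same whether or not either string is in the pool, which is the point of the rule.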