Tags: java, string, garbage-collection, string-interning, string-pool

Java String Pool with String constructor and the intern function


I learned about the Java String Pool recently, and there are a few things that I don't quite understand.

When using the assignment operator, a new String will be created in the String Pool if it doesn't exist there already.

String a = "foo"; // Creates a new string in the String Pool
String b = "foo"; // Refers to the already existing string in the String Pool

When using the String constructor, I understand that regardless of the String Pool's state, a new string will be created in the heap, outside of the String Pool.

String c = new String("foo"); // Creates a new string in the heap

I read somewhere that even when using the constructor, the String Pool is being used. It will insert the string into the String Pool and into the heap.

String d = new String("bar"); // Creates a new string in the String Pool and in the heap

I didn't find any further information about this, but I would like to know if that's true.

If that is indeed true, then why? Why does Java create this duplicate string? It seems completely redundant to me, since strings in Java are immutable.

Another thing that I would like to know is how the .intern() function of the String class works: Does it just return a pointer to the string in the String Pool?

And finally, in the following code:

String s = new String("Hello");
s = s.intern();

Will the garbage collector delete the string that is outside the String Pool from the heap?


Solution

  • You wrote

    String c = new String("foo"); // Creates a new string in the heap
    

    I read somewhere that even when using the constructor, the String Pool is being used. It will insert the string into the String Pool and into the heap.

    That’s somewhat correct, but you have to read the code correctly. Your code contains two String instances. First, you have the string literal "foo", which evaluates to a String instance; that is the one that will be inserted into the pool. Then, you are creating a new String instance explicitly, using new String(…), which calls the String(String) constructor. Since the explicitly created object can’t have the same identity as an object that existed prior to its creation, two String instances must exist.
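    The two instances can be made visible by comparing references. The following is a minimal sketch, not part of the original code; the class name StringIdentityDemo and the helper sameObject are made up for illustration.

    ```java
    // Hypothetical demo class; the names are illustrative only.
    class StringIdentityDemo {
        // Returns true only when both references point to the very same object.
        static boolean sameObject(String x, String y) {
            return x == y;
        }

        public static void main(String[] args) {
            String literal = "foo";          // the pooled instance for the literal
            String copy = new String("foo"); // a second, explicitly created instance

            System.out.println(sameObject(literal, "foo")); // true: identical literals share one instance
            System.out.println(sameObject(literal, copy));  // false: new always yields a distinct object
            System.out.println(literal.equals(copy));       // true: the contents are equal
        }
    }
    ```

    The reference comparison is used here only to expose object identity; ordinary code should compare strings with equals.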

    Why does Java create this duplicate string? It seems completely redundant to me, since strings in Java are immutable.

    Well, it does so because you told it to. In theory, this construction could get optimized, skipping the intermediate step that you can’t perceive anyway. But the first assumption about a program’s behavior should be that it does precisely what you have written.

    You could ask why there’s a constructor that allows such a pointless operation. In fact, this has been asked before, and an answer to that question addresses it. In short, it’s mostly a historical design mistake, but this constructor has been used in practice for other technical reasons, some of which no longer apply. Still, it can’t be removed without breaking compatibility.

    String s = new String("Hello");
    s = s.intern();
    

    Will the garbage collector delete the string that is outside the String Pool from the heap?

    Since the intern() call will evaluate to the instance that was created for "Hello", which is distinct from the instance created via new String(…), the latter will definitely be unreachable after the second assignment to s. Of course, this doesn’t say whether the garbage collector will reclaim the string’s memory, only that it is allowed to do so. But keep in mind that the majority of the heap occupation will be the array that holds the character data, which will be shared between the two string instances (unless you use a very outdated JVM). This array will still be in use as long as either of the two strings is in use. Recent JVMs even have the String Deduplication feature, which may cause other strings with the same contents in the JVM to use this array (allowing their formerly used arrays to be collected). So the lifetime of the array is entirely unpredictable.
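    As a rough illustration of the intern() behavior described above, the sketch below checks which instance the call returns. The class name InternDemo and the helper canonical are hypothetical. The sketch does not try to observe garbage collection, since whether and when the unreachable instance is reclaimed is up to the JVM.

    ```java
    // Hypothetical demo class; the names are illustrative only.
    class InternDemo {
        // Returns the canonical (pooled) instance that has the same contents as s.
        static String canonical(String s) {
            return s.intern();
        }

        public static void main(String[] args) {
            String s = new String("Hello"); // distinct from the pooled literal instance
            String pooled = canonical(s);

            System.out.println(pooled == "Hello"); // true: intern() yields the literal's instance
            System.out.println(pooled == s);       // false: s is the explicitly created object

            s = pooled; // from here on, the object created via new is unreachable
        }
    }
    ```

    So intern() does return a reference to the pooled string, and after the reassignment nothing refers to the object created via new anymore.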