Tags: jvm, compile-time-constant, string-interning

Determine whether a String is a compile-time constant


Given a reference to any String, is it possible to determine programmatically whether it refers to a compile-time constant?
If not, is it at least possible to determine whether it is stored in the intern pool, without doing s.intern() == s?

isConst("foo")                       -> true
isConst("foo" + "bar")               -> true   // 2 literals, 1 compile time string
isConst(SomeClass.SOME_CONST_STRING) -> true
isConst(readFromFile())              -> false
isConst(readFromFile().intern())     -> false  // true would be acceptable too

(context for comments below: the question originally asked about literals)


Solution

  • To clarify the original question, every string literal is a compile-time constant, but not every compile-time constant has to originate from a string literal.

    At runtime, there is no difference between a String object constructed for a compile-time constant and one constructed by other means. Strings for compile-time constants are added to a pool automatically, but other strings can be added to the same pool manually via intern(). Because the strings for constants are constructed and added lazily, it is even possible to construct and add a string manually first, so that compile-time constants with the same value are resolved to that string later. A related answer exploits this possibility to detect when the String instance for a compile-time constant is actually resolved.
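    The identity relationships described above can be checked with a small sketch. The class, field, and method names here (ConstantPoolDemo, CONST, buildAtRuntime) are made up for illustration and are not part of the question or of any library:

```java
public class ConstantPoolDemo {
    // Compile-time constant built from two literals (hypothetical name).
    static final String CONST = "foo" + "bar";

    // Builds an equal string at runtime; the compiler cannot fold this into a constant.
    static String buildAtRuntime() {
        return new StringBuilder("foo").append("bar").toString();
    }

    public static void main(String[] args) {
        String literal = "foobar";
        String runtime = buildAtRuntime();

        System.out.println(literal == CONST);            // true: same pooled instance
        System.out.println(runtime == literal);          // false: distinct object, equal contents
        System.out.println(runtime.intern() == literal); // true: intern() returns the pooled instance
    }
}
```

    Nothing on the String objects themselves records which of these routes produced them; only reference comparisons against the pool reveal a difference.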

    It’s possible to derive from that answer a method to simply detect whether a string is in the pool or not:

    public static boolean isInPool(String s) {
        return s == new String(s.toCharArray()).intern();
    }
    

    new String(s.toCharArray()) constructs a string with the same contents that is not in the pool. If s refers to an instance in the pool, calling intern() on the copy must resolve to the same reference as s. Otherwise, depending on the implementation, intern() may resolve to another existing object, add our copy, or add a newly constructed string and return a reference to it; in each case, the returned reference differs from s.

    Note that this method has the side effect of adding a string to the pool if none with the same contents was there before. That string will stay in the pool at least until the next garbage collection cycle, perhaps until the next full GC, depending on the implementation.
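    As a rough usage sketch, the method can be exercised against the cases from the question. The wrapper class PoolCheckDemo and the sample strings are assumptions for illustration; the runtime string uses System.nanoTime() only to make it very unlikely that an equal string is already pooled:

```java
public class PoolCheckDemo {
    // The method from the answer, unchanged.
    public static boolean isInPool(String s) {
        return s == new String(s.toCharArray()).intern();
    }

    public static void main(String[] args) {
        String literal = "pool-check-demo-literal";
        // Built at runtime, so no compile-time constant with this value exists.
        String runtime = new StringBuilder("pool-check-")
                .append(System.nanoTime())
                .toString();

        System.out.println(isInPool(literal));          // true: literals are pooled
        System.out.println(isInPool(runtime));          // false: this instance is not the pooled one
        System.out.println(isInPool(runtime.intern())); // true: intern() returns a pooled instance
    }
}
```

    The last line shows why the method cannot tell compile-time constants from manually interned strings: both are simply "in the pool".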

    The test method might be nice for debugging or satisfying curiosity, but there is no point in ever using it in production code. Application code should not depend on whether a string is pooled, and the use case proposed in a comment, enforcing pooled strings in performance-critical code, is not a good idea.

    Besides the fact that the test itself is expensive and thus counteracts the goal of improving performance, the underlying assumption that pooled strings are better than non-pooled ones is flawed. Not being in the pool doesn’t imply that the application will perform an expensive reconstruction every time it invokes the performance-critical code. It may simply hold a reference in a variable or use a HashMap, both of which are far more efficient than calling intern(). In fact, even temporary strings can be the most efficient solution in some cases.
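    One way to read the HashMap suggestion is an application-level cache that hands out a single shared instance per distinct value, without touching the JVM's pool. This is only a sketch under that assumption; the class name Canonicalizer and its method are hypothetical, and it is not thread-safe as written:

```java
import java.util.HashMap;
import java.util.Map;

public class Canonicalizer {
    // Maps each distinct string value to the first instance seen with that value.
    private final Map<String, String> cache = new HashMap<>();

    // Returns the shared instance for s, registering s itself if it is the first of its kind.
    public String canonical(String s) {
        String existing = cache.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        Canonicalizer c = new Canonicalizer();
        String a = new String("key");
        String b = new String("key");

        System.out.println(a == b);                           // false: two separate objects
        System.out.println(c.canonical(a) == c.canonical(b)); // true: both map to the same instance
    }
}
```

    Unlike intern(), such a cache is owned by the application, so its lifetime and size stay under the application's control.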