Search code examples
pythonstringimmutabilitycpythonpython-internals

Python string with space and without space at the end and immutability


I learnt that in some immutable classes, __new__ may return an existing instance - this is what the int, str and tuple types sometimes do for small values.

But why do the following two snippets differ in the behavior?

With a space at the end:

>>> a = 'string '
>>> b = 'string '
>>> a is b
False

Without a space:

>>> c = 'string'
>>> d = 'string'
>>> c is d
True

Why does the space bring the difference?


Solution

  • This is a quirk of how the CPython implementation chooses to cache string literals. String literals with the same contents may refer to the same string object, but they don't have to. 'string' happens to be automatically interned when 'string ' isn't because 'string' contains only characters allowed in a Python identifier. I have no idea why that's the criterion they chose, but it is. The behavior may be different in different Python versions or implementations.

    From the CPython 2.7 source code, stringobject.h, line 28:

    Interning strings (ob_sstate) tries to ensure that only one string object with a given value exists, so equality tests can be one pointer comparison. This is generally restricted to strings that "look like" Python identifiers, although the intern() builtin can be used to force interning of any string.

    You can see the code that does this in Objects/codeobject.c:

    /* Intern selected string constants */
    for (i = PyTuple_Size(consts); --i >= 0; ) {
        PyObject *v = PyTuple_GetItem(consts, i);
        if (!PyString_Check(v))
            continue;
        if (!all_name_chars((unsigned char *)PyString_AS_STRING(v)))
            continue;
        PyString_InternInPlace(&PyTuple_GET_ITEM(consts, i));
    }
    

    Also, note that interning is a separate process from the merging of string literals by the Python bytecode compiler. If you let the compiler compile the a and b assignments together, e.g. by placing them in a module or an if True:, you would find that a and b would be the same string.