python string string-interning

Why isn't `str(1) is '1'` `True` in Python?


I'm not asking about the difference between the == and is operators. I'm asking about string interning.

In Python 3.9.1,

>>> str(1) is '1'
<stdin>:1: SyntaxWarning: "is" with a literal. Did you mean "=="?
False
>>> '1' is '1'
<stdin>:1: SyntaxWarning: "is" with a literal. Did you mean "=="?
True

I found out that string literals consisting only of characters that match [a-zA-Z0-9_] are interned in Python. I understand why '1' is '1' is True: Python stores the string '1' somewhere in memory and refers to it whenever the literal '1' is used. Since str(1) returns '1', I would expect it to refer to the same address as any other literal '1'. Shouldn't str(1) is '1' also be True?


Solution

  • is compares object identity (references), not content. str(1) is not a literal; it is built at run time, so it is not automatically interned.

    '1', on the other hand, is interned because it is a string literal in the source, whereas str(1) goes through a conversion that creates a new string object. As you can see:

    >>> a = '1'
    >>> b = str(1)
    >>> a
    '1'
    >>> b
    '1'
    >>> a is b
    False
    >>> id(a)
    1603954028784
    >>> id(b)
    1604083776304
    >>>
    

    To get the interned object for a string that was built at run time, pass it to sys.intern:

    >>> import sys
    >>> a = '1'
    >>> b = str(1)
    >>> a is b
    False
    >>> a is sys.intern(b)
    True
    >>> 
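
The same idea can be written as a small script. This is a minimal sketch, not part of the original answer: the variable names are made up for illustration, and the strings are assembled with str.join only so that neither one is a literal in the source.

```python
import sys

# Build two equal strings at run time, so neither is a compile-time literal.
a = "".join(["inter", "ning", "!"])
b = "".join(["inter", "ning", "!"])

# The contents are equal regardless of how the objects are stored.
same_value = a == b

# Whether `a is b` holds here is an implementation detail, so it is not
# relied on. After interning, both names resolve to one shared object.
same_object_after_intern = sys.intern(a) is sys.intern(b)

print(same_value, same_object_after_intern)
```

Only the result of sys.intern is guaranteed to be the shared object; the identity of the un-interned strings may differ between Python implementations and versions.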
    

    As mentioned in the docs:

    Enter string in the table of “interned” strings and return the interned string – which is string itself or a copy. Interning strings is useful to gain a little performance on dictionary lookup – if the keys in a dictionary are interned, and the lookup key is interned, the key comparisons (after hashing) can be done by a pointer compare instead of a string compare. Normally, the names used in Python programs are automatically interned, and the dictionaries used to hold module, class or instance attributes have interned keys.

    Interned strings are not immortal; you must keep a reference to the return value of intern() around to benefit from it.

    Note that in Python 2 intern() was a built-in function; in Python 3 it was moved into the sys module as sys.intern.
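
To illustrate the documentation's point about keeping a reference to the return value, here is a rough sketch that interns tokens split out of a string at run time. The sample text and variable names are hypothetical, chosen only for the example.

```python
import sys

# str.split() builds its substrings at run time, so repeated words are
# not guaranteed to share an object until they are interned.
text = "spam eggs spam spam eggs"

# Keep the values returned by sys.intern; those are the shared objects.
tokens = [sys.intern(tok) for tok in text.split()]

# Count the distinct objects behind the five tokens.
distinct_objects = {id(tok) for tok in tokens}

# Equal tokens are now the very same object, so an identity check succeeds.
first_and_third_shared = tokens[0] is tokens[2]

print(len(tokens), len(distinct_objects), first_and_third_shared)
```

Because the list holds on to the interned strings, the two shared objects stay alive for as long as the list does.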