Python string intern mechanism


I'm studying Python's string interning mechanism. While experimenting, I ran a test like this:

# short str
list_short_str = ['0', str(0), chr(48), ''.join(['0']), '0'.join(('','')), '230'[-1:], ''+'0'+'', 'aaa0a'.strip('a')]

print("short str id:")

for item in list_short_str:
    print(id(item))
    
# long str
list_long_str = ['hello', 'hel'+'lo', 'helloasd'[:5], ''.join(['h','e','l','l','o']), '  hello '.strip(' ')]

print("long str id:")

for item in list_long_str:
    print(id(item))

I got output like this:

short str id:
2450307092400
2450856182064
2450307092400
2450307092400
2450298848880
2450307092400
2450307092400
2450307092400
long str id:
2450855173808
2450855173808
2450856182256
2450856182320
2450856182192

I have tried IDLE, PyCharm, and Jupyter, and all of these environments gave me the same pattern. More precisely, among the short strings, str(0) and '0'.join(('','')) have ids that differ from the rest, while the others share one id; among the long strings, 'hello' and 'hel'+'lo' share an id, and the others each have a different one. I have searched for an explanation but haven't found one. Could anyone please tell me why?


Solution

  • With string interning in Python, when two strings have the same value, only one string object may actually be kept in memory instead of a separate object being allocated for each; the other name simply refers to that same object. However, interning behavior can differ depending on the Python version, the interpreter implementation, and the context in which the string is created. As a result, strings with identical values are not always the same object, and the behavior can be hard to predict in some cases.

    One reason the result can differ for the same string value is that CPython shares string constants that appear literally in the source code, but in general does not intern string objects that are built at runtime. So when equal strings are produced at compile time, they end up as the same object with the same id, whereas a string produced at runtime usually is a new object with a different id.
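
    The difference between "equal value" and "same object" can be sketched with sys.intern, the standard-library function for interning a string explicitly. The variable names are illustrative only, and whether the two runtime-built strings start out as distinct objects is an implementation detail.

```python
import sys

# Build two equal strings at runtime so they are not shared compile-time constants.
parts = ["inter", "ning", " demo"]
a = "".join(parts)
b = "".join(parts)

print(a == b)   # True: the values are equal
print(a is b)   # typically False on CPython: two separate objects

# sys.intern returns the canonical interned object for a given value,
# so equal strings passed through it come back as one shared object.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)  # True
```

    Comparing strings with == is the reliable check; identity (is, or equal id values) only says whether two names happen to refer to one object.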

    For the short strings:

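    The str() function is executed at runtime. It converts the given object into its string representation when it is called during the program's execution, not when the program is compiled. In the output above, str(0) produced a new '0' object instead of reusing the existing one. By contrast, CPython keeps cached objects for single-character strings, and operations such as chr(48), '230'[-1:], and 'aaa0a'.strip('a') appear to return that cached object, which is consistent with them sharing an id with the literal '0' even though they also run at runtime. This is an implementation detail and can change between versions.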

    A join() call is not evaluated at compile time. CPython's compiler can fold simple constant expressions, but it does not evaluate method calls ahead of time, even when the list and the separator are both literals.

    The expression '0'.join(('', '')) is therefore evaluated at runtime: it joins the two empty strings of the tuple with the separator '0' and builds a new string object, which is why its id differs from the others. The expression ''.join(['0']) is also evaluated at runtime, yet it shares its id with the literal '0'. That is not a compile-time optimization. In CPython, join() on a sequence that contains exactly one str returns that element itself instead of building a new string, and the element here is the same '0' constant as the literal.
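
    A small sketch for checking which of the short-string expressions return the very same object as the literal. The helper name same_object_report is made up for this illustration, and the identity results depend on the interpreter and its version, so no output is shown.

```python
def same_object_report(reference, candidates):
    """Return {label: (equal_value, same_object)} for each candidate string."""
    return {
        label: (value == reference, value is reference)
        for label, value in candidates.items()
    }

zero = "0"
candidates = {
    "chr(48)": chr(48),
    "''.join([zero])": "".join([zero]),
    "'230'[-1:]": "230"[-1:],
    "str(0)": str(0),
    "'0'.join(('', ''))": "0".join(("", "")),
}

report = same_object_report(zero, candidates)
for label, (equal, same) in report.items():
    # equal is always True here; same varies by implementation and version.
    print(f"{label:22} equal={equal} same_object={same}")
```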

    For the long strings:

    String concatenation with the + operator is generally executed at runtime, because the values of the operands may not be known until then. When both operands are string literals, however, the compiler can evaluate the expression ahead of time (constant folding) and store the result as a single constant. That is why 'hel'+'lo' can end up as the same object as the literal 'hello'.

    For example:

    a = "hello"
    b = "world"
    c = a + b              # this concatenation will be executed at runtime
    d = "hello" + "world"  # this concatenation may be executed at compile time

    The remaining expressions, 'helloasd'[:5], ''.join(['h','e','l','l','o']), and '  hello '.strip(' '), are evaluated at runtime and each builds a new string object, which is why they all have different ids.
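
    One way to see whether the compiler evaluated a concatenation ahead of time is to look at the constants stored in the compiled code object (its co_consts attribute). A minimal sketch, assuming CPython; the names folded, runtime, and namespace are illustrative only.

```python
# Two literals joined with +: the compiler may fold this into one constant.
folded = compile("d = 'hel' + 'lo'", "<demo>", "exec")

# Two variables joined with +: the result only exists once the code runs.
runtime = compile("c = a + b", "<demo>", "exec")

print(folded.co_consts)   # constants stored for the literal expression
print(runtime.co_consts)  # no string constant for the result here

# Run both snippets in one namespace and compare the resulting strings.
namespace = {"a": "hel", "b": "lo"}
exec(folded, namespace)
exec(runtime, namespace)

print(namespace["c"] == namespace["d"])  # True: same value
print(namespace["c"] is namespace["d"])  # identity is implementation-dependent
```

    On CPython, the first tuple is expected to contain the already-joined 'hello', while the second holds no string constant for the result.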

    I hope this helps at least a little.