Tags: python, python-3.x, string, cpython, python-internals

Python string concatenation internal details


Assume that we have a list of strings and we want to create a single string by concatenating all elements in this list. Something like this:

def foo(str_lst):
    result = ''
    for element in str_lst:
        result += element
    return result

Since strings are immutable objects, I expect that Python creates a new str object and copies the contents of result and element at each iteration. That would make the time complexity O(M * N^2), where M is the length of each element and N is the size of the list.

However, my experiment shows that it runs in linear time.

N = 1000000 # 1 million
str_lst = ['a' for _ in range(N)]

foo(str_lst) # It takes around 0.5 seconds

N = 2000000 # 2 million
str_lst = ['a' for _ in range(N)]

foo(str_lst) # It takes around 1.0 seconds

N = 10000000 # 10 million
str_lst = ['a' for _ in range(N)]

foo(str_lst) # It takes around 5.3 seconds

I suspect that Python uses something like a string buffer under the hood, so it doesn't create a new object at each iteration.
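
Conceptually, that intuition looks like appending to a mutable buffer (a sketch only, not CPython's actual code): growing a resizable buffer is amortized O(1) per append, so the whole loop is linear.

# Illustration only: CPython does not literally do this, but the cost model
# is similar when its in-place resize optimisation applies.
def foo_buffered(str_lst):
    buf = bytearray()               # mutable, resizable storage
    for element in str_lst:
        buf += element.encode()     # appends in place, amortized O(1)
    return buf.decode()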

Now consider a slightly different implementation. The only difference is one extra assignment.

def foo2(str_lst):
    result = ''
    for element in str_lst:
        result += element
        temp = result # newly added line
    return result

I know that the temp = result line does not create a new object; temp just points to the same object. So this little change shouldn't affect the performance much.
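
We can check that claim with sys.getrefcount (a small sketch; the exact counts vary between CPython versions and contexts):

import sys

s = ''.join(['a', 'b', 'c'])       # build a string at runtime
print(sys.getrefcount(s))          # e.g. 2: the name s plus getrefcount's own argument

t = s                              # no new object is created; just another name for the same one
print(t is s)                      # True
print(sys.getrefcount(s))          # one higher than before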

N = 1000000 # 1 million
str_lst = ['a' for _ in range(N)]
foo(str_lst) # It takes around 0.5 seconds
foo2(str_lst) # It takes around 30 seconds

N = 2000000 # 2 million
str_lst = ['a' for _ in range(N)]
foo(str_lst) # It takes around 1 second
foo2(str_lst) # It takes around 129 seconds

However, there is a huge difference, and it looks like the foo2 function is O(N^2) while foo is O(N).

My question is: how does Python achieve linear time in string concatenation without breaking other language components like immutable object assignment? And how does that extra line affect the performance so much? I searched a bit in the CPython implementation but couldn't find the exact location.

Update

Here are the line profiling results.

result for foo function

Total time: 0.545577 s
File: <ipython-input-38-b9bb169e8fe0>
Function: foo at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
 1                                           def foo(str_lst):
 2         1          2.0      2.0      0.0      result = ''
 3   1000001     238820.0      0.2     43.8      for element in str_lst:
 4   1000000     306755.0      0.3     56.2          result += element
 5         1          0.0      0.0      0.0      return result

result for foo2 function

Total time: 30.6663 s
File: <ipython-input-40-34dd53670dd9>
Function: foo2 at line 1
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
 1                                           def foo2(str_lst):
 2         1          2.0      2.0      0.0      result = ''
 3   1000001     299122.0      0.3      1.0      for element in str_lst:
 4   1000000   30033127.0     30.0     97.7          result += element
 5   1000000     413283.0      0.4      1.3          temp = result
 6         1          0.0      0.0      0.0      return result

Somehow the temp = result line affects the performance of the result += element line.
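
To extend the observation, here is a small sketch (count_object_changes is just an illustrative helper) that counts how often result actually becomes a different object inside the loop. Caveat: id() reflects the object's memory address only in CPython, and whether a string that is grown in place keeps its address depends on the allocator.

def count_object_changes(str_lst, keep_extra_ref):
    result = ''
    extra = None
    changes = 0
    for element in str_lst:
        before = id(result)
        result += element
        if id(result) != before:
            changes += 1
        if keep_extra_ref:
            extra = result           # the extra name from foo2
    return changes

lst = ['a'] * 100_000
print(count_object_changes(lst, keep_extra_ref=False))  # typically far fewer than 100000
print(count_object_changes(lst, keep_extra_ref=True))   # typically 100000: a new object each time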


Solution

  • Having another name pointing to the same object kills the optimisation. The optimisation basically works by resizing the string object and appending in place. If you have more than one reference to that object, you can't resize it without affecting the other reference. With strings being immutable, allowing this would be a serious flaw in the implementation.

    temp = result
    

    increased the reference count for the string object named by result, thereby prohibiting the optimisation.

    The full list of checks performed in the case of += (which eventually translates to PyUnicode_Append) can be seen in the unicode_modifiable function. Among other things, it checks that the reference count of the object is equal to one, that it isn't interned and that it isn't a string subclass.

    There are a couple more checks in the if statement guarding this optimisation, if you want a more thorough list.


    Though not the core issue of your question, future readers might be curious about how to perform string concatenation efficiently. Besides similar questions on Stack Overflow, the Python FAQ also has an entry on this; a small comparison sketch follows below.
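
    Following that FAQ entry, here is a hedged sketch comparing the += loop with str.join (concat_loop and concat_join are just illustrative names). At least in CPython, join computes the final length up front and copies each piece once, so it stays linear without relying on the refcount-based optimisation:

    import time

    def concat_loop(pieces):
        result = ''
        for p in pieces:
            result += p               # fast only when the in-place path applies
        return result

    def concat_join(pieces):
        return ''.join(pieces)        # linear on any Python implementation

    pieces = ['a'] * 1_000_000
    for fn in (concat_loop, concat_join):
        start = time.perf_counter()
        fn(pieces)
        print(fn.__name__, round(time.perf_counter() - start, 3), 'seconds')

    An aliased variant like foo2 would be dramatically slower here, which is exactly why accumulating the pieces in a list and calling join is the portable recommendation.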