python compilation internals object-identity

String concatenation in Python


Can you describe the difference between two ways of concatenating strings: the plain + operator (__add__) and %s patterns? I investigated this a little and found %s (in the form without parentheses) to be slightly faster.

Another question came up as well: why does the result of 'hell%s' % 'o' refer to a different memory region than 'hell%s' % ('o',)?

Here is a code example:

l = ['hello', 'hell' + 'o', 'hell%s' % 'o', 'hell%s' % ('o',)]
print [id(s) for s in l]

Result:

[34375618400, 34375618400, 34375618400, 34375626256]

P.S. I know about string interning :)


Solution

  • Here is a small exercise:

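    >>> import dis
    >>> def f1():
    ...     'hello'              # plain literal (becomes the docstring)
    ...
    >>> def f2():
    ...     'hel' 'lo'           # adjacent literals, joined by the parser
    ...
    >>> def f3():
    ...     'hel' + 'lo'         # concatenation with +
    ...
    >>> def f4():
    ...     'hel%s' % 'lo'       # formatting with a bare string operand
    ...
    >>> def f5():
    ...     'hel%s' % ('lo',)    # formatting with a tuple operand
    ...
    >>> for f in (f1, f2, f3, f4, f5):
    ...     print(f.__name__)
    ...     dis.dis(f)
    ...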
    f1
      1           0 LOAD_CONST               1 (None) 
                  3 RETURN_VALUE         
    f2
      1           0 LOAD_CONST               1 (None) 
                  3 RETURN_VALUE         
    f3
      2           0 LOAD_CONST               3 ('hello') 
                  3 POP_TOP              
                  4 LOAD_CONST               0 (None) 
                  7 RETURN_VALUE         
    f4
      2           0 LOAD_CONST               3 ('hello') 
                  3 POP_TOP              
                  4 LOAD_CONST               0 (None) 
                  7 RETURN_VALUE         
    f5
      2           0 LOAD_CONST               1 ('hel%s') 
                  3 LOAD_CONST               3 (('lo',)) 
                  6 BINARY_MODULO        
                  7 POP_TOP              
                  8 LOAD_CONST               0 (None) 
                 11 RETURN_VALUE         
    

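    As you can see, all the simple concatenation and formatting expressions are evaluated by the compiler: f3 and f4 just load the already-built constant 'hello'. (In f1 and f2 the literal is the function's docstring, so no bytecode is emitted for it at all.) The last function, whose right-hand operand is a tuple, is not folded, so the formatting is, I guess, actually executed at run time (note the BINARY_MODULO instruction) and produces a new string object. Since the other strings are all created at compile time, they all have the same id.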
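The same check can be made without reading bytecode, by looking at the constants stored in a compiled code object. This is only a minimal sketch: string_constants is a hypothetical helper name, not part of any library, and constant folding is a CPython implementation detail, so which expressions get folded may differ between versions.

```python
def string_constants(source):
    """Return the string constants stored in the code object of an expression."""
    # compile() in 'eval' mode builds a code object without running it.
    code = compile(source, '<example>', 'eval')
    return [c for c in code.co_consts if isinstance(c, str)]


# The same five expressions as in the exercise above.
EXPRESSIONS = [
    "'hello'",
    "'hel' 'lo'",
    "'hel' + 'lo'",
    "'hel%s' % 'lo'",
    "'hel%s' % ('lo',)",
]

for expr in EXPRESSIONS:
    # A folded expression has 'hello' among its constants; an unfolded one
    # only has its operands, such as 'hel%s'.
    print('%-20s %r' % (expr, string_constants(expr)))
```

If 'hello' shows up in the list for an expression, the compiler built the string itself. If only the pieces show up, the work is left for run time and the result is a fresh object each time.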