Tags: python, performance, python-internals, python-3.6, f-string

Why were literal formatted strings (f-strings) so slow in Python 3.6 alpha? (now fixed in 3.6 stable)


I've downloaded a Python 3.6 alpha build from the Python GitHub repository, and one of my favourite new features is literal string formatting. It can be used like so:

>>> x = 2
>>> f"x is {x}"
'x is 2'

This appears to do the same thing as calling the format method on a str instance. However, I've noticed that literal string formatting is noticeably slower than calling format directly. Here's what timeit reports for each approach:

>>> import timeit
>>> x = 2
>>> timeit.timeit(lambda: f"X is {x}")
0.8658502227130764
>>> timeit.timeit(lambda: "X is {}".format(x))
0.5500578542015617

If I pass a string as timeit's argument instead, the results show the same pattern:

>>> timeit.timeit('x = 2; f"X is {x}"')
0.5786435347381484
>>> timeit.timeit('x = 2; "X is {}".format(x)')
0.4145195760771685

As you can see, using format takes only about two-thirds of the time (roughly 64% and 72% in the two runs above). I would expect the literal method to be faster because less syntax is involved. What is going on behind the scenes that makes the literal method so much slower?


Solution

  • Note: This answer was written for the Python 3.6 alpha releases. A new opcode (BUILD_STRING) added in 3.6.0b1 improved f-string performance significantly.


    In the 3.6 alpha, the f"..." syntax is effectively converted to a str.join() call on a list holding the literal string parts around the {...} expressions and the results of the expressions themselves. Each expression result is passed through its __format__() method, together with any format specification given after the colon. You can see this when disassembling:

    >>> import dis
    >>> dis.dis(compile('f"X is {x}"', '', 'exec'))
      1           0 LOAD_CONST               0 ('')
                  3 LOAD_ATTR                0 (join)
                  6 LOAD_CONST               1 ('X is ')
                  9 LOAD_NAME                1 (x)
                 12 FORMAT_VALUE             0
                 15 BUILD_LIST               2
                 18 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 21 POP_TOP
                 22 LOAD_CONST               2 (None)
                 25 RETURN_VALUE
    >>> dis.dis(compile('"X is {}".format(x)', '', 'exec'))
      1           0 LOAD_CONST               0 ('X is {}')
                  3 LOAD_ATTR                0 (format)
                  6 LOAD_NAME                1 (x)
                  9 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 12 POP_TOP
                 13 LOAD_CONST               1 (None)
                 16 RETURN_VALUE
    

    Note the BUILD_LIST and LOAD_ATTR .. (join) opcodes in the first result. The new FORMAT_VALUE opcode takes the value on top of the stack plus an optional format specification (parsed out at compile time) and combines them in an object.__format__() call.
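To make the role of the format specification concrete, here is a minimal sketch. `SpecRecorder` is a hypothetical helper written for this illustration, not something from the disassembly above; it simply records every spec that reaches its `__format__()` method, whether the call comes from an f-string or from `str.format()`.

```python
class SpecRecorder:
    """Hypothetical helper: remembers every format spec passed to __format__."""

    def __init__(self):
        self.specs = []

    def __format__(self, spec):
        # Record the spec and return a fixed placeholder string.
        self.specs.append(spec)
        return "<value>"


recorder = SpecRecorder()

# Both spellings end up calling __format__ on the value.
fstring_plain = f"X is {recorder}"
fstring_spec = f"X is {recorder:>10}"
format_plain = "X is {}".format(recorder)
format_spec = "X is {:>10}".format(recorder)

print(recorder.specs)  # ['', '>10', '', '>10']
```

With no explicit specification, the method receives an empty string, which matches the `x.__format__('')` call described in this answer.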

    So your example, f"X is {x}", is translated to:

    ''.join(["X is ", x.__format__('')])
    

    Note that this requires Python to create a list object and then call the str.join() method on every evaluation of the f-string.
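The expansion can be written out by hand to check that it produces the same string. This is only a sketch of the equivalence: `fstring_as_join` is a hypothetical name, and it uses the built-in `format(x, "")` as a stand-in for the `x.__format__('')` call.

```python
def fstring_as_join(x):
    # Hand-written version of what the 3.6 alpha compiled f"X is {x}" into:
    # build a list of the parts, then join them with an empty separator.
    parts = ["X is ", format(x, "")]
    return "".join(parts)


for value in (2, 3.5, "text", None):
    expanded = fstring_as_join(value)
    # All three spellings should produce the same string.
    assert expanded == f"X is {value}" == "X is {}".format(value)
    print(expanded)
```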

    The str.format() call is also a method call, and after parsing the template it still calls x.__format__(''). Crucially, though, no list is created along the way. It is this difference that makes the str.format() method faster.
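A rough way to compare the approaches on whatever interpreter you have is sketched below. The function names and the `benchmark` helper are made up for this example, and `via_join` only approximates the alpha bytecode: it goes through a Python-level `format()` lookup and call, so its timing is indicative at best. Absolute numbers depend on the machine and Python version, so no expected output is shown.

```python
import timeit

x = 2


def via_join():
    # Approximates the 3.6 alpha f-string: build a list, then str.join().
    return "".join(["X is ", format(x, "")])


def via_format():
    return "X is {}".format(x)


def via_fstring():
    # Whatever the running interpreter compiles an f-string into.
    return f"X is {x}"


def benchmark(number=200000):
    """Return a dict mapping each function name to its total time in seconds."""
    return {
        func.__name__: timeit.timeit(func, number=number)
        for func in (via_join, via_format, via_fstring)
    }


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name:12} {seconds:.4f}s")
```

On 3.6.0b1 and later, `via_fstring` no longer goes through the list-and-join path, so it should not be read as a measurement of the alpha behaviour.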

    Note that, at the time of writing, Python 3.6 has only been released as an alpha build; this implementation can still easily change. See PEP 494 ("Python 3.6 Release Schedule") for the timetable, as well as Python issue #27078 (opened in response to this question) for a discussion on how to further improve the performance of formatted string literals.
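To check which strategy a given interpreter uses, the opcode names can be inspected programmatically. The sketch below assumes the opcode that replaced the list-and-join approach is BUILD_STRING, which the `dis` documentation lists as new in 3.6; `fstring_opnames` is a hypothetical helper name.

```python
import dis


def fstring_opnames(source='f"X is {x}"'):
    """Return the opcode names the running interpreter emits for `source`."""
    code = compile(source, "<fstring>", "eval")
    return [instr.opname for instr in dis.get_instructions(code)]


names = fstring_opnames()
print(names)
print("uses BUILD_STRING:", "BUILD_STRING" in names)
print("uses BUILD_LIST:", "BUILD_LIST" in names)
```

On a 3.6 alpha build the list would contain BUILD_LIST, as in the disassembly earlier in this answer; on later releases it should contain BUILD_STRING instead. Other opcode names in the list vary between Python versions.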