I understand that conditional expressions (or ternary operators) are lazy in Python. They represent conditional execution rather than conditional selection. In other words, only one of a
or b
is evaluated in the following:
c = a if condition else b
What I'm interested to know is how this is implemented internally. Does Python convert to an if
statement as below and, if so, at what stage does this conversion occur?
if condition:
c = a
else:
c = b
Or is the ternary operator actually a distinct and separate expression, defined entirely separately? If so, can I access the CPython code for conditional expressions?
I've looked at the following which explain what the ternary operator does, but none of them make clear how they are implemented:
Edit: You can assume CPython reference implementation.
Python doesn't have to convert anything, and couldn't if it wanted to.
The conditional expression is parsed by using the language grammar into an abstract syntax tree, which in turn is then compiled to bytecode. You can produce the AST by using the ast.parse()
function:
>>> import ast
>>> ast.parse('c = a if condition else b').body[0] # first statement in the tree
<_ast.Assign object at 0x10f05c550>
>>> ast.dump(ast.parse('c = a if condition else b').body[0])
"Assign(targets=[Name(id='c', ctx=Store())], value=IfExp(test=Name(id='condition', ctx=Load()), body=Name(id='a', ctx=Load()), orelse=Name(id='b', ctx=Load())))"
Note the ast.IfExp()
node in the AST produced for the assignment; this is a dedicated node for conditional expressions. It has test
, body
and orelse
parts to represent the 3 expressions that make up the condition, true and false parts. This is documented in the ast
module Abstract Grammar section:
expr = [...] | [...] | IfExp(expr test, expr body, expr orelse)
This shows that the type of each element is another expr
expression node.
The parse tree is then compiled to bytecode that uses the stack to conditionally jump to the right section based on the test; we can pass the AST produced by ast.parse()
directly to the compile()
function, after which the dis
module lets us look at a human-friendly form of the bytecode produced by compilation:
>>> import dis
>>> dis.dis(compile(ast.parse('c = a if condition else b'), '', 'exec'))
1 0 LOAD_NAME 0 (condition)
2 POP_JUMP_IF_FALSE 8
4 LOAD_NAME 1 (a)
6 JUMP_FORWARD 2 (to 10)
>> 8 LOAD_NAME 2 (b)
>> 10 STORE_NAME 3 (c)
12 LOAD_CONST 0 (None)
14 RETURN_VALUE
So if the condition is false, the interpreter loop jumps forward to instruction 8, otherwise instructions 4 and 6 are executed, with instruction 6 jumping forward to instruction 10 (so past the else
expression). The end result is that either instruction 4 or instruction 8 puts a new result on the top of the stack for STORE_NAME
to move to a variable.
An if
statement results in a different AST node, and the resulting bytecode happens to be very similar in that it too would use jumps. But the compiler treats them as distinct pieces of syntax, and it has to.
Expressions and statements are two very different fundamental building blocks of a programming language. Statements can contain expressions, but expressions can't contain statements, only other expressions. And expressions can produce a value (for the surrounding syntax to use), but statements can't. So Python has to treat conditional expressions very differently from statements, in that the grammar parser knows when to expect a statement and when an expression is allowed. If you transformed a conditional expression into a statement, you would not be able to ever use such an expression as part of a bigger expression!
Because an if
statement is not an expression, it doesn't return a value (as only expressions can produce a value), and so the resulting bytecode would not produce a value on the top of the stack to be used by the surrounding Python code (there is no c = if condition : ...
). if
statements contain a condition expression, and a suite, which must always consist of more statements (there is such a thing as an 'expression statement' to let you put just an expression in a statement, such as 1 + 1
on a single line), and those statements can 'do stuff' like assignments or return from a function, but nothing they do would ever make if
return something.
This is reflected in the AST node definition for if
statements:
stmt = [...] | [...] | If(expr test, stmt* body, stmt* orelse)
So for an If
node, test
is the only expression node, and body
and orelse
both consist of zero or more statements. The orelse
part would hold any elif ...:
tests as further If()
nodes, or any other type of statement to form an unconditional else:
. With zero-or-more elements, you can't expect a single result.
So this isn't unique to CPython, this applies to all Python implementations. The Python grammar is not an implementation detail.