I'm trying to record grouping in expressions, like:
condition_a = Condition( "x = 5 ")
condition_b = Condition( "y > 6 " )
condition_c = Condition( "z = 7 " )
condition_c & ( condition_a | condition_b )
These are expressions a user can write to configure a SQL query, so the above would ultimately be expressed as x = 5 AND (y > 6 OR z = 7)
.
I tried this:
class Condition( object ) :
def __init__( self, expr ) :
....
def __and__( self , other ) :
print 'calling and'
....
return self
def __or__( self , other ) :
print 'calling or'
....
return self
def __call__( self , other ) :
print 'calling ()'
....
return self
I would expect this code :
condition_a = Condition( "x = 5 ")
condition_b = Condition( "y > 6 " )
condition_c = Condition( "z = 7 " )
condition = condition_c & ( condition_a | condition_b )
expected = "z = 7 AND ( x = 5 OR y > 6 )"
to print the calling ()
but it does not. I just see :
calling or
calling and
It seems I am not using a call, so something like condition_a(1)
, so __call__
is not called.
Is there a way to achieve what I want? You can check the expected which is what I want to have (and and or works, it's just I don't have the parenthesis around part of the expression).
No, the (...)
parentheses in your expression are not a call expression. They are just there to group your expressions.
Python does not preserve grouping parentheses. They only are used by the parser to build an abstract syntax tree, and grouping influences in what order expressions are executed.
The expression
A & B | C
is parsed into a tree of BinOp
objects, like this:
|
/ \
& C
/ \
A B
&
binds more tightly here, so A & B
is executed before the |
operator is executed. That's because the |
operator has a lower precedence than &
.
Parentheses tell the parser to build separate trees for everything between (...)
before putting that into the tree, so the expression
A & (B | C)
becomes
&
/ \
A |
/ \
B C
The parentheses are gone, all that is left is the tree of expressions, and Python knows know in what order you need to execute those expressions; the |
is executed first, on B
and C
, and the result of this expression is then used in the &
operator.
You can look at those AST objects with the ast
module:
>>> import ast
>>> print(ast.dump(ast.parse('A & B | C').body[0].value)))
BinOp(
left=BinOp(
left=Name(id='A', ctx=Load()),
op=BitAnd(),
right=Name(id='B', ctx=Load())
),
op=BitOr(),
right=Name(id='C', ctx=Load()))
)
>>> print(ast.dump(ast.parse('A & (B | C)').body[0].value))
BinOp(
left=Name(id='A', ctx=Load()),
op=BitAnd(),
right=BinOp(
left=Name(id='B', ctx=Load()),
op=BitOr(),
right=Name(id='C', ctx=Load())
)
)
(I unwrapped the outer Module
and Expr
objects to focus on the BinOp
trees, and manually indented the nested objects).
and you can see the order of operations change when you look at the resulting bytecode with the dis
module:
>>> dis.dis(compile('A & B | C', '', 'eval'))
1 0 LOAD_NAME 0 (A)
2 LOAD_NAME 1 (B)
4 BINARY_AND
6 LOAD_NAME 2 (C)
8 BINARY_OR
10 RETURN_VALUE
>>> dis.dis(compile('A & (B | C)', '', 'eval'))
1 0 LOAD_NAME 0 (A)
2 LOAD_NAME 1 (B)
4 LOAD_NAME 2 (C)
6 BINARY_OR
8 BINARY_AND
10 RETURN_VALUE
(LOAD_NAME
puts a value on the stack, and BINARY_OR
and BINARY_AND
take the top 2 values from the stack and push back the result).
All this means that parentheses cannot be retrieved this way. They are gone, they were only there to inform the order of operations.
What you need to do then, is maintain the syntax tree; you can still reconstruct that from the execution. You could just always include parentheses when you then output your result:
class Expr(object):
def __or__(self, other):
return BinOp(self, '|', other)
def __and__(self, other):
return BinOp(self, '&', other)
class BinOp(Expr):
def __init__(self, left, oper, right):
self.left, self.oper, self.right = left, oper, right
def __repr__(self):
return "({} {} {})".format(self.left, self.oper, self.right)
class Condition(Expr):
def __init__(self, expr):
self.expr = expr
def __repr__(self):
return self.expr
When printing a BinOp
object, you'll always get parentheses:
>>> condition_a = Condition( "x = 5 ")
>>> condition_b = Condition( "y > 6 " )
>>> condition_c = Condition( "z = 7 " )
>>> condition_c & ( condition_a | condition_b )
(z = 7 | (x = 5 | y > 6 ))
>>> condition_c & condition_a | condition_b
((z = 7 | x = 5 ) | y > 6 )
This is fine, because the SQL parser treats parentheses the same way; just to group expressions.
The next step is to add precedence information to your binary operators. You can then decide if you have to have parentheses around a sub-expression or not; only when the left sub-expression has a lower precedence, or the right has a lower or equal precedence, would you need them:
# Condition stays the same
class Expr(object):
precedence = float('inf')
def __or__(self, other):
return BinOp(self, '|', other)
def __and__(self, other):
return BinOp(self, '&', other)
class BinOp(Expr):
def __init__(self, left, oper, right):
self.left, self.oper, self.right = left, oper, right
self.precedence = '|&'.index(oper) # 0 or 1
def __repr__(self):
left, right = self.left, self.right
if left.precedence < self.precedence:
left = '({})'.format(left)
if right.precedence <= self.precedence:
right = '({})'.format(right)
return "{} {} {}".format(left, self.oper, right)
Now parentheses only show up when the expression requires them:
>>> condition_c & ( condition_a | condition_b )
z = 7 & (x = 5 | y > 6 )
>>> condition_c & condition_a | condition_b
z = 7 & x = 5 | y > 6
>>> condition_c & (condition_a & condition_b)
z = 7 & (x = 5 & y > 6 )
If you are going to extend this system (adding in more operator types), you'll have to expand on the concept of precedence there; the above example only has 0
and 1
for the binary operators, but you'll have to naturally expand on these values if you have more.
I'd also strongly urge you to look at SQLAlchemy, which already implements all this, and much more.