When running the following line:
>>> [0xfor x in (1, 2, 3)]
I expected Python to return an error.
Instead, the REPL returns:
[15]
What can possibly be the reason?
Python reads the expression as [0xf or (x in (1, 2, 3))], because of how the tokenizer reads hexadecimal literals and because the in operator has higher precedence than the or operator.
It never raises a NameError thanks to short-circuit evaluation: if the operand to the left of the or operator is truthy, Python never tries to evaluate the right side.
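One quick way to check this explanation is to compare the original expression with a variant whose hexadecimal literal is falsy. The sketch below is only an illustration: the variant 0x0or is made up for the comparison, the helper evaluate() is hypothetical, and because newer CPython versions may warn about a numeric literal written directly against a keyword, the helper silences warnings.

```python
import warnings


def evaluate(source: str):
    """Hypothetical helper: evaluate `source` with no variable named x in scope.

    Newer CPython releases may warn about a number glued to a keyword
    (e.g. '0xfor'), so warnings are silenced for this demonstration.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return eval(source, {}, {})


# 0xf == 15 is truthy, so `x in (1, 2, 3)` is never evaluated.
print(evaluate("[0xfor x in (1, 2, 3)]"))  # [15]

# With a falsy left operand (0x0 == 0), Python has to evaluate the right side,
# and looking up the undefined name x raises NameError.
try:
    evaluate("[0x0or x in (1, 2, 3)]")
except NameError as exc:
    print("NameError:", exc)
```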
First, we have to understand how Python reads hexadecimal numbers.
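In tokenizer.c's huge tok_get function, we see that the number starts with the 0x prefix, so we keep consuming characters for as long as they are hexadecimal digits and stop at the first one that isn't. The parsed token, 0xf (the scan stops at "o", as it is not in the range of 0-f), will eventually get passed to the PEG parser, which will convert it to the decimal value 15 (see Appendix A).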
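The same split can be observed without reading tokenizer.c by using the standard tokenize module. This is a sketch with a caveat: tokenize is the library-level tokenizer (a pure-Python implementation in older versions, a wrapper around the C tokenizer in newer ones), so it illustrates the behavior rather than stepping through tok_get itself. The function name significant_tokens is hypothetical.

```python
import io
import token
import tokenize
import warnings

SOURCE = "[0xfor x in (1, 2, 3)]"


def significant_tokens(source: str) -> list:
    """Tokenize `source`, keeping only NUMBER, NAME and OP tokens as (type, text)."""
    readline = io.StringIO(source).readline
    with warnings.catch_warnings():
        # Some versions may warn about a literal touching a keyword; ignore it here.
        warnings.simplefilter("ignore")
        return [
            (token.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(readline)
            if tok.type in (token.NUMBER, token.NAME, token.OP)
        ]


for kind, text in significant_tokens(SOURCE):
    print(kind, repr(text))
# Starts with: OP '[', NUMBER '0xf', NAME 'or', NAME 'x', NAME 'in', ...
```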
We still have to parse the rest of the code, or x in (1, 2, 3)], which leaves us with the following code:
[15 or x in (1, 2, 3)]
Because in has higher operator precedence than or, the expression is grouped as 15 or (x in (1, 2, 3)), so we might expect x in (1, 2, 3) to be evaluated first.
That would be a troublesome situation, as x doesn't exist and evaluating it would raise a NameError.
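Precedence only decides how the expression is grouped. A small sketch with made-up operands (1 and 5 stand in for 15 and x so that both sides are defined) shows that the grouping changes the result, and that the right-hand side on its own does fail. The helper evaluate_right_side_alone() is hypothetical.

```python
# Hypothetical operands: 1 and 5 stand in for 15 and x so that both sides exist.
# `in` binds tighter than `or`, so this means 1 or (5 in (1, 2, 3)).
default_grouping = 1 or 5 in (1, 2, 3)
# Forcing the other grouping changes the meaning: (1 or 5) is 1, which is in the tuple.
forced_grouping = (1 or 5) in (1, 2, 3)

print(default_grouping)  # 1
print(forced_grouping)   # True


def evaluate_right_side_alone() -> str:
    """Evaluate only `x in (1, 2, 3)`, with no variable named x defined."""
    try:
        eval("x in (1, 2, 3)", {}, {})
    except NameError as exc:
        return f"NameError: {exc}"
    return "no error"


print(evaluate_right_side_alone())  # a NameError message about x
```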
or is lazy
Fortunately, Python supports short-circuit evaluation, as or is a lazy operator: if the left operand is truthy, Python won't bother evaluating the right operand.
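The laziness can be made visible with operands that record when they run. In this sketch operand() is a hypothetical logging helper; the right-hand side keeps the shape of the original expression but tests the string "x", so it would not fail if it were evaluated.

```python
calls = []  # records which operands were actually evaluated, in order


def operand(name, value):
    """Hypothetical helper: log that this operand ran, then return its value."""
    calls.append(name)
    return value


# Truthy left operand (0xf == 15): `or` returns it and skips the right side.
truthy_result = operand("left", 0xf) or operand("right", "x" in (1, 2, 3))
truthy_calls = list(calls)

# Falsy left operand (0x0 == 0): `or` has to evaluate the right side.
calls.clear()
falsy_result = operand("left", 0x0) or operand("right", "x" in (1, 2, 3))
falsy_calls = list(calls)

print(truthy_result, truthy_calls)  # 15 ['left']
print(falsy_result, falsy_calls)    # False ['left', 'right']
```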
We can see how Python groups the expression using the ast module:
import ast

parsed = ast.parse('0xfor x in (1, 2, 3)', mode='eval')
print(ast.dump(parsed))
Output (reformatted and annotated for readability):
Expression(
    body=BoolOp(
        op=Or(),
        values=[
            Constant(value=15),  # <-- Truthy value, so the next operand won't be evaluated.
            Compare(
                left=Name(id='x', ctx=Load()),
                ops=[In()],
                comparators=[
                    Tuple(elts=[Constant(value=1), Constant(value=2), Constant(value=3)], ctx=Load())
                ]
            )
        ]
    )
)
So the final expression is equal to [15].
Appendix A
In pegen.c's parsenumber_raw function, we can find how Python treats numbers with a leading zero:
if (s[0] == '0') {
    x = (long)PyOS_strtoul(s, (char **)&end, 0);
    if (x < 0 && errno == 0) {
        return PyLong_FromString(s, (char **)0, 0);
    }
}
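The last argument, 0, is the base: with base 0, PyOS_strtoul infers the base from the literal's prefix. Python exposes comparable prefix-based inference through int() with base 0, shown here for comparison only (it is not the code path the compiler uses). One difference: int() requires the whole string to be a valid literal, whereas PyOS_strtoul stops at the first invalid character and reports where through its end pointer. The helper parse_with_base_zero() is hypothetical.

```python
def parse_with_base_zero(text: str):
    """Return int(text, 0), or None if text is not a complete integer literal."""
    try:
        return int(text, 0)
    except ValueError:
        return None


print(parse_with_base_zero("0xf"))    # 15: the 0x prefix selects base 16
print(parse_with_base_zero("0o17"))   # 15: the 0o prefix selects base 8
print(parse_with_base_zero("0xfor"))  # None: int() rejects trailing characters
```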
PyOS_strtoul is in Python/mystrtoul.c.
Inside mystrtoul.c, once the function has seen the x (or X) following the leading 0, it checks the character after the 0x. If it is a hexadecimal digit, it skips the prefix and sets the base of the number to 16:
if (*str == 'x' || *str == 'X') {
    /* there must be at least one digit after 0x */
    if (_PyLong_DigitValue[Py_CHARMASK(str[1])] >= 16) {
        if (ptr)
            *ptr = (char *)str;
        return 0;
    }
    ++str;
    base = 16;
} ...
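The prefix check above can be sketched in Python as follows. This is a simplified, hypothetical detect_hex_prefix(), not a port of mystrtoul.c: it only handles 0x/0X and does not reproduce the end-pointer handling of the failure case.

```python
import string


def detect_hex_prefix(s: str):
    """Simplified sketch of the 0x check; returns (base, index of first digit).

    Like the C code, an 'x'/'X' after a leading '0' only counts as a prefix when
    the character after it is a hexadecimal digit. Other prefixes (0o, 0b) and
    the exact end-pointer handling of the failure case are left out.
    """
    if len(s) >= 3 and s[0] == "0" and s[1] in "xX" and s[2] in string.hexdigits:
        return 16, 2  # skip "0x"; digits start at index 2
    return None, 0


print(detect_hex_prefix("0xfor x in (1, 2, 3)"))  # (16, 2): 'f' is a hex digit
print(detect_hex_prefix("0xor"))                  # (None, 0): 'o' is not
```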
Then it parses the rest of the number as long as the characters are in the range of 0-f:
while ((c = _PyLong_DigitValue[Py_CHARMASK(*str)]) < base) {
    if (ovlimit > 0) /* no overflow check required */
        result = result * base + c;
    ...
    ++str;
    --ovlimit;
}
Finally, it sets the end pointer to the first character that was not consumed, which is one character past the last hexadecimal digit (the "o" of or, in our case):
if (ptr)
    *ptr = (char *)str;
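The digit loop and the end pointer together can be sketched in Python. This is a hypothetical scan_digits(), not CPython code: DIGIT_VALUE is a stand-in for _PyLong_DigitValue covering only ASCII digits and letters, overflow handling is omitted, and the returned index plays the role of *ptr.

```python
import string

# Hypothetical stand-in for _PyLong_DigitValue: the value of each ASCII
# digit/letter in bases up to 36 ('0'-'9' -> 0-9, 'a'/'A'-'z'/'Z' -> 10-35).
DIGIT_VALUE = {ch: int(ch, 36) for ch in string.digits + string.ascii_letters}
NOT_A_DIGIT = 36  # hypothetical sentinel: not below any base up to 36


def scan_digits(s: str, start: int, base: int):
    """Accumulate digits of `base` from s[start]; stop at the first non-digit.

    Returns (value, end), where end is the index of the first unconsumed
    character, like the pointer mystrtoul.c stores in *ptr.
    """
    result = 0
    i = start
    while i < len(s) and DIGIT_VALUE.get(s[i], NOT_A_DIGIT) < base:
        result = result * base + DIGIT_VALUE[s[i]]
        i += 1
    return result, i


text = "0xfor x in (1, 2, 3)"
value, end = scan_digits(text, 2, 16)  # index 2 is just past the "0x" prefix
print(value)             # 15
print(end)               # 3
print(repr(text[end:]))  # 'or x in (1, 2, 3)'
```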