python, scoping

How does Python know there is a local variable before encountering its declaration?


def f():
    print("Before", locals())   # line 2
    print(x)                    # line 3
    x = 2                       # line 4
    print("After", locals())    # line 5

x = 1
f()

I am aware of the LEGB rule for scoping in Python.

For the above code, when I comment out line 4, everything executes as expected: on line 3, Python does not find the variable x in the local scope, so it searches the global scope, finds it there, and prints 1.

But when I execute the whole code as it is, without commenting anything out, it raises UnboundLocalError: local variable 'x' referenced before assignment.

I do know I can use nonlocal and global, but my question is:

  1. How does Python know there is a local variable declaration before it has encountered one?
  2. Even if it does know there is a variable named x in the local scope (although not yet initialised), why doesn't it show it in locals()?

I tried to find the answer in the suggested similar questions but failed. Please correct me if any of my understanding is wrong.


Solution

  • To some extent, the answer is implementation-specific, as Python only specifies the expected behavior, not how to implement it.

    That said, let's look at the byte code generated for f by the usual implementation, CPython:

    >>> import dis
    >>> dis.dis(f)
      2           0 LOAD_GLOBAL              0 (print)
                  2 LOAD_CONST               1 ('Before')
                  4 LOAD_GLOBAL              1 (locals)
                  6 CALL_FUNCTION            0
                  8 CALL_FUNCTION            2
                 10 POP_TOP
    
      3          12 LOAD_GLOBAL              0 (print)
                 14 LOAD_FAST                0 (x)
                 16 CALL_FUNCTION            1
                 18 POP_TOP
    
      4          20 LOAD_CONST               2 (2)
                 22 STORE_FAST               0 (x)
    
      5          24 LOAD_GLOBAL              0 (print)
                 26 LOAD_CONST               3 ('After')
                 28 LOAD_GLOBAL              1 (locals)
                 30 CALL_FUNCTION            0
                 32 CALL_FUNCTION            2
                 34 POP_TOP
                 36 LOAD_CONST               0 (None)
                 38 RETURN_VALUE
    

    There are several different LOAD_* op codes used to retrieve various values. LOAD_GLOBAL is used for names that are looked up in the global scope and, failing that, in the built-in scope (which is why print and locals use it). LOAD_CONST is used for constant values that are stored in the code object rather than assigned to any name. LOAD_FAST is used for local variables. At run time, local variables are not looked up by name at all, but by index into an array. That's why they are "fast"; they are available in an array rather than a hash table. (LOAD_GLOBAL also uses integer arguments, but that's just an index into an array of names; the name itself still needs to be looked up in whatever mapping provides the global scope.)
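As a rough illustration of the opcode choice described above, the following sketch compares two functions that differ only in whether they assign to x. The names reads_global, reads_local and load_opcodes_for are made up for this example. Exact opcode names vary between CPython versions (newer releases may emit variants such as LOAD_FAST_CHECK), so the comparison is best read by prefix.

```python
import dis

x = 1

def reads_global():
    # No assignment to x anywhere in this body, so x is a global name.
    print(x)

def reads_local():
    # The assignment below makes x local for the whole body.
    print(x)
    x = 2

def load_opcodes_for(func, name):
    """Return the names of the LOAD_* opcodes in func that read `name`."""
    return [
        ins.opname
        for ins in dis.get_instructions(func)
        if ins.opname.startswith("LOAD_") and ins.argval == name
    ]

global_ops = load_opcodes_for(reads_global, "x")
local_ops = load_opcodes_for(reads_local, "x")
print("reads_global:", global_ops)
print("reads_local: ", local_ops)
```

Neither function is called here; the difference is already visible in the compiled byte code.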

    You can even see the constants and local variable names associated with f:

    >>> f.__code__.co_consts
    (None, 'Before', 2, 'After')
    >>> f.__code__.co_varnames
    ('x',)
    

    LOAD_CONST 1 puts 'Before' on the stack because f.__code__.co_consts[1] == 'Before', and LOAD_FAST 0 puts the value of x on the stack because f.__code__.co_varnames[0] == 'x'.
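The link between an instruction's integer argument and co_varnames can be checked programmatically. This is a minimal sketch: FAST_OPS is a hand-picked, version-dependent set of opcode names (the extra entries are variants that newer CPython releases may emit), not an official list.

```python
import dis

def f():
    print(x)
    x = 2

code = f.__code__

# Opcodes whose integer argument is a single index into co_varnames.
# Assumption: this set covers the CPython version in use.
FAST_OPS = {"LOAD_FAST", "LOAD_FAST_CHECK", "LOAD_FAST_BORROW", "STORE_FAST"}

# For each such instruction, pair the raw index with the name it refers to.
slots = [
    (ins.opname, ins.arg, code.co_varnames[ins.arg])
    for ins in dis.get_instructions(f)
    if ins.opname in FAST_OPS
]

print(code.co_varnames)
print(slots)
```

Every instruction in slots carries index 0, and co_varnames[0] is 'x'.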

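    The key here is that the byte code is generated before f is ever executed. Python isn't simply executing each line the first time it sees it. Before any of the code runs, CPython compiles the whole source, including the body of f (executing the def statement later only wraps the already compiled code in a function object). That compilation involves, among other things: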

    1. reading the source code
    2. parsing into an abstract syntax tree (AST)
    3. using the entire AST to generate the byte code stored in the __code__ attribute of the function object.
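The steps above can be reproduced by hand with the standard ast module and the built-in compile function. This is only a sketch of the idea; the file name "<demo>" is an arbitrary placeholder, and nothing in the source string is executed.

```python
import ast
import types

# Step 1: the source code, here held in a string instead of a file.
source = '''
def f():
    print(x)
    x = 2
'''

# Step 2: parse the source text into an AST. Nothing runs here.
tree = ast.parse(source)

# Step 3: compile the whole AST to byte code, still without running it.
module_code = compile(tree, "<demo>", "exec")

# The code object for f is stored as a constant of the module's code object.
func_code = next(
    const for const in module_code.co_consts
    if isinstance(const, types.CodeType)
)

print(func_code.co_name, func_code.co_varnames)
```

The function f has never been defined, let alone called, yet its code object already lists x as a local variable.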

    Part of the code generation is noting that the name x, due to the assignment somewhere in the body of the function (even if that assignment is logically unreachable), is a local name, and therefore must be accessed with LOAD_FAST.
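A quick way to convince yourself that reachability does not matter is to hide the assignment behind a condition that is never true. The function g and the variable outcome are names invented for this sketch.

```python
x = 1

def g():
    print(x)      # x is local here because of the assignment below
    if False:
        x = 2     # never runs, but still marks x as local at compile time

try:
    g()
    outcome = "printed the global"
except UnboundLocalError:
    outcome = "UnboundLocalError"

print(outcome)
print(g.__code__.co_varnames)
```

Even though x = 2 can never execute, the call to g() fails in the same way as the original f().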

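    At the time locals is called (and indeed before LOAD_FAST 0 is used the first time), no assignment to x (i.e., STORE_FAST 0) has yet been made, so there is no local value in slot 0 to look up. That is why locals() returns an empty dictionary on line 2 (it only reports local names that currently have a value), and why LOAD_FAST 0 on line 3 raises UnboundLocalError instead of falling back to the global x.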
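The empty slot can be observed from inside a function by snapshotting locals() before and after the assignment. This is a small sketch; h and the helper variables before, read_failed and after are illustrative names, and because they are locals themselves they also show up in the later snapshot.

```python
x = 1

def h():
    before = dict(locals())   # the slot for x is still empty at this point
    try:
        repr(x)               # reading the empty slot raises
        read_failed = False
    except UnboundLocalError:
        read_failed = True
    x = 2                     # fills the slot
    after = dict(locals())    # now includes x (and the helper variables)
    return before, read_failed, after

before, read_failed, after = h()
print("x in before:", "x" in before)
print("read failed:", read_failed)
print("x in after: ", "x" in after, after.get("x"))
```

The global x = 1 is never consulted inside h, because the compiler has already decided that x refers to a local slot.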