
Where is the `__class__` variable stored in Python, or how does the compiler know where to find it?


Python relies on the __class__ variable being in a cell for a zero-argument super() call. super() gets this cell from the free variables of the current (calling) stack frame.

The odd thing, though, is that this variable normally isn't in locals(), yet it is once you reference it directly from the __init__ method.
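The dependency on the cell can be observed directly: a method compiled inside a class body that uses zero-argument super() gets a __class__ free variable, while the same code compiled as a plain function does not, and attaching that function to a class afterwards does not add the cell. A minimal sketch for CPython 3 (Base, WithCell and orphan are names made up for this illustration):

```python
class Base:
    def describe(self):
        return "Base"


class WithCell(Base):
    def describe(self):
        # Compiled inside a class body: zero-argument super() makes the
        # compiler give this function a __class__ free variable.
        return "WithCell -> " + super().describe()


def orphan(self):
    # Same kind of body, but compiled outside any class: no __class__ cell.
    return "orphan -> " + super().describe()


# Attaching the function to a class later does not create the cell.
WithCell.orphan = orphan

print(WithCell().describe())                   # WithCell -> Base
print(WithCell.describe.__code__.co_freevars)  # ('__class__',)
print(orphan.__code__.co_freevars)             # ()

try:
    WithCell().orphan()
except RuntimeError as exc:
    # super() looked for a __class__ cell in the frame and found none.
    print("RuntimeError:", exc)
```

Because the cell is decided when the function is compiled, where the function ends up living at runtime makes no difference.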

Take for example this bit of code:

class LogicGate:
    def __init__(self,n):
        print(locals())
        a = __class__
        print(locals())

When you disassemble this, you can see that it somehow knows that print and locals are globals, while __class__ is loaded with LOAD_DEREF. How does the compiler know this before running the code? As far as I know, locals, print and __class__ are just variable names to the compiler. Also, written this way, __class__ suddenly appears in locals() even before it's copied into a.

4          10 LOAD_DEREF               0 (__class__)

while locals is loaded with:

            2 LOAD_GLOBAL              1 (locals)
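The split is decided at compile time by CPython's symbol-table pass, which runs over the AST before any bytecode is generated. Roughly (ignoring global/nonlocal declarations): names bound in the function are local, names bound in an enclosing scope are free (a class body implicitly provides __class__ to the functions nested in it), and everything else is an implicit global. The standard-library symtable module exposes the result of that pass. A minimal sketch that inspects the question's code; child and scope_of are helper names made up for this example:

```python
import symtable

SOURCE = """
class LogicGate:
    def __init__(self, n):
        print(locals())
        a = __class__
        print(locals())
"""


def child(table, name):
    """Return the nested symbol table (class or function) called `name`."""
    return next(t for t in table.get_children() if t.get_name() == name)


def scope_of(symbol):
    """Classify a symtable.Symbol the way the compiler sees it."""
    if symbol.is_free():
        return "free"    # read with LOAD_DEREF
    if symbol.is_global():
        return "global"  # read with LOAD_GLOBAL inside a function
    if symbol.is_local():
        return "local"   # parameters and assigned names
    return "other"


module_table = symtable.symtable(SOURCE, "<logicgate>", "exec")
init_table = child(child(module_table, "LogicGate"), "__init__")

for sym in init_table.get_symbols():
    print(f"{sym.get_name():10} {scope_of(sym)}")

print(init_table.get_frees())  # ('__class__',)
```

Because __class__ is free in __init__, it lives in a cell rather than in the globals, which is also why it shows up in locals() (CPython includes free variables there).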

I'm asking because I'm working on Skulpt, a Python-to-JavaScript compiler. Currently that compiler doesn't differentiate between print and __class__, and attempts to get them both from the global scope.

As you can see from a printout of the AST of the code above, the parser doesn't differentiate between locals and __class__:

Module(body=[ClassDef(name='LogicGate',
    bases=[],
    keywords=[],
    body=[FunctionDef(name='__init__',
        args=arguments(args=[arg(arg='self',
                                 annotation=None),
                             arg(arg='n',
                                  annotation=None)],
                       vararg=None,
                       kwonlyargs=[],
                       kw_defaults=[],
                       kwarg=None,
                       defaults=[]),
        body=[Expr(value=Call(func=Name(id='print',
                                        ctx=Load()),
                                              # here's the load for locals
                              args=[Call(func=Name(id='locals',
                                                   ctx=Load()),
                                         args=[],
                                         keywords=[])],
                              keywords=[])),
              Assign(targets=[Name(id='a',
                                   ctx=Store())],
                           # here's the load for __class__
                     value=Name(id='__class__',
                                ctx=Load())),
              Expr(value=Call(func=Name(id='print',
                                        ctx=Load()),
                              args=[Call(func=Name(id='locals',
                                                   ctx=Load()),
                                         args=[],
                                         keywords=[])],
                              keywords=[]))],
        decorator_list=[],
        returns=None)],
   decorator_list=[])])
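The AST indeed only records Load/Store contexts; the global/free/local decision comes from a separate scope-resolution pass over the tree, which a compiler such as Skulpt could add. The following is a deliberately simplified, hypothetical Python sketch (resolve_function and resolve_module are invented names, not Skulpt or CPython APIs). It only covers the cases in this question: parameters and assigned names are local, other loaded names are global, and inside a method any load of super or __class__ turns __class__ into a free variable, a simplified version of CPython's rule.

```python
import ast

SOURCE = """
class LogicGate:
    def __init__(self, n):
        print(locals())
        a = __class__
        print(locals())
"""


def resolve_function(func, inside_class):
    """Hypothetical, simplified scope pass for a single FunctionDef.

    Ignores global/nonlocal statements, nested functions, lambdas,
    comprehensions and enclosing function scopes.
    """
    args = func.args
    params = {a.arg for a in args.posonlyargs + args.args + args.kwonlyargs}
    for extra in (args.vararg, args.kwarg):
        if extra is not None:
            params.add(extra.arg)

    # Collect names by context: Load vs. Store/Del.
    stored, loaded = set(), set()
    for stmt in func.body:
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name):
                target = loaded if isinstance(node.ctx, ast.Load) else stored
                target.add(node.id)

    local = params | stored
    scopes = {name: "local" for name in local}
    for name in loaded - local:
        scopes[name] = "global"
    # Inside a method, loading `super` or `__class__` makes __class__ a free
    # variable referring to the cell created for the enclosing class.
    if inside_class and "__class__" not in local and {"super", "__class__"} & loaded:
        scopes["__class__"] = "free"
    return scopes


def resolve_module(source):
    """Map 'func' or 'Class.method' to {name: scope} for top-level code."""
    result = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            result[node.name] = resolve_function(node, inside_class=False)
        elif isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, ast.FunctionDef):
                    key = f"{node.name}.{item.name}"
                    result[key] = resolve_function(item, inside_class=True)
    return result


print(resolve_module(SOURCE)["LogicGate.__init__"])
```

A real implementation would also need global and nonlocal declarations, nested functions and comprehensions, which CPython's own symbol-table pass (Python/symtable.c) handles.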

Solution

  • The __class__ cell is a hack in Python 3 that allows super to be called without arguments. In Python 2 you had to call super with boilerplate arguments (i.e. super(<current class>, self)).

    The __class__ cell itself is stored in the <function>.__closure__ tuple. Its position in that tuple is the index of '__class__' in the <function>.__code__.co_freevars tuple. For instance,

    >>> class A:
        def __init__(self):
            super().__init__()
    
    >>> A.__init__.__code__.co_freevars
    ('__class__',)
    >>> A.__init__.__closure__
    (<cell at 0x03EEFDF0: type object at 0x041613E0>,)
    >>> A.__init__.__closure__[
            A.__init__.__code__.co_freevars.index('__class__')
        ].cell_contents
    <class '__main__.A'>
    

    However, if a function has no free variables, co_freevars is an empty tuple and __closure__ is None. Further, the __class__ cell is not guaranteed to be present. It is only created if a function named super is called without arguments (it doesn't actually have to be the builtin super: e.g. super = print; super() will fool the compiler into creating a __class__ cell) or if __class__ is explicitly referenced and is not a local variable. You also cannot assume that the __class__ cell is always at index 0, as the following (albeit bizarre) code shows:

    class A:
        def greet(self, person):
            print('hello', person)
    
    def create_B(___person__):
        class B(A):
            def greet(self):
                super().greet(___person__)
        return B
    
    B = create_B('bob')
    B().greet() # prints hello bob
    
    assert [c.cell_contents for c in B.greet.__closure__] == ['bob', B]
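These rules can be combined into a small helper that looks the cell up by name, tolerates a missing closure and never assumes index 0. A minimal sketch; class_cell_contents, Demo and plain are hypothetical names used only for this illustration:

```python
class Demo:
    def with_super(self):
        return super().__repr__()

    def without_super(self):
        return 42

    def fooled(self):
        super = print  # shadows the builtin, but the compiler still sees `super`
        super()


def plain():
    return 1


def class_cell_contents(func):
    """Return the object in func's __class__ cell, or None if there is none."""
    code = func.__code__
    if func.__closure__ is None or "__class__" not in code.co_freevars:
        return None
    index = code.co_freevars.index("__class__")  # not necessarily 0
    return func.__closure__[index].cell_contents


print(class_cell_contents(Demo.with_super))     # the class Demo
print(class_cell_contents(Demo.without_super))  # None
print(class_cell_contents(Demo.fooled))         # the class Demo, via super = print
print(class_cell_contents(plain))               # None
```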