
Accessing instance variables using "self" via ast.parse, compile, and exec


Don't ask me why... I need to let users write expressions that get stored on an object. These expressions need access to the instance attributes of the object they're stored on. My current plan is to

  1. store the expression string
  2. parse it with ast
  3. use an ast.NodeTransformer to replace the relevant Name nodes with nodes that access the instance attributes ("val1 > val2" becomes self.val1 > self.val2)
  4. compile it and store the result on the object
  5. exec the compiled code when needed
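For reference, steps 2 to 5 of that plan could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not a vetted implementation: ExprHolder and SelfAttrTransformer are hypothetical names, only the names passed as keyword arguments are rewritten to attribute loads, and the expression is parsed and compiled with mode="eval" so that evaluating it actually produces a value.

```python
import ast


class SelfAttrTransformer(ast.NodeTransformer):
    """Hypothetical transformer: rewrite selected bare names (val1) into self.val1."""

    def __init__(self, attr_names):
        self.attr_names = set(attr_names)

    def visit_Name(self, node):
        if node.id not in self.attr_names:
            return node  # leave other names untouched
        new_node = ast.Attribute(
            value=ast.Name(id="self", ctx=ast.Load()),
            attr=node.id,
            ctx=node.ctx,
        )
        # Reuse the original node's line/column info for error messages.
        return ast.copy_location(new_node, node)


class ExprHolder:
    """Hypothetical container that stores an expression and evaluates it against itself."""

    def __init__(self, expr, **attrs):
        self.__dict__.update(attrs)                                # the instance attributes
        tree = ast.parse(expr, filename="<expr>", mode="eval")     # step 2: parse
        tree = SelfAttrTransformer(attrs).visit(tree)              # step 3: rewrite names
        ast.fix_missing_locations(tree)                            # locate the new Name(self) node
        self.code = compile(tree, filename="<expr>", mode="eval")  # step 4: compile and store

    def evaluate(self):
        # Step 5: the compiled code refers to `self` by name, so supply it explicitly.
        # (Untrusted input needs a more restricted namespace than this.)
        return eval(self.code, {"self": self})


holder = ExprHolder("val1 > val2", val1=3, val2=2)
print(holder.evaluate())  # True
holder.val2 = 5
print(holder.evaluate())  # False
```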

Is there a way to access the object's self reference from the AST? I've tried the code below and I'm not getting the output I expected. The attribute node looks right to me, so I assume something is going wrong with the object's self reference.

Thanks in advance!

CODE:

import ast

class test_obj:
    def __init__(self) -> None:
        self.value = 0

    def get_value(self):
        tree = ast.parse(ast.parse("self.value", "exec"))
        exe = compile(ast.parse("self.value", "exec"), "", "exec")
        print(f"Normal access: {self.value}")
        print(f"Tree dump: {ast.dump(tree)}")
        print(f"Exec: {exec(exe)}")


a = test_obj()
a.get_value()

OUTPUT:

Normal access: 0
Tree dump: Module(body=[Expr(value=Attribute(value=Name(id='self', ctx=Load()), attr='value', ctx=Load()))], type_ignores=[])
Exec: None

Solution

  • It's not clear to me what you intend with the nested call to ast.parse in:

    tree = ast.parse(ast.parse("self.value", "exec"))
    

    The outer call is a no-op, as it happens, and the "exec" in the inner call is the filename, not the mode (although the mode will also be "exec", i.e. a sequence of statements, because that's the default value). It's usually a good idea to pass optional arguments by keyword, especially ones that follow other arguments with default values.
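A quick way to confirm both points, at least in CPython: the second positional parameter of ast.parse is the filename (it shows up in SyntaxError reports), and handing an existing AST back to ast.parse returns that same object. The variable names below are only for illustration.

```python
import ast

# The second positional parameter of ast.parse is `filename`, not `mode`.
inner = ast.parse("self.value", "exec")
print(type(inner).__name__)  # Module (mode defaulted to "exec")

# Passing an AST object back into ast.parse returns that same object.
outer = ast.parse(inner)
print(outer is inner)  # True

# The "filename" is what appears in error reports.
try:
    ast.parse("self.value +", "exec")
except SyntaxError as err:
    print(err.filename)  # exec
```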

    The AST returned by ast.parse(..., mode="exec") is a Module AST object, which is the AST type used to hold a sequence of statements at the outer level. You can call either exec or eval on a compiled Module AST; I don't believe there is a semantic difference. Either way, no value is returned. (Similarly, you can call exec on a compiled Expression; the value of the expression is discarded. But as far as I know this case isn't documented.)
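The difference is easy to see by compiling the same source text both ways and running each code object through exec and eval. This is a small sketch; Obj and namespace are illustrative names, and `self` is supplied through an explicit globals dictionary because there is no enclosing method here.

```python
import ast


class Obj:
    def __init__(self):
        self.value = 17


obj = Obj()
namespace = {"self": obj}

# Same source text, compiled once as a Module ("exec") and once as an Expression ("eval").
module_code = compile(ast.parse("self.value", mode="exec"), "<demo>", "exec")
expr_code = compile(ast.parse("self.value", mode="eval"), "<demo>", "eval")

print(exec(module_code, namespace))  # None
print(eval(module_code, namespace))  # None: code compiled in "exec" mode yields no value
print(exec(expr_code, namespace))    # None: exec itself always returns None
print(eval(expr_code, namespace))    # 17
```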

    If you want to store the code compiled from an Expression AST and be able to retrieve the result of evaluating it, you'll need to specify mode="eval" both when creating the AST and when calling compile. So you might end up with a test harness which looks like the below. (This is OK for testing, but for production use please read the note on insecurity at the end of this answer.)

    This code requires Python 3.9 for the indent argument to ast.dump, which makes the output more readable (at least, for me). But it's not essential.

    import ast
    
    class test_obj:
        def __init__(self, value=42):
            self.value = value
    
        def get_value(self, expr="self.value"):
            tree = ast.parse(expr, filename="<get_value>", mode="eval")
            exe = compile(tree, filename="<get_value>", mode="eval")
            print(f"self.value = {self.value}")
            print(f"Tree dump: {ast.dump(tree, indent=2)}")
            print(f"Exec: {eval(exe)}")
    
    a = test_obj(17)
    a.get_value("self.value * 3")
    

    This has what I suppose is the expected result:

    self.value = 17
    Tree dump: Expression(
      body=BinOp(
        left=Attribute(
          value=Name(id='self', ctx=Load()),
          attr='value',
          ctx=Load()),
        op=Mult(),
        right=Constant(value=3)))
    Exec: 51
    

    Insecurity warning

    I don't think the above is really safe, given that the expressions to evaluate are user input (and therefore untrusted). It would be better to restrict access to both global and local variables. As written, eval is called with defaults for both globals and locals, which means that the same identifiers are visible in the evaluated code as are visible in the method get_value:

    >>> a.get_value("__import__('os').system('echo ---------;echo Could have been rm -fR; echo ---------')")
    self.value = 42
    Tree dump: Expression(
      body=Call(
        func=Attribute(
          value=Call(
            func=Name(id='__import__', ctx=Load()),
            args=[
              Constant(value='os')],
            keywords=[]),
          attr='system',
          ctx=Load()),
        args=[
          Constant(value='echo ---------;echo Could have been rm -fR; echo ---------')],
        keywords=[]))
    ---------
    Could have been rm -fR
    ---------
    Exec: 0
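The reason this works is that eval called without explicit globals and locals sees the calling frame's names. A minimal sketch of that default behaviour, using a hypothetical Demo class:

```python
class Demo:
    """Hypothetical class, only for showing what eval() sees by default."""

    def run(self, expr):
        secret = "local to run"  # an ordinary local variable of this method
        code = compile(expr, "<demo>", "eval")
        # No globals/locals passed: eval uses this frame's globals and locals.
        return eval(code)


d = Demo()
print(d.run("secret"))     # local to run
print(d.run("self") is d)  # True
```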
    

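    I'd suggest replacing eval(exe) with eval(exe, {'__builtins__': {}, 'self': self}). Pass the dictionary positionally: eval only accepts globals and locals as keyword arguments from Python 3.13 on. The explicit '__builtins__' key matters because, when the globals dictionary lacks it, eval inserts a reference to the builtins module, so __import__ would still be reachable with just {'self': self}. If that doesn't provide all the global identifiers you think your users might need, you can (cautiously) add others to the supplied global dictionary. Even then this is not a sandbox: an expression can still walk attributes such as self.__class__, so treat it as a mitigation rather than a guarantee.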
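A sketch of that restricted evaluation, with hypothetical names Obj and safer_eval. It shows that an empty __builtins__ blocks the __import__ trick, and that without the explicit key eval adds the builtins itself. As noted above, this narrows what an expression can reach; it does not make eval safe for arbitrary untrusted input.

```python
import ast


class Obj:
    def __init__(self, value):
        self.value = value


def safer_eval(expr, obj):
    """Hypothetical helper: evaluate expr with only `self` visible."""
    tree = ast.parse(expr, filename="<user expr>", mode="eval")
    code = compile(tree, filename="<user expr>", mode="eval")
    # Globals are passed positionally (the keyword form needs Python 3.13+).
    # The explicit empty __builtins__ stops eval from inserting the builtins module.
    return eval(code, {"__builtins__": {}, "self": obj})


obj = Obj(17)
print(safer_eval("self.value * 3", obj))  # 51

try:
    safer_eval("__import__('os')", obj)
except NameError:
    print("__import__ is not reachable")

# For comparison: without the explicit key, eval adds __builtins__ itself.
g = {"self": obj}
eval("self.value", g)
print("__builtins__" in g)  # True
```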