Search code examples
pythonabstract-syntax-tree

AST parsing safety: memory and time


I'm worried that parsing AST of a malicious code can yield a very large object or last very long time. Are my worries sound? Is there a way for the tree to be unreasonably large when the code itself is small?


Solution

  • The size of the AST is proportional to the length of the program being parsed. So that's not a worry.

    Python uses a PEG parser, which is a memoised backtracking algorithm, which should take time linear in the length of the code. If you encounter an input which takes exceptionally long to parse a short text, you should report it as a bug (which would be the case).

    However, remember that syntax trees are not balanced. It's not difficult to construct a program whose AST has a depth linear in the size of the code. Since the execution stack is quite limited in size, recursive traverses of an AST can overflow the stack, even though they only use a small percentage of memory. There is a warning about this in the documentation for ast.parse in the Python library manual:

    Warning It is possible to crash the Python interpreter with a sufficiently large/complex string due to stack depth limitations in Python’s AST compiler.