I would like to create a simple YAML workflow format that serves as metadata, as in the example below. Users will create these documents and submit them, mostly to organize a modest number of tasks (such as specifying a chain of anomaly detectors). Imports will be parsed with `importlib`. I was planning to use `newglobals=None`, populate `newlocals` from the imports and arguments, and then call `eval(expr, newglobals, newlocals)` (roughly as sketched after the example below). The workflow YAML would orchestrate work and record metadata in YAML, which suits our needs and is also easy to extend to non-Python shell scripts.
My question concerns the use of `eval`. It isn't hard to find examples online of how malicious arbitrary code could be represented and run with YAML, e.g. with `module: shutil`, `names: [rmtree]`, `expr: 'rmtree(path)'` and `args: {path: /}`. However, the text is potentially non-arbitrary if the user uses this workflow tool to organize their own work and stores the YAML in trusted repos. Is there an incremental danger to the YAML/eval approach compared to plain Python, if the Python and the YAML/eval documents are both managed with the same kind of security? After all, I expect our organization's members not to execute a file that says `shutil.rmtree('/')`. Are there additional dangers?
```yaml
imports:
  - module: mymod
    names:
      - func1
steps:
  - expr: 'func1(foo=foo) + 2'
    args:
      foo: 2
```
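For reference, here is a minimal sketch of the executor I have in mind (names like `run_workflow`, `newglobals`, and `newlocals` are my own; the demo swaps `mymod.func1` for the stdlib's `math.sqrt` so it runs as-is):

```python
import importlib
import yaml  # assumes PyYAML is available

def run_workflow(doc: str):
    spec = yaml.safe_load(doc)
    # I proposed newglobals=None above; an explicitly bare dict is shown
    # here instead (the answer below argues neither is a real sandbox).
    newglobals = {"__builtins__": {}}
    imported = {}
    for imp in spec.get("imports", []):
        module = importlib.import_module(imp["module"])
        for name in imp["names"]:
            imported[name] = getattr(module, name)
    results = []
    for step in spec.get("steps", []):
        # newlocals holds the imported names plus the step's arguments.
        newlocals = dict(imported, **step.get("args", {}))
        # The eval call in question: expr comes straight from the YAML,
        # so a shutil/rmtree payload would flow through here unchecked.
        results.append(eval(step["expr"], newglobals, newlocals))
    return results

# Self-contained demo: math.sqrt stands in for mymod.func1.
doc = """
imports:
  - module: math
    names:
      - sqrt
steps:
  - expr: 'sqrt(foo) + 2'
    args:
      foo: 4
"""
print(run_workflow(doc))  # [4.0]
```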
Let's put it this way: if you have absolutely no `eval` or `eval`-like call in your program, it should be entirely predictable in terms of what it can and won't do. If there's no call to any function that deletes files anywhere in your codebase, then you can be reasonably sure that your program won't ever delete any files, for instance.
I say "eval
-like", because this also includes things like incorrectly concatenated SQL queries (SQL-injection), HTML-injection, shell command injection, even calling functions by name where the name value isn't being reasonably whitelisted. All these are things where the behaviour of your program is determined by runtime information, and thus becomes less predictable or unpredictable.
There are of course ways to use these things reasonably safely, by validating/limiting the possible values the strings can take, and/or escaping them correctly, and/or passing them safely into the SQL/HTML/shell call or function selection, etc.
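For instance, a sketch of what "reasonably whitelisted" function selection can look like; the `ALLOWED` table and its entries are illustrative:

```python
import math

# Only names registered here can ever be called, regardless of what
# the incoming document or request asks for.
ALLOWED = {"sqrt": math.sqrt, "log": math.log}

def call_by_name(name, *args):
    try:
        func = ALLOWED[name]
    except KeyError:
        raise ValueError(f"function {name!r} is not permitted")
    return func(*args)

print(call_by_name("sqrt", 9))    # 3.0
call_by_name("system", "reboot")  # ValueError: not in the whitelist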
A pure `eval` call can hardly be clamped down in any reasonable way, as it allows for a wide variety of arbitrary expressions, purposefully so. If you use `eval`, you're very explicitly allowing any and all possible code to be executed. Light limiting of `globals` etc. has time and again been shown to be ineffective. So it all comes down to whether you trust the code that you're `eval`ing, and how much you can trust the process by which you come to trust that code. Can you be sure that there's no way a malicious user can trigger the code path that leads to the execution of `eval` with some malicious payload? How can you be sure of this?
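One classic, harmless demonstration of why stripping `globals` down doesn't contain `eval`: even with builtins removed and no names bound, an expression can climb from a plain tuple literal back up the object graph to every class loaded in the interpreter.

```python
# No builtins available, no names bound — and yet:
payload = "().__class__.__base__.__subclasses__()"
classes = eval(payload, {"__builtins__": {}}, {})
print(len(classes))  # hundreds of classes, some of which lead back to os/io
```

Well-known payloads then dig through that list to find a class whose machinery re-imports `os`, at which point the "sandbox" is gone.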
It's not impossible to architect a solution to this. But that's a lot more complex than not having `eval` in your code at all. And you'll have to maintain that trust solution over time. Systems tend to "soften" over time, spawning more and more features and allowing more and more access for various reasons. That's where it gets hairy when you include an `eval` in your program.