Search code examples
pythonyamlpyyamlruamel.yaml

Is it safe to use the default load in ruamel.yaml


I can load and dump YAML files with tags using ruamel.yaml, and the tags are preserved.

If I let my customers edit the YAML document, will they be able to exploit the YAML vulnerabilities because of arbitrary python module code execution? As I understand it ruamel.yaml is derived from PyYAML that has such vulnerabilities according to its documentation.


Solution

  • From your question I deduct you are using .load() method on a YAML() instance as in:

    import ruamel.yaml
    yaml = ruamel.yaml.YAML()
    data = yaml.load(some_path)
    

    and not the old PyYAML load compatible function (which cannot handle unknown tags, and which can be unsafe). The short answer is that yes that is safe as no interpretation of tags is done without calling any tag dependent code, the tags just get assigned (as strings) to known types (CommentedMap, CommentedSeq, TaggedScalar for mapping, sequence and scalar respectively).

    The vulnerability in PyYAML comes from its Constructor, that is part of the unsafe Loader. This was the default for PyYAML for the longest time and most people used it even when they could have used the safeloader because they were not using any tags (or could have regeistred the tags they needed against the SafeLoader/Constructor). That unsafe Constructor is a subclass of SafeConstructor and what makes it unsafe are the multi-methods registered for the interpretation of python/{name,module,object,apply,new):.... tags, essentially dynamically interpreting these tags to load modules and run code (i.e. execute functions/methods/instantiate classes).

    The initial idea behind ruamel.yaml was its RoundTripConstructor , which is also a subclass of the SafeConstructor. You used to get this using the now deprecated round_trip_load function and nowadays via the .load() method after using YAML(typ='rt'), this is also the default for a YAML() instance without typ argument. That RoundTripConstructor does not registers any tags or have any code that interprets tags above and beyond the normal tags that the SafeConstructor uses.

    Only the old PyYAML load and ruamel.yaml using typ='unsafe' can execute arbitrary Python code on doctored YAML input by executing code via the !!python/.. tags.

    When you use typ='rt' the tag on nodes is preserved, even when a tag is not registered. This is possible because while processing nodes (in round-trip mode), those tags will just be assigned as strings to an attribute on the constructed type of the tagged node (mapping, sequence, or scalar, with the exception of tagged null). And when dumping these types, if they have a tag, it gets re-inserted into the representation processing code. At no point is the tag evaluated, used to import code, etc.