Templating input into a yaml file - what sanitization do I need to do?


I have a YAML file that I am generating via a templating language (in this case, Jinja2). Here's a trivial snippet:

services_to_install:
  {% for service in services if service.install -%}
  - {{ service.name }}
  {% endfor -%}
user_data: |
  #! /bin/bash
  set -o errexit
  /usr/local/bin/ansible-playbook -i 127.0.0.1, -c local /tmp/ansible/playbook.yml --extra-vars 'app={{ app }}'

I know that, for instance, if I let service.name include a newline, it could escape out of the list it's supposed to be in and arbitrary yaml syntax could be written. So I am restricting newlines.

However, I don't know all the other possible abuses for "code injection" (i.e. writing arbitrary yaml syntax) that could exist. Putting aside language specific tags that could create objects during runtime, what other things do I have to look out for?

In other words, how do I sanitize input to a templated yaml file, much like one would sanitize input to a templated html file?

p.s. I am not married to one templating engine or another, I am more interested in yaml syntax.

EDIT added a block element to my example since I also use those.


Solution

  • The safest thing to do would be to write a filter that escapes the string and puts it in double quotes. The YAML specification gives the complete list of escape sequences available in the double-quoted scalar style.

    That being said, let's look at what is forbidden if you want to write it as plain (i.e. unquoted) scalar:

    Characters that may not start a plain scalar

    Certain characters may not start a plain scalar and therefore must not occur at the beginning. These are called indicator characters and include:

    • flow-style indicators (,, [, ], {, })
    • quotation marks that start a quoted scalar (', ")
    • characters that are used to start tags, anchors or aliases (!, &, *)
    • the comment indicator (#)
    • the directive indicator (%)
    • reserved characters (@, `)
    • block-style indicators (|, >, ?, :, -). However, ?, : and - are allowed if they are not followed by whitespace.

    Characters that would end a plain scalar

    Once the plain scalar is started, most characters are allowed. However, some characters will mark the end of the plain scalar:

    • flow-style indicators (,, [, ], {, }), but only if you are in flow style.
    • the mapping key indicator (:) if followed by whitespace.
    • the comment indicator (#) if preceded by whitespace.
    • a line break if the next non-empty line is indented less than the current indentation level.

    Be aware that while a plain scalar may span several lines (if indentation is handled correctly), its line breaks are subject to line folding. If you want the scalar to parse back to the original value, you would need to transform the value before writing it in this style.

    Character combinations that are forbidden altogether

    Inside a document, the character sequences --- and ... may never occur at the beginning of a line (they are fine everywhere else) because they indicate the end of the current document and possibly the start of a new one.
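
    The rules above can be sketched as a conservative check. This is only an illustration, not a full implementation of the YAML grammar: the function name is hypothetical, flow-style indicators are rejected everywhere because a template usually cannot tell whether it is inside flow style, and multi-line values are rejected outright instead of being folded. It also does not check non-printable characters, or whether the value would resolve to a non-string type such as a number or boolean.

```python
import re

# Characters that may never start a plain scalar.
START_INDICATORS = set(",[]{}'\"!&*#%@`|>")
# Characters that may start a plain scalar only if not followed by whitespace.
CONDITIONAL_START = set("?:-")
# Characters that end a plain scalar in flow style.
FLOW_INDICATORS = set(",[]{}")


def is_safe_plain_scalar(s):
    """Return True if s can be written unquoted under the rules listed above.

    Conservative: some strings that YAML would accept are rejected.
    """
    # An empty value, or leading/trailing whitespace, cannot be kept as-is.
    if not s or s != s.strip():
        return False
    # Line breaks would need correct indentation and are subject to folding.
    if re.search(r"[\r\n]", s):
        return False
    if s[0] in START_INDICATORS:
        return False
    if s[0] in CONDITIONAL_START and (len(s) == 1 or s[1].isspace()):
        return False
    # Rejected everywhere, since we do not know if we are in flow style.
    if FLOW_INDICATORS & set(s):
        return False
    # ": " (or a trailing ":") starts a mapping value; " #" starts a comment.
    if re.search(r":(\s|$)", s) or re.search(r"\s#", s):
        return False
    # Document markers; rejected even when not at the start of a line.
    if s.startswith(("---", "...")):
        return False
    return True
```

    For example, `is_safe_plain_scalar("nginx")` and `is_safe_plain_scalar("a:b")` return `True`, while `"- injected"`, `"a: b"` and `"a #b"` are rejected.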

    Conclusion

    Plain scalars do not have an escaping mechanism and are therefore restricted in which strings they can represent. The double-quoted style is the only one that can represent every possible string, so that is what you want to go for.

    Choosing whether to represent a string as plain or quoted scalar is usually the task of a YAML implementation, because the decision making is complex and has many caveats. If you generate YAML with a templating engine, you probably do not have access to all information to make that decision – for example, current indentation, state (flow-style vs block-style) etc. Therefore, to be safe, use a filter to escape special characters and use double-quoted style.
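
    A minimal sketch of such a filter in Python follows. The function name `yaml_double_quote` is an assumption made for illustration. It uses the named escapes of the YAML double-quoted style, falls back to `\xXX` for the remaining C0/C1 control characters and DEL, and passes every other character through unchanged.

```python
# Named escape sequences of the YAML double-quoted scalar style.
_NAMED_ESCAPES = {
    "\\": "\\\\",
    '"': '\\"',
    "\0": "\\0",
    "\a": "\\a",
    "\b": "\\b",
    "\t": "\\t",
    "\n": "\\n",
    "\v": "\\v",
    "\f": "\\f",
    "\r": "\\r",
    "\x1b": "\\e",
    "\x85": "\\N",    # next line
    "\xa0": "\\_",    # non-breaking space
    "\u2028": "\\L",  # line separator
    "\u2029": "\\P",  # paragraph separator
}


def yaml_double_quote(value):
    """Render value as a single-line YAML double-quoted scalar."""
    out = []
    for ch in str(value):
        code = ord(ch)
        if ch in _NAMED_ESCAPES:
            out.append(_NAMED_ESCAPES[ch])
        elif code < 0x20 or 0x7F <= code <= 0x9F:
            # Remaining control characters: 8-bit hex escape.
            out.append("\\x%02X" % code)
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'
```

    With Jinja2, a function like this could be registered through the environment's `filters` dictionary, for example `env.filters["yaml_dq"] = yaml_double_quote`, and then used as `- {{ service.name | yaml_dq }}`. Because newlines are escaped, the result always stays on one line and cannot break out of the list item.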