python-3.x, yaml, pyyaml

PyYAML: how to load a line containing a custom tag in a YAML file?


I have the following YAML file which is used by a third party tool.

timezone: "Europe/Zurich"

_export:
  py:
    python: ${virtualenv_home}/bin/python3
  log_level: INFO
  poll_interval: 1
  workflow_name: test_workflow

!include : 'params.yml'

+say_hello:
  echo>: Hello world!

My goal is to load this YAML file with PyYAML, change a few things, and then dump it back into a file. This would work just fine if the line "!include : 'params.yml'" were not there.

How can I load this line so that, when it is dumped back into a file, it looks exactly as it does now ("!include : 'params.yml'")? The actual including is handled by the third-party tool.

I played around with the answer from the post "PyYAML: load and dump yaml file and preserve tags ( !CustomTag )" but didn't get the correct result. The line is dumped as:

? !include ''
: params.yml

Thank you


Solution

  • The result you got is correct in the sense that it is equivalent to your input.

    YAML in general, and PyYAML in particular, does not give you complete control over how your data is serialized. In some situations, it just makes decisions that you cannot influence. Let's check that with a minimal example:

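    import yaml, sys
    # Compose only up to the node graph; no Python objects are constructed.
    node = yaml.compose("!include : 'params.yml'")
    # Serialize the node graph straight back to YAML.
    yaml.serialize(node, sys.stdout)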
    

    This code loads the input only up to the node level, where information about the style of the key and value is preserved, and then dumps it again as YAML. The output is:

    ? !include ''
    : 'params.yml'
    

    As you can see, even though PyYAML knows that !include '' was originally an implicit key whose content was an empty plain scalar, it still serializes it as an explicit key whose content is an empty single-quoted scalar. From a YAML perspective this is equivalent to the original input; it just uses a different YAML style.
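The equivalence can be checked at the node level. The following is a minimal sketch, assuming PyYAML is installed; `describe` is a hypothetical helper introduced only for this illustration. It composes both the original line and PyYAML's re-serialized form and compares the tag, value, and style of the key and value nodes.

```python
import yaml

original = "!include : 'params.yml'"
# With no stream argument, yaml.serialize returns the YAML text as a string.
reserialized = yaml.serialize(yaml.compose(original))


def describe(text):
    """Hypothetical helper: (tag, value, style) of the key and value nodes
    of a YAML document that consists of a single key-value pair."""
    key_node, value_node = yaml.compose(text).value[0]
    return [(n.tag, n.value, n.style) for n in (key_node, value_node)]


before = describe(original)
after = describe(reserialized)

# Tags and values are identical in both documents ...
print([(tag, value) for tag, value, _ in before])
print([(tag, value) for tag, value, _ in after])
# ... only the style of the key scalar differs (plain vs. single-quoted).
print(before[0][2], after[0][2])
```

The key node carries the tag !include and an empty value in both cases, which is why a YAML processor treats the two documents as the same data.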

    This experiment shows that you cannot force PyYAML to output the key-value pair in the style you want, since even if it knows the original style, it still changes it. This means there is no way to achieve what you want unless you modify PyYAML itself.
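Short of modifying PyYAML, one workaround that stays outside the library is to treat the include line as opaque text: swap it for an ordinary placeholder pair before loading and swap it back after dumping. This is only a sketch under assumptions, not something PyYAML provides: the placeholder key and both helper functions are hypothetical, and it relies on the include line appearing verbatim at the top level of the file. It also needs `sort_keys=False`, available in PyYAML 5.1 and later, to keep the line in its original position.

```python
import yaml

INCLUDE_LINE = "!include : 'params.yml'"
# Hypothetical placeholder; any pair that cannot clash with real keys works.
PLACEHOLDER = "__include_placeholder__: 0"


def load_keeping_include(text):
    """Swap the include line for a placeholder pair, then load normally."""
    return yaml.safe_load(text.replace(INCLUDE_LINE, PLACEHOLDER))


def dump_keeping_include(data):
    """Dump normally, then swap the placeholder back for the include line."""
    dumped = yaml.safe_dump(data, sort_keys=False)
    return dumped.replace(PLACEHOLDER, INCLUDE_LINE)


source = """\
log_level: INFO
!include : 'params.yml'
+say_hello:
  echo>: Hello world!
"""

data = load_keeping_include(source)
data["log_level"] = "DEBUG"  # the kind of change the question describes
result = dump_keeping_include(data)
print(result)
```

Only the include line is protected this way. PyYAML still normalizes everything else, for example by dropping comments and unnecessary quotes.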

    If you have control over the final consumer of the YAML file, I would suggest changing the structure like this:

    $include: 'params.yml'
    

    The usage of !include in your original file already goes against the spec's intention, because a tag is supposed to apply to the node it is attached to. In your case, however, !include tags an empty scalar that serves as a mapping key, yet it is meant to act on the value of that key. If you want a key that does something special with its value, the key should be a scalar with a special value rather than an empty scalar with a special tag. With $include, it is far easier to modify the file while preserving the YAML style. IIRC, OpenAPI uses a similar technique with $ref.
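To illustrate the suggestion, here is a minimal sketch of the load, modify, dump cycle with a `$include` key. It assumes the consumer accepts `$include` as an ordinary key and that PyYAML 5.1 or later is available (for `sort_keys=False`); the two-line document is a shortened stand-in for the real file.

```python
import yaml

source = """\
log_level: INFO
$include: 'params.yml'
"""

# $include is a plain string key, so the safe loader needs no custom tag handling.
data = yaml.safe_load(source)
data["log_level"] = "DEBUG"

# sort_keys=False keeps the keys in their original order.
dumped = yaml.safe_dump(data, sort_keys=False)
print(dumped)
```

The key stays an implicit key on a single line, although PyYAML still drops the quotes around the value because they are not required.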