
Include One Yaml File Inside Another


I want to have a base config file which is used by other config files to share common config.

E.g. if I have one file base.yml with

foo: 1

bar:
  - 2
  - 3

And then a second file some_file.yml with

foo: 2

baz: "baz"

What I want to end up with is a merged config file with

foo: 2

bar:
  - 2
  - 3

baz: "baz"

It's easy enough to write a custom loader that handles an !include tag.

class ConfigLoader(yaml.SafeLoader):

    def __init__(self, stream):
        super().__init__(stream)
        self._base = Path(stream.name).parent

    def include(self, node):
        file_name = self.construct_scalar(node)
        file_path = self._base.joinpath(file_name)

        with file_path.open("rt") as fh:
            return yaml.load(fh, IncludeLoader)

Then I can parse an !include tag. So if my file is

inherit:
   !include base.yml

foo: 2

baz: "baz"

But now the base config is a mapping nested under a key. I.e. if I load the file I'll end up with

config = {'inherit': {'foo': 1, 'bar': [2, 3]}, 'foo': 2, 'baz': 'baz'}

But if I don't make the tag part of a mapping, e.g.

!include base.yml

foo: 2

baz: "baz"

I get an error: yaml.scanner.ScannerError: mapping values are not allowed here.

But I know that the yaml parser can parse tags without needing a mapping. Because I can do things like

!!python/object:foo.Bar
x: 1.0   
y: 3.14

So how do I write a loader and/or structure my YAML file so that I can include another file in my configuration?


Solution

  • In YAML you cannot mix scalars, mapping keys and sequence elements. This is invalid YAML:

    - abc
    d: e
    

    and so is this

    some_file_name
    a: b
    

    Quoting that scalar, or providing a tag for it, does of course not change the fact that it is invalid YAML.

    As you already found out, you can trick the loader into returning a dict instead of the string (just as the parser already has built-in constructors for non-primitive types like datetime.date).

    That this:

    !!python/object:foo.Bar
    x: 1.0
    y: 3.14
    

    works is because the whole mapping is tagged, whereas you tag only a scalar value.
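
The difference can be checked without constructing any objects. The following is a minimal sketch that assumes PyYAML rather than ruamel.yaml; the `describe` helper and the two sample documents are made up for this illustration. Composing only builds the node tree, so the unregistered `!include` tag is not a problem at this stage.

```python
import yaml

# A tagged scalar followed by mapping keys: not valid YAML.
TAGGED_SCALAR = "!include base.yaml\nfoo: 2\nbaz: baz\n"
# The tag on its own line applies to the whole mapping that follows.
TAGGED_MAPPING = "!include\nfilename: base.yaml\nfoo: 2\nbaz: baz\n"


def describe(text):
    """Return (node class name, tag) for the document root, or ("error", problem)."""
    try:
        node = yaml.compose(text, Loader=yaml.SafeLoader)
    except yaml.MarkedYAMLError as exc:
        return ("error", exc.problem)
    return (type(node).__name__, node.tag)


for document in (TAGGED_SCALAR, TAGGED_MAPPING):
    print(describe(document))
```

The first document should fail in the scanner, while the second should come back as a single mapping node carrying the `!include` tag.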

    What also would be invalid syntax:

    !include base.yaml
    foo: 2
    baz: baz
    

    but you could do:

    !include
    filename: base.yaml
    foo: 2
    baz: baz
    

    and process the 'filename' key in a special way, or make the !include tag an empty key:

    !include : base.yaml  # : is a valid tag character, so you need the space
    foo: 2
    baz: baz
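
For the first variant, where the whole mapping is tagged and a filename key names the base file, a constructor could look roughly like this. It is a sketch under assumptions, not a tested recipe: it uses PyYAML instead of ruamel.yaml, and `IncludeMappingLoader`, `construct_include` and `load_config` are hypothetical names. It also assumes that the top level of the included file is a mapping.

```python
from pathlib import Path

import yaml


class IncludeMappingLoader(yaml.SafeLoader):
    """SafeLoader that remembers the directory of the file it is reading."""

    def __init__(self, stream):
        super().__init__(stream)
        name = getattr(stream, "name", None)
        # Strings and unnamed streams resolve includes relative to the cwd.
        self._base = Path(name).parent if isinstance(name, str) else Path.cwd()


def construct_include(loader, node):
    # The tag sits on the whole mapping, so build it as a normal dict first.
    overrides = loader.construct_mapping(node, deep=True)
    filename = overrides.pop("filename")  # the specially treated key
    with (loader._base / filename).open("rb") as fh:
        merged = yaml.load(fh, Loader=IncludeMappingLoader)
    merged.update(overrides)  # keys of the including file win
    return merged


IncludeMappingLoader.add_constructor("!include", construct_include)


def load_config(path):
    """Load a YAML file, resolving a top-level !include mapping if present."""
    with Path(path).open("rb") as fh:
        return yaml.load(fh, Loader=IncludeMappingLoader)
```

Calling `load_config("some_file.yml")` on a file that starts with `!include` and has `filename: base.yml` as one of its keys would then return the base mapping updated with the remaining keys.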
    

    I would however look at using merge keys, as merging is essentially what you are trying to do. The following works:

    import sys
    import ruamel.yaml
    from pathlib import Path
    
    yaml_str = """
    <<: {x: 42, y: 196, foo: 3}
    foo: 2
    baz: baz
    """
    yaml = ruamel.yaml.YAML(typ='safe')
    yaml.default_flow_style = False
    data = yaml.load(yaml_str)
    yaml.dump(data, sys.stdout)
    

    which gives:

    baz: baz
    foo: 2
    x: 42
    y: 196
    

    So you should be able to do:

    <<: !load base.yaml
    foo: 2
    baz: baz
    

    and anyone with knowledge of merge keys would know what happens if base.yaml does include the key foo with value 3, and would also understand:

    <<: [!load base.yaml, !load config.yaml]
    foo: 2
    baz: baz
    

    (As I tend to associate "including" with textual inclusion as in the C preprocessor, I think `!load` might be a more appropriate tag, but that is probably a matter of taste.)
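
To make the precedence rules of merge keys concrete, here is a plain-Python sketch with no YAML library involved. `resolve_merge` is a hypothetical helper, and the content of `config` is invented for the example. Keys written directly in a mapping override merged keys, and in a list of merged mappings an earlier entry overrides a later one.

```python
def resolve_merge(explicit, merged):
    """Combine a mapping's own keys with the mappings listed under its << key.

    explicit: the keys written directly in the mapping
    merged:   the mappings given to <<, in document order
    """
    result = {}
    # Apply later entries first, so that earlier entries override them.
    for mapping in reversed(merged):
        result.update(mapping)
    # Keys written directly in the mapping always win.
    result.update(explicit)
    return result


base = {"foo": 3, "bar": [2, 3]}   # stand-in for base.yaml
config = {"foo": 4, "qux": True}   # stand-in for config.yaml (invented content)

print(resolve_merge({"foo": 2, "baz": "baz"}, [base, config]))
```

Here foo ends up as 2 because it is set explicitly. Without that explicit key it would be 3, taken from the first entry in the list.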

    To get the merge keys to work, it is probably easiest to just subclass the Constructor, as merging is done before tag resolving:

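    import sys
    import warnings
    import ruamel.yaml
    from ruamel.yaml.nodes import MappingNode, SequenceNode, ScalarNode
    # DuplicateKeyError and DuplicateKeyFutureWarning are used in flatten_mapping below
    from ruamel.yaml.constructor import (
        ConstructorError,
        DuplicateKeyError,
        DuplicateKeyFutureWarning,
    )
    from ruamel.yaml.compat import _F
    from pathlib import Path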
    
    
    
    class MyConstructor(ruamel.yaml.constructor.SafeConstructor):
        def flatten_mapping(self, node):
            # type: (Any) -> Any
            """
            This implements the merge key feature http://yaml.org/type/merge.html
            by inserting keys from the merge dict/list of dicts if not yet
            available in this node
            """
            merge = []  # type: List[Any]
            index = 0
            while index < len(node.value):
                key_node, value_node = node.value[index]
                if key_node.tag == 'tag:yaml.org,2002:merge':
                    if merge:  # double << key
                        if self.allow_duplicate_keys:
                            del node.value[index]
                            index += 1
                            continue
                        args = [
                            'while constructing a mapping',
                            node.start_mark,
                            'found duplicate key "{}"'.format(key_node.value),
                            key_node.start_mark,
                            """
                            To suppress this check see:
                               http://yaml.readthedocs.io/en/latest/api.html#duplicate-keys
                            """,
                            """\
                            Duplicate keys will become an error in future releases, and are errors
                            by default when using the new API.
                            """,
                        ]
                        if self.allow_duplicate_keys is None:
                            warnings.warn(DuplicateKeyFutureWarning(*args))
                        else:
                            raise DuplicateKeyError(*args)
                    del node.value[index]
                    if isinstance(value_node, ScalarNode) and value_node.tag == '!load':
                        file_path = None
                        try:
                            if self.loader.reader.stream is not None:
                                file_path = Path(self.loader.reader.stream.name).parent / value_node.value
                        except AttributeError:
                            pass
                        if file_path is None:
                            file_path = Path(value_node.value)
                        # there is a bug in ruamel.yaml<=0.17.20 that prevents
                        # the use of a Path as argument to compose()
                        with file_path.open('rb') as fp:
                            merge.extend(ruamel.yaml.YAML().compose(fp).value)
                    elif isinstance(value_node, MappingNode):
                        self.flatten_mapping(value_node)
                        merge.extend(value_node.value)
                    elif isinstance(value_node, SequenceNode):
                        submerge = []
                        for subnode in value_node.value:
                            if not isinstance(subnode, MappingNode):
                                raise ConstructorError(
                                    'while constructing a mapping',
                                    node.start_mark,
                                    _F(
                                        'expected a mapping for merging, but found {subnode_id!s}',
                                        subnode_id=subnode.id,
                                    ),
                                    subnode.start_mark,
                                )
                            self.flatten_mapping(subnode)
                            submerge.append(subnode.value)
                        submerge.reverse()
                        for value in submerge:
                            merge.extend(value)
                    else:
                        raise ConstructorError(
                            'while constructing a mapping',
                            node.start_mark,
                            _F(
                                'expected a mapping or list of mappings for merging, '
                                'but found {value_node_id!s}',
                                value_node_id=value_node.id,
                            ),
                            value_node.start_mark,
                        )
                elif key_node.tag == 'tag:yaml.org,2002:value':
                    key_node.tag = 'tag:yaml.org,2002:str'
                    index += 1
                else:
                    index += 1
            if bool(merge):
                node.merge = merge  # separate merge keys to be able to update without duplicate
                node.value = merge + node.value
    
    
    yaml = ruamel.yaml.YAML(typ='safe', pure=True)
    yaml.default_flow_style = False
    yaml.Constructor = MyConstructor
    
    
    
    yaml_str = """\
    <<: !load base.yaml
    foo: 2
    baz: baz
    """
    
    data = yaml.load(yaml_str)
    yaml.dump(data, sys.stdout)
    print('---')
    
    file_name = Path('test.yaml')
    file_name.write_text("""\
    <<: !load base.yaml
    bar: 2
    baz: baz
    """)
    
    data = yaml.load(file_name)
    yaml.dump(data, sys.stdout)
    

    this prints:

    bar:
    - 2
    - 3
    baz: baz
    foo: 2
    ---
    bar: 2
    baz: baz
    foo: 1
    

    Notes:

    • Don't open YAML files as text. They are written as binary (UTF-8), and you should load them as such (open(filename, 'rb')).
    • If you had included a full working program in your question (or at least the text of IncludeLoader), it would have been possible to provide a full working example with the merge keys (or to find out that it doesn't work for some reason).
    • As it is, it is unclear whether your yaml.load() is an instance method call (import ruamel.yaml; yaml = ruamel.yaml.YAML()) or a function call (from ruamel import yaml). You should not use the latter, as it is deprecated.