
Resolving internal variables in YAML file


I have a YAML file which uses keys as references/variables in different sections as in the following example.

download:
  input_data_dir: ./data/input

prepare:
  input_dir: ${download.input_data_dir}
  output_dir: ./data/prepared

process:
  version: 1
  output_dir: ./output/${process.version}

I tried loading the YAML file in Python with params = yaml.safe_load(open("../params.yaml")). This returns the literal string ${download.input_data_dir} for params['prepare']['input_dir'], whereas I expect ./data/input. Similarly, I expect params['process']['output_dir'] to be ./output/1.

How can I get these variables resolved while loading the YAML file in Python, so that it produces the expected results?


Solution

  • You can define a custom constructor that processes such references, then add an implicit resolver that recognizes the pattern with a regular expression, so that your constructor is called:

    import yaml, sys, re
    
    class RefLoader(yaml.SafeLoader):
        # we override this method to remember the root node,
        # so that we can later resolve paths relative to it
        def get_single_node(self):
            self.cur_root = super(RefLoader, self).get_single_node()
            return self.cur_root
    
    def ref_constructor(loader, node):
        cur = loader.cur_root
        # [2:-1] gets the path inside ${...}
        for item in node.value[2:-1].split("."):
            # cur.value, if it's a mapping, contains a list
            # of (key, value) tuples
            for (key, value) in cur.value:
                # key, if it's a scalar, contains its textual
                # content in key.value
                if key.value == item:
                    cur = value
                    break
            else:
                # no key matched: fail instead of silently
                # resolving to the wrong node
                raise yaml.constructor.ConstructorError(
                    None, None,
                    "unresolved reference %s" % node.value,
                    node.start_mark)
        # defer construction to the default constructor of
        # the referred node
        return loader.construct_object(cur)
    
    # register a custom tag for which our constructor is called
    RefLoader.add_constructor("!ref", ref_constructor)
    
    # tell PyYAML that a scalar that looks like `${...}` is to be
    # implicitly tagged with `!ref`, so that our custom constructor
    # is called.
    RefLoader.add_implicit_resolver("!ref", re.compile(r'^\$\{[^}]*\}$'), None)
    
    # named `text` rather than `input` to avoid shadowing the builtin
    text = """
    download:
      input_data_dir: ./data/input
    
    prepare:
      input_dir: ${download.input_data_dir}
      output_dir: ./data/prepared
    
    process:
      version: 1
      output_dir: ./output/${process.version}
    """
    
    data = yaml.load(text, Loader=RefLoader)
    yaml.dump(data, sys.stdout)
    

    This yields:

    download:
      input_data_dir: ./data/input
    prepare:
      input_dir: ./data/input
      output_dir: ./data/prepared
    process:
      output_dir: ./output/${process.version}
      version: 1
    

    As you can see, this only processes scalars that consist entirely of a single ${...} reference; the value of process.output_dir, where the reference is embedded in a longer string, is left untouched. This code serves as an example of how to process references in general; you should be able to modify it to fit your specific needs.
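
If references embedded in longer strings (such as ./output/${process.version}) also need to be expanded, one possible alternative is to resolve them after loading instead of inside the loader. The following is a minimal sketch under that assumption, not part of the loader-based solution above: the helper names lookup and resolve are hypothetical, the params dictionary stands in for what yaml.safe_load would return for the example file, and circular references are not detected.

```python
import re

# matches ${some.dotted.path}; group 1 is the path inside the braces
REF = re.compile(r"\$\{([^}]*)\}")

def lookup(root, dotted_path):
    """Follow a dotted path such as 'download.input_data_dir' through nested dicts."""
    cur = root
    for part in dotted_path.split("."):
        cur = cur[part]  # raises KeyError for an unknown key
    return cur

def resolve(value, root):
    """Return a copy of value with every ${...} reference expanded (hypothetical helper)."""
    if isinstance(value, dict):
        return {k: resolve(v, root) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, root) for v in value]
    if isinstance(value, str):
        whole = REF.fullmatch(value)
        if whole:
            # the string is nothing but one reference: keep the target's type
            return resolve(lookup(root, whole.group(1)), root)
        # reference(s) embedded in a longer string: substitute as text
        return REF.sub(lambda m: str(resolve(lookup(root, m.group(1)), root)), value)
    return value

# stand-in for: params = yaml.safe_load(...) on the example file
params = {
    "download": {"input_data_dir": "./data/input"},
    "prepare": {"input_dir": "${download.input_data_dir}",
                "output_dir": "./data/prepared"},
    "process": {"version": 1,
                "output_dir": "./output/${process.version}"},
}

resolved = resolve(params, params)
print(resolved["prepare"]["input_dir"])   # ./data/input
print(resolved["process"]["output_dir"])  # ./output/1
```

Resolving after loading keeps the standard SafeLoader untouched, at the cost of a second pass over the data. A string that is exactly one reference keeps the type of its target (an integer stays an integer), while embedded references are converted to text.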