
Access elements inside yaml using python


I am using yaml and pyyaml to configure my application.

Is it possible to configure something like this -

config.yml -

root:
    repo_root: /home/raghhuveer/code/data_science/papers/cv/AlexNet_lght
    data_root: $root.repo_root/data

service:
    root: $root.data_root/csv/xyz.csv

yaml loading function -

import logging
import os

import yaml

def load_config(config_path):
    config_path = os.path.abspath(config_path)

    if not os.path.isfile(config_path):
        raise FileNotFoundError("{} does not exist".format(config_path))

    with open(config_path) as f:
        config = yaml.safe_load(f)
    logging.info("Config used for run - \n{}".format(yaml.dump(config, sort_keys=False)))
    return DotDict(config)  # DotDict is defined elsewhere in my code

Current Output-

root:
  repo_root: /home/raghhuveer/code/data_science/papers/cv/AlexNet_lght
  data_root: $root.repo_root/data

service:
  root: $root.data_root/csv/xyz.csv

Desired Output -

root:
  repo_root: /home/raghhuveer/code/data_science/papers/cv/AlexNet_lght
  data_root: /home/raghhuveer/code/data_science/papers/cv/AlexNet_lght/data

service:
  root: /home/raghhuveer/code/data_science/papers/cv/AlexNet_lght/data/csv/xyz.csv

Is this even possible with python? If so any help would be really nice.

Thanks in advance.


Solution

  • A general approach:

    • read the file as is
    • search for strings containing $:
      • determine the "path" of "variables"
      • replace the "variables" with actual values

    An example that recurses through dictionaries and replaces the variables found in strings:

    import re, pprint, yaml

    def convert(node, top=None):
        """Replaces $key1.key2 with actual values. Modifies dicts in-place"""
        if top is None:
            top = node # top should be the original input
        if isinstance(node, dict):
            ret = {k: convert(v, top) for k, v in node.items()} # recursively convert items
            changed = node != ret
            node.update(ret) # update original input
            if changed: # in case order matters, do it one or several times more until no change happens
                convert(node, top) # keep resolving against the same top
            return node # return updated input (for the case of recursion)
        if isinstance(node, str):
            def lookup(match):
                val = top # starting from top ...
                for k in match.group(1).split("."): # ... for each key in the key chain ...
                    val = val[k] # ... go one level down
                return str(val) # values may be non-strings (int, float, ...)
            # replace each $key_1.key_2.keyN sequence with the actual value
            return re.sub(r"\$(\w+(?:\.\w+)*)", lookup, node)
        return node # int, float, None, ...: leave unchanged; TODO list

    with open("in.yml") as f:
        config = yaml.safe_load(f) # load as is
    convert(config) # convert it (in-place)
    pprint.pprint(config)
    

    Output:

    {'root': {'data_root': '/home/raghhuveer/code/data_science/papers/cv/AlexNet_lght/data',
              'repo_root': '/home/raghhuveer/code/data_science/papers/cv/AlexNet_lght'},
     'service': {'root': '/home/raghhuveer/code/data_science/papers/cv/AlexNet_lght/data/csv/xyz.csv'}}
    

    Note: YAML is not essential here; the same approach also works with JSON, XML or other formats, as long as the data is loaded into nested dictionaries.
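
    To illustrate that the format does not matter, here is a minimal sketch of the same substitution applied to JSON. It is a non-mutating variant rather than the in-place function above; the name resolve and the path /home/user/project are placeholders chosen for this example, and circular references are not handled.

```python
import json
import re

def resolve(node, top=None):
    """Return a copy of node with $key1.key2 references replaced by values from top."""
    if top is None:
        top = node  # references are always looked up from the document root
    if isinstance(node, dict):
        return {k: resolve(v, top) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve(v, top) for v in node]
    if isinstance(node, str):
        def lookup(match):
            val = top
            for key in match.group(1).split("."):  # walk down the key chain
                val = val[key]
            # the referenced value may itself contain references
            return str(resolve(val, top))
        return re.sub(r"\$(\w+(?:\.\w+)*)", lookup, node)
    return node  # int, float, bool, None: nothing to substitute

# Hypothetical JSON equivalent of config.yml
text = """
{
  "root": {
    "repo_root": "/home/user/project",
    "data_root": "$root.repo_root/data"
  },
  "service": {"root": "$root.data_root/csv/xyz.csv"}
}
"""
config = resolve(json.loads(text))
print(config["service"]["root"])
```

    Because each reference is resolved on demand, the order of the keys in the file does not matter in this variant.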

    Note2: If you use YAML and Python exclusively, some answers from this post may be useful (using anchors and references, and application-specific local tags).
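
    A minimal sketch of the anchors-plus-local-tag idea with PyYAML, assuming you are free to change the syntax of the config file. The tag name !join, the class JoinLoader and the path /home/user/project are hypothetical names picked for this example; they are not built into YAML or PyYAML.

```python
import yaml

class JoinLoader(yaml.SafeLoader):
    """SafeLoader subclass, so the extra tag does not leak into other loaders."""

def join(loader, node):
    # Build the list written under the tag, then concatenate its items.
    parts = loader.construct_sequence(node)
    return "".join(str(p) for p in parts)

JoinLoader.add_constructor("!join", join)

# &name defines an anchor, *name refers back to it
text = """
root:
  repo_root: &repo_root /home/user/project
  data_root: &data_root !join [*repo_root, /data]
service:
  root: !join [*data_root, /csv/xyz.csv]
"""
config = yaml.load(text, Loader=JoinLoader)
print(config["service"]["root"])
```

    Compared with the $key substitution above, this keeps everything inside the YAML loader, but an anchor has to be defined before it is referenced, and the file is no longer readable by loaders that do not know the !join tag.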