I have a YAML file which uses keys as references/variables in different sections, as in the following example:
download:
  input_data_dir: ./data/input
prepare:
  input_dir: ${download.input_data_dir}
  output_dir: ./data/prepared
process:
  version: 1
  output_dir: ./output/${process.version}
I tried loading the YAML file in Python with params = yaml.safe_load(open("../params.yaml")). This outputs ${download.input_data_dir} for params['prepare']['input_dir'], while the expected output is ./data/input. Similarly, the expected output for params['process']['output_dir'] is ./output/1.
How can I get these variables resolved while loading the YAML file in Python, so that it produces the expected results?
You can define a custom constructor that processes such references, then add an implicit resolver that recognizes the pattern with a regular expression, so that your constructor gets called:
import re
import sys

import yaml


class RefLoader(yaml.SafeLoader):
    # we override this method to remember the root node,
    # so that we can later resolve paths relative to it
    def get_single_node(self):
        self.cur_root = super(RefLoader, self).get_single_node()
        return self.cur_root


def ref_constructor(loader, node):
    cur = loader.cur_root
    # [2:-1] gets the path inside ${...}
    for item in node.value[2:-1].split("."):
        # cur.value, if it's a mapping, contains a list
        # of (key, value) tuples
        for (key, value) in cur.value:
            # key, if it's a scalar, contains its textual
            # content in key.value
            if key.value == item:
                cur = value
                break
    # defer construction to the default constructor of
    # the referred node
    return loader.construct_object(cur)


# register a custom tag for which our constructor is called
RefLoader.add_constructor("!ref", ref_constructor)

# tell PyYAML that a scalar that looks like `${...}` is to be
# implicitly tagged with `!ref`, so that our custom constructor
# is called.
RefLoader.add_implicit_resolver("!ref", re.compile(r'^\$\{[^}]*\}$'), None)

# named `document` rather than `input` to avoid shadowing the builtin
document = """
download:
  input_data_dir: ./data/input
prepare:
  input_dir: ${download.input_data_dir}
  output_dir: ./data/prepared
process:
  version: 1
  output_dir: ./output/${process.version}
"""

data = yaml.load(document, Loader=RefLoader)
yaml.dump(data, sys.stdout)
This yields:
download:
  input_data_dir: ./data/input
prepare:
  input_dir: ./data/input
  output_dir: ./data/prepared
process:
  output_dir: ./output/${process.version}
  version: 1
As you can see, this currently only processes scalars that start with ${ and end with }; the value of process.output_dir, which embeds a reference inside a longer string, isn't processed. This code serves as an example of how to process references in general; you should be able to modify it to fit your specific needs.
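One possible modification, to also cover references embedded in a longer string such as ./output/${process.version}, is sketched below. It is only a sketch under some assumptions: the names InterpolatingLoader, lookup, interpolate and the tag !interp are made up for illustration, embedded references are converted with str(), a scalar consisting of exactly one reference keeps the type of the referred value, and an unknown path raises KeyError. Only plain (unquoted) scalars are affected, since implicit resolvers are not applied to quoted ones.

```python
import re

import yaml

# matches one ${dotted.path} reference anywhere inside a scalar
REF = re.compile(r"\$\{([^}]*)\}")


class InterpolatingLoader(yaml.SafeLoader):  # hypothetical name
    def get_single_node(self):
        # remember the root node so paths can be resolved against it
        self.cur_root = super().get_single_node()
        return self.cur_root


def lookup(loader, path):
    """Construct the value of the node at a dotted path from the root."""
    cur = loader.cur_root
    for item in path.split("."):
        if not isinstance(cur, yaml.MappingNode):
            raise KeyError(path)  # path descends into a non-mapping
        for key, value in cur.value:
            if key.value == item:
                cur = value
                break
        else:
            raise KeyError(path)  # no such key at this level
    return loader.construct_object(cur)


def interpolate(loader, node):
    whole = REF.fullmatch(node.value)
    if whole:
        # the scalar is exactly one reference: keep the referred type
        return lookup(loader, whole.group(1))
    # otherwise substitute every reference with its string form
    return REF.sub(lambda m: str(lookup(loader, m.group(1))), node.value)


InterpolatingLoader.add_constructor("!interp", interpolate)
# any plain scalar containing ${...} is implicitly tagged `!interp`
InterpolatingLoader.add_implicit_resolver(
    "!interp", re.compile(r".*\$\{[^}]*\}"), None)

document = """
download:
  input_data_dir: ./data/input
prepare:
  input_dir: ${download.input_data_dir}
  output_dir: ./data/prepared
process:
  version: 1
  output_dir: ./output/${process.version}
"""

params = yaml.load(document, Loader=InterpolatingLoader)
print(params["prepare"]["input_dir"])   # ./data/input
print(params["process"]["output_dir"])  # ./output/1
```

A reference that points back at a node that is still being constructed (for example a: ${a}) is rejected by PyYAML with a ConstructorError rather than looping forever.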