Tags: yaml, pyyaml, ruamel.yaml

Trying to get all paths in a YAML file


I've got an input YAML file (test.yml) as follows:

# sample set of lines
foo:
  x: 12
  y: hello world
  ip_range['initial']: 1.2.3.4
  ip_range[]: tba
  array['first']: Cluster1

array2[]: bar

Some keys in the source contain square brackets, which may be empty.

I'm trying to get a line by line list of all the paths in the file, ideally like:

foo.x: 12
foo.y: hello world
foo.ip_range['initial']: 1.2.3.4
foo.ip_range[]: tba
foo.array['first']: Cluster1
array2[]: bar

I've used the yamlpath library and its yaml-paths CLI, but can't get the desired output. Trying this:

yaml-paths -m -s =foo -K test.yml

outputs:

foo.x
foo.y
foo.ip_range\[\'initial\'\]
foo.ip_range\[\]
foo.array\[\'first\'\]

Each path is on its own line, but the special characters are escaped with backslashes ( \ ). Removing the -m option ("expand matching parent nodes") gets rid of the escaping, but the output is then no longer one path per line:

yaml-paths -s =foo -K test.yml

gives:

foo: {"x": 12, "y": "hello world", "ip_range['initial']": "1.2.3.4", "ip_range[]": "tba", "array['first']": "Cluster1"}

Any ideas how I can get one path per line, but without the escape characters? I was also wondering whether the ruamel modules offer anything for path querying.


Solution

  • Your "paths" are nothing more than the joined string representation of the keys (and probably indices) of the mappings (and potentially sequences) in your YAML document.

    That can be trivially generated from data loaded from YAML with a recursive function:

    import ruamel.yaml
    
    yaml_str = """\
    # sample set of lines
    foo:
      x: 12
      y: hello world
      ip_range['initial']: 1.2.3.4
      ip_range[]: tba
      array['first']: Cluster1
    
    array2[]: bar
    """
    
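    def pathify(d, p=None, paths=None, joinchar='.'):
        # Top-level call: create the result dict, then recurse from the root.
        if p is None:
            paths = {}
            pathify(d, "", paths, joinchar=joinchar)
            return paths
        # Prefix for child paths; no separator in front of top-level keys.
        pn = p
        if p != "":
            pn += joinchar
        if isinstance(d, dict):
            for k, v in d.items():
                # str() so that non-string keys (e.g. integers) can be joined
                pathify(v, pn + str(k), paths, joinchar=joinchar)
        elif isinstance(d, list):
            for idx, e in enumerate(d):
                pathify(e, pn + str(idx), paths, joinchar=joinchar)
        else:
            # Scalar leaf: record the value under its full path.
            paths[p] = d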
    
    
    yaml = ruamel.yaml.YAML(typ='safe')
    paths = pathify(yaml.load(yaml_str))
    
    for p, v in paths.items():
        print(f'{p} -> {v}')
    

    which gives:

    foo.x -> 12
    foo.y -> hello world
    foo.ip_range['initial'] -> 1.2.3.4
    foo.ip_range[] -> tba
    foo.array['first'] -> Cluster1
    array2[] -> bar
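
If you want exactly the `path: value` lines from the question, the same recursion can also be written as a generator. The following is only a sketch: `iter_paths` is a hypothetical name, and `data` is a hand-written stand-in for what loading test.yml would return, so that the snippet runs without any YAML library installed.

```python
def iter_paths(node, prefix=None, joinchar="."):
    """Yield (path, value) pairs for every scalar leaf in nested dicts/lists."""
    if isinstance(node, dict):
        children = node.items()
    elif isinstance(node, list):
        children = enumerate(node)
    else:
        # Scalar leaf: emit it under the path built so far.
        yield ("" if prefix is None else prefix), node
        return
    for key, value in children:
        # No separator in front of top-level keys or indices.
        child = str(key) if prefix is None else f"{prefix}{joinchar}{key}"
        yield from iter_paths(value, child, joinchar)


# Assumption: plain-Python stand-in for the data loaded from test.yml.
data = {
    "foo": {
        "x": 12,
        "y": "hello world",
        "ip_range['initial']": "1.2.3.4",
        "ip_range[]": "tba",
        "array['first']": "Cluster1",
    },
    "array2[]": "bar",
}

lines = [f"{path}: {value}" for path, value in iter_paths(data)]
print("\n".join(lines))
```

Because the keys are only ever joined as strings, the square brackets and quotes come through unescaped. Passing a different `joinchar` (for example `"/"`) changes the separator without touching the keys themselves.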