
Python ruamel.yaml package: how to get the header comment lines?


I want to get the comments on the header lines of a YAML file, like:

# 11111111111111111
# 11111111111111111
# 22222222222222222
# bbbbbbbbbbbbbbbbb
---
start:
....

I tried the .ca attribute on the loaded data, but these comments are not on it. Is there any other way to get them?


Solution

  • Currently (ruamel.yaml==0.17.17) the comments that occur before the document start token (---) are not passed on from the DocumentStartToken to the DocumentStartEvent, so they are effectively lost during parsing. Even if they were passed on, preserving them would be non-trivial, as the DocumentStartEvent is silently dropped during composition.

    You can either put the comments after the end-of-directives indicator (---), which lets you get at them through the .ca attribute without a problem, or remove that indicator altogether, as it is superfluous (at least in your example). Otherwise you will have to write a small wrapper around the loader:

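    import sys
    import pathlib
    import ruamel.yaml
    
    fn = pathlib.Path('input.yaml')
    
    def load_with_pre_directives_comments(yaml, path):
        """Load path and return (data, comments), where comments holds the
        comment lines that appear before the document start marker (---)."""
        text = path.read_text()
        comments = []
        for line in text.splitlines(True):
            stripped = line.lstrip()
            if stripped.startswith('#'):
                comments.append(line)
            elif line.startswith('---'):
                # document start marker reached: the header comments are complete
                break
            elif stripped and not stripped.startswith('%'):
                # content before any '---': these comments are not lost,
                # so leave them to ruamel.yaml
                comments = []
                break
        return yaml.load(text), comments
    
    yaml = ruamel.yaml.YAML()
    yaml.explicit_start = True  # write '---' when dumping
    data, comments = load_with_pre_directives_comments(yaml, fn)
    print(''.join(comments), end='')  # re-emit the header comments first
    yaml.dump(data, sys.stdout)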
    

    which gives:

    # 11111111111111111
    # 11111111111111111
    # 22222222222222222
    # bbbbbbbbbbbbbbbbb
    ---
    start: 42
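The first workaround (putting the comments after ---) can also be applied to the text before it is loaded, so no separate comment list has to be carried around. Below is a minimal sketch, assuming the header consists only of full-line comments, blank lines and %-directives, and that the --- line carries nothing else. move_header_comments_below_marker is a hypothetical helper, not part of ruamel.yaml:

```python
def move_header_comments_below_marker(text):
    """Hypothetical helper: return text with the comment lines that precede
    a bare '---' document start marker moved to just after that marker.
    The text is returned unchanged if content appears before any marker,
    or if the marker line carries content (e.g. '--- |')."""
    lines = text.splitlines(keepends=True)
    header = []
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith('#'):
            header.append(line)
        elif line.rstrip() == '---':
            # keep non-comment lines (blank lines, %YAML/%TAG directives)
            # in front of the marker, then put the header comments after it
            before = [l for l in lines[:i] if not l.strip().startswith('#')]
            marker = line if line.endswith('\n') else line + '\n'
            return ''.join(before + [marker] + header + lines[i + 1:])
        elif stripped and not stripped.startswith('%'):
            # document content (or a '---' line with content) reached first:
            # nothing to move
            return text
    return text


print(move_header_comments_below_marker("# 111\n# 222\n---\nstart: 42\n"), end='')
```

The rewritten text can then be passed to yaml.load(), where, as described above, comments following --- are available through .ca. The trade-off is that on dumping they are written after the --- rather than before it.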