python, pyyaml, ruamel.yaml

ruamel.yaml: pin comment to next data item instead of previous one


I observed some confusing behavior when using ruamel.yaml with the round-trip loader. It's probably because it's not trivial for ruamel.yaml to determine automatically which data item a comment should be attached to.

In the example below, I would like to always keep the comment. That should be possible if I could tell ruamel.yaml to consider all comments as attached to the next data item (i.e. the comment precedes "other").

Can this be done?

If yes: How?

data_was_a_dict = """\
---
data: was_a_dict
main_dict:
  data: 
    some: data

# this comment gets always lost
other: data
"""

data_was_a_str = """\
---
data: was_a_str
main_dict:
  data: a_string  

# this gets shifted or stays
other: data
"""

import ruamel.yaml, sys


yaml = ruamel.yaml.YAML()
for text in [data_was_a_dict, data_was_a_str]:
    for new_data in ["new_text", {"something": "else"}]:
        data = yaml.load(text)

        data["main_dict"]["data"] = new_data
        yaml.dump(data, sys.stdout)
        print("==========================")

Output:

data: was_a_dict
main_dict:
  data: new_text
other: data
==========================
data: was_a_dict
main_dict:
  data:
    something: else
other: data
==========================
data: was_a_str
main_dict:
  data: new_text

# this gets shifted or stays
other: data
==========================
data: was_a_str
main_dict:
  data:

# this gets shifted or stays
    something: else
other: data
==========================

========================================

Update, thanks to Anthon:

def replace_dict(target, key, new_dict):
    def get_last_key(dct):
        # CommentedMap keeps insertion order, so this is the key that came last in the YAML
        return list(dct)[-1]
        
    old_dict = target[key]
    if old_dict and new_dict:
        # if new_dict is empty, we will lose the comment. 
        # That's fine for now since this should not happen in my case and I don't know yet where to attach 
        # the comment in that case
        last_comment = old_dict.ca.items.get(get_last_key(old_dict), None)
        if last_comment:
            actual_comment = last_comment[2]
            actual_comment.value = clean_comment(actual_comment.value)
            if actual_comment.value:
                if not isinstance(new_dict, ruamel.yaml.comments.CommentedMap):
                    new_dict = ruamel.yaml.comments.CommentedMap(new_dict)                
                new_dict.ca.items[get_last_key(new_dict)] = last_comment
    target[key] = new_dict
    
def clean_comment(txt: str) -> str:
    _,_,after = txt.partition("\n")
    if after:
        return "\n" + after
    return ""

data_was_a_dict = """\
---
main_dict:
  place: holder
  sub_dict: # this stays
    item1: value1
    item2: value2 # this is removed
    
# this stays    
other: data
"""

import ruamel.yaml, sys

yaml = ruamel.yaml.YAML()

data = yaml.load(data_was_a_dict)
replace_dict(data["main_dict"], "sub_dict", {"item_new": "value_new"})

yaml.dump(data, sys.stdout)

gives

main_dict:
  place: holder
  sub_dict: # this stays
    item_new: value_new
# this stays    
other: data

Solution

    I am not sure what is confusing about the following documented behaviour regarding the preservation of comments:

    This preservation is normally not broken unless you severely alter the structure of a component (delete a key in a dict, remove list entries). Reassigning values or replacing list items, etc., is fine.

    In three of the four combinations that you dump, you have either replaced a simple value with a composite value, or removed altogether the composite value that contains the comment information.

    In all versions up to the current one (i.e. <0.18), ruamel.yaml attaches a scanned comment to a token that exists at the time the comment is parsed. There is no token (yet) for your next data item, so there is currently no way to attach the comment to "the next data item". In ruamel.yaml<0.18 the actual comment information is an extended end-of-line comment with a value like "\n\n# this gets shifted or stays". Because that value starts with a newline, there is no actual comment at the end of the line of the key it is associated with.

    In your data_was_a_dict, the comment is associated with the key some, so whether you replace the CommentedMap (a dict subtype, with comments on its .ca attribute) with a string or with a dict makes no difference: the data structure holding the comment is completely replaced.

    In your data_was_a_str YAML document, the comment is associated with the key data on a CommentedMap one level higher than in the other document. If you replace its value with another string, the output will be similar to the input. If you add a whole new substructure, the comment ends up between the key and its (composite) value.

    To get what you seem to expect, you have to check whether there is a comment associated with the key data and move it so that it is associated with the key something, which cannot be done on a normal dict (the new value would have to be a CommentedMap). In the combinations where you delete/overwrite the data structure to which the comment is attached, you have to check for a comment and move it before the deletion. In the combination where you replace the simple value with a composite one, you can move the comment after the assignment (given a suitable composite such as a CommentedMap). So yes, what you want is possible, but it is not trivial, and it relies on undocumented features that will change in upcoming versions.

    import sys
    import ruamel.yaml
    
    data_was_a_dict = """\
    data: was_a_dict
    main_dict:
      data: 
        some: data
    
    # this comment gets always lost
    other: data
    """
    
    data_was_a_str = """\
    data: was_a_str
    main_dict:
      data: a_string  
    
    # this gets shifted or stays
    other: data
    """
    
    yaml = ruamel.yaml.YAML()
    
    data = yaml.load(data_was_a_dict)
    print(data['main_dict']['data'].ca)
    data = yaml.load(data_was_a_str)
    print(data['main_dict'].ca)
    

    which gives:

    Comment(comment=None,
      items={'some': [None, None, CommentToken('\n\n# this comment gets always lost\n', line: 5, col: 0), None]})
    Comment(comment=None,
      items={'data': [None, None, CommentToken('\n\n# this gets shifted or stays\n', line: 4, col: 0), None]})
    

    I am looking into replacing the comment scanning and attachment in ruamel.yaml, which will allow the user to optionally split comments (on the first empty line) and to specify whether comment information is attached to the previous and/or the following "data". Assignment could potentially be influenced by the indentation level. The documentation might then even be updated to reflect that round-tripping supports less severe restrictions on preserving comments when you restructure your data.
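The "split on the first empty line" idea can be illustrated on the plain comment strings shown in the `.ca` output above. `split_comment_value` is a hypothetical helper, not part of ruamel.yaml, and only sketches the proposed rule: the text before the first newline is the end-of-line comment, the lines up to the first empty line stay with the previous item, and the rest would go to the following item. It does not attach anything to a YAML node.

```python
def split_comment_value(value):
    """Split a post-value comment string into (eol, previous, following).

    eol:       end-of-line comment text on the key's own line ("" if none)
    previous:  comment lines before the first empty line (stay with previous item)
    following: lines after the first empty line (would go to the next item)
    """
    eol, _, rest = value.partition("\n")
    lines = rest.splitlines()
    for i, line in enumerate(lines):
        if not line.strip():
            return eol.strip(), lines[:i], lines[i + 1:]
    # No empty line: everything stays with the previous item.
    return eol.strip(), lines, []


# The value from the .ca output: no end-of-line part, one comment for the next item.
print(split_comment_value("\n\n# this gets shifted or stays\n"))
print(split_comment_value(" # eol\n# still previous\n\n# next\n"))
```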