I observed a somewhat confusing behavior when using ruamel.yaml with the round-trip loader. It's probably because it's not trivial for ruamel.yaml to determine automatically which data item a comment should be connected to.
In the example below, I would like to always keep the comment. That should be possible if I could tell ruamel.yaml to consider every comment connected to the next data item (i.e. the comment precedes "other").
Can this be done?
If yes: How?
data_was_a_dict = """\
---
data: was_a_dict
main_dict:
  data:
    some: data

# this comment gets always lost
other: data
"""

data_was_a_str = """\
---
data: was_a_str
main_dict:
  data: a_string

# this gets shifted or stays
other: data
"""

import ruamel.yaml, sys

yaml = ruamel.yaml.YAML()
for text in [data_was_a_dict, data_was_a_str]:
    for new_data in ["new_text", {"something": "else"}]:
        data = yaml.load(text)
        data["main_dict"]["data"] = new_data
        yaml.dump(data, sys.stdout)
        print("==========================")
Output:
data: was_a_dict
main_dict:
  data: new_text
other: data
==========================
data: was_a_dict
main_dict:
  data:
    something: else
other: data
==========================
data: was_a_str
main_dict:
  data: new_text

# this gets shifted or stays
other: data
==========================
data: was_a_str
main_dict:
  data:

# this gets shifted or stays
    something: else
other: data
==========================
========================================
Update thanks to Anthon:
def replace_dict(target, key, new_dict):
    def get_last_key(dct):
        keys = [key for key in dct.keys()]
        return keys[-1]

    old_dict = target[key]
    if old_dict and new_dict:
        # if new_dict is empty, we will lose the comment.
        # That's fine for now since this should not happen in my case and I don't know yet where to attach
        # the comment in that case
        last_comment = old_dict.ca.items.get(get_last_key(old_dict), None)
        if last_comment:
            actual_comment = last_comment[2]
            actual_comment.value = clean_comment(actual_comment.value)
            if actual_comment.value:
                if not isinstance(new_dict, ruamel.yaml.comments.CommentedMap):
                    new_dict = ruamel.yaml.comments.CommentedMap(new_dict)
                new_dict.ca.items[get_last_key(new_dict)] = last_comment
    target[key] = new_dict


def clean_comment(txt: str) -> str:
    _, _, after = txt.partition("\n")
    if after:
        return "\n" + after
    return ""

data_was_a_dict = """\
---
main_dict:
  place: holder
  sub_dict: # this stays
    item1: value1
    item2: value2 # this is removed
# this stays
other: data
"""

import ruamel.yaml, sys
import json

yaml = ruamel.yaml.YAML()
data = yaml.load(data_was_a_dict)
replace_dict(data["main_dict"], "sub_dict", {"item_new": "value_new"})
yaml.dump(data, sys.stdout)
gives
main_dict:
  place: holder
  sub_dict: # this stays
    item_new: value_new
# this stays
other: data
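
To make it easier to see what clean_comment does, here is a small standalone check that needs no YAML at all. The sample strings are assumptions about what the comment token's value looks like in the examples above (an end-of-line comment followed by a full-line comment, a value starting with newlines, and a lone end-of-line comment); they were not captured from ruamel.yaml.

```python
def clean_comment(txt: str) -> str:
    """Drop everything up to the first newline, i.e. the end-of-line part."""
    _, _, after = txt.partition("\n")
    if after:
        return "\n" + after
    return ""


# End-of-line comment followed by a full-line comment: only the latter survives.
print(repr(clean_comment("# this is removed\n# this stays\n")))  # '\n# this stays\n'

# Value that already starts with a newline (no end-of-line comment): unchanged.
print(repr(clean_comment("\n\n# this gets shifted or stays\n")))  # '\n\n# this gets shifted or stays\n'

# End-of-line comment only: nothing is left to move.
print(repr(clean_comment("# this is removed\n")))  # ''
```

So the part that belonged to the old last item is dropped, and anything on the following lines is kept for re-attachment.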
I am not sure what is confusing about the following documented behaviour on the preservation of comments:
This preservation is normally not broken unless you severely alter the structure of a component (delete a key in a dict, remove list entries). Reassigning values or replacing list items, etc., is fine.
In three of the four combinations that you dump you have first either replaced a simple value by a composite value, or else removed the composite value that contains the comment information altogether.
In all versions up to the current (i.e. <0.18), ruamel.yaml attaches a scanned comment to a token existing at the time of parsing of the comment. There is no token (yet) for your next data item, so there is currently no way to attach this to "the next data item". The actual comment information in ruamel.yaml<0.18 is an extended end-of-line comment with a value something like "\n\n# this gets shifted or stays". As it starts with a newline, there is no actual comment at the end of the line of the key it is associated with.
In your data_was_a_dict, the comment is associated with the key some, and whether you replace the CommentedMap (a dict subtype, with comments on its .ca attribute) with a string or a dict doesn't make a difference, as the data structure with the comment is completely replaced.
In your data_was_a_str YAML document, it is associated with the key data on a CommentedMap one level higher than in the other document. If you replace its value with another string, the output will be similar to the input. If you add a whole new substructure, the comment is interpreted as sitting between the key and its (composite) value.
To get what you seem to expect, you have to check that there is a comment associated with the key data and move it to be associated with the key something, which could not be a key on a normal dict (it would have to be a CommentedMap).
In the combination where you delete/overwrite the data structure to which the comment is attached, you would have to check for a comment and move it before deletion. In the combination where you replace the simple value with a composite one, you could move the comment after the assignment (given a suitable composite like CommentedMap). So yes, what you want is possible, but it is not trivial, and it would rely on undocumented features that will change in upcoming versions.
import sys
import ruamel.yaml

data_was_a_dict = """\
data: was_a_dict
main_dict:
  data:
    some: data

# this comment gets always lost
other: data
"""

data_was_a_str = """\
data: was_a_str
main_dict:
  data: a_string

# this gets shifted or stays
other: data
"""

yaml = ruamel.yaml.YAML()
data = yaml.load(data_was_a_dict)
print(data['main_dict']['data'].ca)
data = yaml.load(data_was_a_str)
print(data['main_dict'].ca)
which gives:
Comment(comment=None,
items={'some': [None, None, CommentToken('\n\n# this comment gets always lost\n', line: 5, col: 0), None]})
Comment(comment=None,
items={'data': [None, None, CommentToken('\n\n# this gets shifted or stays\n', line: 4, col: 0), None]})
I am looking into replacing the comment scanning and attachment in ruamel.yaml with a mechanism that lets the user optionally split comments (on the first empty line) and specify whether comment information is attached to the previous and/or following "data". The assignment could potentially be influenced by indent level. The documentation might even be updated to reflect that round-tripping would then impose less severe restrictions on preserving comments when restructuring your data.
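
Purely as an illustration of the "split on the first empty line" idea, and not as a preview of any ruamel.yaml API, the splitting itself could look roughly like this on a plain string. The function name and the rule that an unsplit block stays with the preceding data are assumptions made for this sketch.

```python
from typing import Tuple


def split_on_first_empty_line(comment: str) -> Tuple[str, str]:
    """Split a block of comment lines at its first empty line.

    Returns (part for the preceding data, part for the following data).
    Without an empty line, everything stays with the preceding data
    (an assumption made for this sketch).
    """
    lines = comment.splitlines(keepends=True)
    for index, line in enumerate(lines):
        if not line.strip():
            return "".join(lines[:index]), "".join(lines[index + 1:])
    return comment, ""


before, after = split_on_first_empty_line(
    "# about the item above\n\n# about the item below\n"
)
print(repr(before))  # '# about the item above\n'
print(repr(after))   # '# about the item below\n'
```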