ruamel.yaml: some comments get lost in round_trip_dump()


The code below reproduces the issue:

from ruamel.yaml import round_trip_load, round_trip_dump, YAML
import sys

obj = round_trip_load('''\
a: 1  # comment for a
b:  # comment for b
  c: 2  # comment for c
''')

print('-------------------------- YAML().dump(obj, sys.stdout)')
YAML().dump(obj, sys.stdout)

print('-------------------------- print(round_trip_dump(obj))')
print(round_trip_dump(obj))

Output

-------------------------- YAML().dump(obj, sys.stdout)
a: 1  # comment for a
b:  # comment for b
  c: 2  # comment for c
-------------------------- print(round_trip_dump(obj))
a: 1
b:  # comment for b
  c: 2

You can see that YAML().dump(obj, sys.stdout) correctly prints all the comments, but print(round_trip_dump(obj)) loses the comments for a and c.

If you step through it in a debugger, you can see that obj.ca.items correctly keeps the comments.

Therefore I believe it is a bug in round_trip_dump().

I was going to open a ticket for the project, but the instructions ask to post here first, to make sure it is not a misuse of the library.

Is it a bug, or am I misusing the library?

The documentation asks why you would need the output of dump() as a string.

Because I want to convert the object into a string and save it as a VARCHAR in a database.
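For this use case, once the YAML is a plain str, storing it is ordinary database work and the comments are just part of the text. A minimal sketch using Python's built-in sqlite3 module; the table name `configs` and column `body` are hypothetical, and the YAML text is hard-coded as a stand-in for the dumped output so the sketch only needs the standard library. Note that SQLite accepts VARCHAR(n) but does not enforce the length; other databases do.

```python
import sqlite3

# Stand-in for the string produced by dumping the data with ruamel.yaml
# (hard-coded here so the sketch needs only the standard library).
yaml_text = """\
a: 1  # comment for a
b:  # comment for b
  c: 2  # comment for c
"""

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
# Hypothetical table; SQLite gives VARCHAR(n) TEXT affinity and ignores n.
conn.execute("CREATE TABLE configs (id INTEGER PRIMARY KEY, body VARCHAR(1000))")
# The ? placeholder lets the driver handle quoting, so no manual escaping.
conn.execute("INSERT INTO configs (body) VALUES (?)", (yaml_text,))
conn.commit()

# Read the text back; it is stored verbatim, comments included.
(stored,) = conn.execute("SELECT body FROM configs WHERE id = 1").fetchone()
conn.close()
print(stored, end="")
```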

Thank you in advance.

Environment: ruamel.yaml==0.17.21 with Python 3.11 on 64 bit Windows


Solution

  • This definitely falls under misusage. The documentation you refer to explains why a new API was needed, and you are combining round_trip_dump/round_trip_load from the old API with a YAML() instance from the new API and expecting things to work, which they don't (otherwise a new API would probably not have been necessary). Don't use the old API for any new project.

    If you have a streaming API (like the one currently in ruamel.yaml), you can easily capture the output using an io.BytesIO() buffer:

    import sys
    import io
    import ruamel.yaml
    
    yaml_str = """\
    a: 1  # comment for a
    b:  # comment for b
      c: 2  # comment for c
    """
        
    yaml = ruamel.yaml.YAML()
    yaml.preserve_quotes = True
    data = yaml.load(yaml_str)
    buf = io.BytesIO()  # yaml.dump writes UTF-8 encoded bytes to a binary stream
    yaml.dump(data, buf)
    yaml_out = buf.getvalue().decode("utf-8")
    print(f'here is the streamed output:\n{yaml_out}')
    

    which gives:

    here is the streamed output:
    a: 1  # comment for a
    b:  # comment for b
      c: 2  # comment for c
    

    So stick with the new API; converting the output of a stream-based API to a string yourself is trivial.

    Interpreting a stream parameter with a default value of None as "return a string" is IMO not the right thing to build into the (new) API. Compare Python's datetime.date() constructor: it creates an object and has no built-in option to stream it (you can't do datetime.date(2023, 4, 18, stream=sys.stdout)), even though I can imagine there is someone out there who might want to do that.

    (tested on macOS and Linux, but it should work on Windows)