
How do I read/write markdown yaml frontmatter with ruamel.yaml?


I want to use Python to read and write YAML frontmatter in markdown files. I have come across the ruamel.yaml package but am having trouble understanding how to use it for this purpose.

If I have a markdown file:

---
car: 
  make: Toyota
  model: Camry
---

# My Ultimate Car Review
This is a good car.

For one, is there a way to set the yaml data to variables in my python code?

Second, is there a way to set new values to the yaml in the markdown file?

For the first, I have tried:

from ruamel.yaml import YAML
import sys

f = open("cars.txt", "r+") # I'm really not sure if r+ is ideal here.

yaml = YAML()
code = yaml.load(f)
print(code['car']['make'])

but get an error:

ruamel.yaml.composer.ComposerError: expected a single document in the stream
  in "cars.txt", line 2, column 1
but found another document
  in "cars.txt", line 5, column 1

For the second, I have tried:

from ruamel.yaml import YAML
import sys

f = open("cars.txt", "r+") # I'm really not sure if r+ is ideal here.

yaml = YAML()
code = yaml.load(f)
code['car']['model'] = 'Sequoia'

but get the same error:

ruamel.yaml.composer.ComposerError: expected a single document in the stream
  in "cars.txt", line 2, column 1
but found another document
  in "cars.txt", line 5, column 1

Solution

  • When a file contains multiple YAML documents, they are separated by a line consisting of three dashes, or by a line starting with three dashes followed by a space. Most YAML parsers, including ruamel.yaml, expect either a single-document file (when using YAML().load()) or a multi-document file (when using YAML().load_all()).

    The .load() method returns a single data structure and raises an error if there seems to be more than one document (i.e. when it encounters the second --- in your file). The .load_all() method can handle one or more YAML documents, but always returns an iterator.

    Your input happens to be a valid multi-document YAML file, but in general the markdown part makes that not the case. It could always have been valid YAML simply by changing the second --- into --- |, thereby making the markdown part a (multi-line) literal scalar string. I have no idea why the designers of such YAML frontmatter formats didn't specify that. It might have to do with some parsers (like PyYAML) failing to parse such non-indented literal scalar strings at the root level correctly, although examples of those are in the YAML specification.

    In your example the markdown part is so simple that it is valid YAML without having to specify the | for a literal scalar string, so you could use .load_all() on this input. But just adding e.g. a line starting with a dash to the markdown section will result in an invalid YAML document, so if you use .load_all(), you have to make sure you do not iterate so far as to parse the second document:

    from pathlib import Path
    import ruamel.yaml
    
    path = Path('cars.txt')
    
    yaml = ruamel.yaml.YAML()
    # stop after the first document (the frontmatter),
    # before the markdown part is parsed as YAML
    for data in yaml.load_all(path):
        break
    print(data['car']['make'])
    

    which gives:

    Toyota
    

    You shouldn't try to update the file in place, however (so don't use r+), as your updated YAML frontmatter might be longer than the original, and writing it back would overwrite part of your markdown. For updating, read the file into memory, split it into two parts at the second line of dashes, update the data, dump it, and append the dashes and the markdown:

    import sys
    from pathlib import Path
    import ruamel.yaml
    
    path = Path('cars.txt')
    opath = Path('cars_out.txt')
    yaml_str, markdown = path.read_text().lstrip().split('\n---', 1)
    yaml_str += '\n' # re-add the trailing newline that was split off
    
    yaml = ruamel.yaml.YAML()
    yaml.explicit_start = True
    data = yaml.load(yaml_str)
    
    data['car']['year'] = 2003
    
    with opath.open('w') as fp:
        yaml.dump(data, fp)
        fp.write('---')
        fp.write(markdown)
    
    sys.stdout.write(opath.read_text())
    

    which gives:

    ---
    car:
      make: Toyota
      model: Camry
      year: 2003
    ---
    
    # My Ultimate Car Review
    This is a good car.
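
If you need the split in more than one place, it could be wrapped in a pair of small helpers. The following is only a sketch using the standard library: the names split_front_matter and join_front_matter are hypothetical and not part of ruamel.yaml. Unlike the split('\n---', 1) above, it only accepts a delimiter line that is exactly --- (apart from trailing whitespace), and it assumes the file starts with that line.

```python
def split_front_matter(text):
    """Return (yaml_str, body); yaml_str is None if there is no frontmatter."""
    lines = text.splitlines(keepends=True)
    # The frontmatter has to open with a line that is exactly '---'
    if not lines or lines[0].rstrip() != '---':
        return None, text
    for i, line in enumerate(lines[1:], start=1):
        # The next line that is exactly '---' closes the frontmatter
        if line.rstrip() == '---':
            return ''.join(lines[1:i]), ''.join(lines[i + 1:])
    # No closing delimiter found: treat everything as markdown
    return None, text


def join_front_matter(yaml_str, body):
    """Rebuild the file text from YAML (without delimiters) and the markdown."""
    if not yaml_str.endswith('\n'):
        yaml_str += '\n'
    return '---\n' + yaml_str + '---\n' + body


text = (
    '---\n'
    'car:\n'
    '  make: Toyota\n'
    '  model: Camry\n'
    '---\n'
    '\n'
    '# My Ultimate Car Review\n'
    'This is a good car.\n'
)

yaml_str, body = split_front_matter(text)
rebuilt = join_front_matter(yaml_str, body)
print(rebuilt == text)
```

The yaml_str returned here can be passed to yaml.load(), and the updated data dumped to an io.StringIO before being handed to join_front_matter. Since that helper writes both delimiter lines itself, explicit_start would not be needed in this variant.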