
Separate YAML and plain text in the same document


While building a blog with Django, I realized it would be very practical to store an article's text and all its related information (title, author, etc.) together in a human-readable file format, and then load those files into the database with a simple script.

That said, YAML caught my attention for its readability and ease of use. The only downside of the YAML syntax is the indentation:

---
title: Title of the article
author: Somebody
# Other stuff here ...
text: |
    This is the text of the article. I can write whatever I want
    but I need to be careful with the indentation...and this is a
    bit boring.
---

I believe that's not the best solution (especially if the files are going to be written by casual users). A format like this one would be much better:

---
title: Title of the article
author: Somebody
# Other stuff here ...
---
Here there is the text of the article, it is not valid YAML but
just plain text. Here I could put **Markdown** or <html>...or whatever
I want...

Is there any solution, preferably using Python? Suggestions for other file formats are welcome as well!


Solution

  • Unfortunately, this is not possible. One might think that using | for a single scalar in a separate document would work:

    import ruamel.yaml
    
    yaml_str = """\
    title: Title of the article
    author: Somebody
    ---
    |
    Here there is the text of the article, it is not valid YAML but
    just plain text. Here I could put **Markdown** or <html>...or whatever
    I want...
    """
    
    for d in ruamel.yaml.load_all(yaml_str):
        print(d)
        print('-----')
    

    but it doesn't, because | introduces a literal block scalar, whose content is expected to be indented. Although an indentation of 0 (zero) would easily work at the top level, ruamel.yaml (and PyYAML) don't allow this.

    It is, however, easy to parse this yourself. Compared with using the front matter package, this has the advantage that you can use YAML 1.2 and are not restricted to YAML 1.1, which is the case when frontmatter relies on PyYAML. Also note that I used the more appropriate end-of-document marker ... to separate the YAML from the Markdown:

    import ruamel.yaml
    
    combined_str = """\
    title: Title of the article
    author: Somebody
    ...
    Here there is the text of the article, it is not valid YAML but
    just plain text. Here I could put **Markdown** or <html>...or whatever
    I want...
    """
    
    with open('test.yaml', 'w') as fp:
        fp.write(combined_str)
    
    
    data = None
    lines = []
    yaml_str = ""
    with open('test.yaml') as fp:
        for line in fp:
            if data is not None:
                lines.append(line)
                continue
            if line == '...\n':
                data = ruamel.yaml.round_trip_load(yaml_str)
                continue
            yaml_str += line
    
    print(data['author'])
    print(lines[2])
    

    which gives:

    Somebody
    I want...
    

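    (Round-trip loading, done here with round_trip_load, allows dumping with preservation of comments, anchor names, etc. In ruamel.yaml 0.18 and later the function-style API was removed; the equivalent there is ruamel.yaml.YAML().load(yaml_str), which round-trips by default.)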
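
The splitting step can also be pulled out into a small helper that does not depend on any YAML library. The following is only a sketch: the function name split_front_matter is hypothetical, and treating both ... and --- as the separator (with an optional opening --- line, as in the question's format) is an assumption, not something either library prescribes. The returned YAML part can then be passed to whichever loader you prefer.

```python
def split_front_matter(text):
    """Split text into (yaml_part, body).

    The split happens at the first line consisting only of '...' or '---'.
    An optional opening '---' line (as in the question's format) is skipped.
    Hypothetical helper: the name and the accepted markers are assumptions.
    """
    lines = text.splitlines(keepends=True)
    # Skip an optional leading '---' so it is not mistaken for the separator.
    start = 1 if lines and lines[0].rstrip() == '---' else 0
    for i in range(start, len(lines)):
        if lines[i].rstrip() in ('...', '---'):
            # Everything before the marker is YAML, everything after is body.
            return ''.join(lines[start:i]), ''.join(lines[i + 1:])
    # No separator found: treat the whole input as YAML with an empty body.
    return ''.join(lines[start:]), ''


if __name__ == '__main__':
    combined = (
        "title: Title of the article\n"
        "author: Somebody\n"
        "...\n"
        "Here there is the text of the article.\n"
    )
    yaml_part, body = split_front_matter(combined)
    print(yaml_part, end='')
    print('-----')
    print(body, end='')
```

Because only the first separator line is used, a later --- in the body (for example a Markdown horizontal rule) stays part of the body text.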