Is it possible to use PyYAML to read a text file written with a "YAML front matter" block inside?


I'm sorry, I know very little about either YAML or PyYAML, but I fell in love with the idea of supporting a configuration file written in the same style used by "Jekyll" (http://jekyllrb.com/docs/frontmatter/), which AFAIK has these "YAML Front Matter" blocks that look very cool and sexy to me.
So I installed PyYAML on my computer and I wrote a small file with this block of text:

---
First Name: John
Second Name: Doe
Born: Yes
---

Lorem ipsum dolor sit amet, consectetur adipiscing elit,  
sed do eiusmod tempor incididunt ut labore et dolore magna  
aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco 
laboris nisi ut aliquip ex ea commodo consequat.

Then I tried to read this text file with Python 3.4 and PyYAML by using this code:

import yaml

stream = open("test.yaml")
a = stream.read()
b = yaml.load(a)

But obviously it's not working, and Python displays this error message:

Traceback (most recent call last):
  File "<pyshell#62>", line 1, in <module>
    b = yaml.load(a)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/yaml/__init__.py", line 72, in load
    return loader.get_single_data()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/yaml/constructor.py", line 35, in get_single_data
    node = self.get_single_node()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/yaml/composer.py", line 43, in get_single_node
    event.start_mark)
yaml.composer.ComposerError: expected a single document in the stream
  in "<unicode string>", line 2, column 1:
    First Name: John
    ^
but found another document
  in "<unicode string>", line 5, column 1:
    ---
    ^

Could you help me, please?
Did I write the code the wrong way, or does this mean that PyYAML can't handle YAML front matter blocks?
Is there anything else I could try with PyYAML, or do I have to write my own parser using regex?

Thank you very much for your time!


Solution

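  • PyYAML has no built-in support for YAML that is embedded in a larger document. yaml.load expects the whole stream to be a single YAML document, and the second --- line starts a new document, which is why you get the ComposerError. Here is a utility function that extracts the YAML text from the start of the file, so you can parse it before reading the remainder of the file: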

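    #!/usr/bin/env python3
    
    import sys
    import yaml
    
    def get_yaml(f):
        """Return the front matter text at the start of f, or '' if there is none."""
        pointer = f.tell()
        if f.readline() != '---\n':
            f.seek(pointer)  # no opening '---': rewind so the caller sees the whole file
            return ''
        # Read line by line until end of file (''), stopping early at the closing '---'.
        readline = iter(f.readline, '')
        readline = iter(readline.__next__, '---\n')
        return ''.join(readline)
    
    
    for filename in sys.argv[1:]:
        with open(filename) as f:
            # safe_load builds only plain Python types; it returns None for an empty string.
            config = yaml.safe_load(get_yaml(f))
            text = f.read()
            print("TEXT from", filename)
            print(text)
            print("CONFIG from", filename)
            print(config)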
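
Since the question also asks about regex: if the whole file is already in memory as a string, the same split can be sketched with a regular expression instead of a file object. This is only a rough sketch, not something PyYAML provides. The names FRONT_MATTER, split_front_matter and load_front_matter are made up for this example, and it assumes "\n" line endings and that the front matter starts on the very first line.

```python
import re

# Hypothetical pattern: an opening '---' line at the very start of the text,
# the YAML text (captured lazily), then a closing line containing only '---'.
FRONT_MATTER = re.compile(
    r"\A---[ \t]*\n(.*?)^---[ \t]*$\n?",
    re.DOTALL | re.MULTILINE,
)


def split_front_matter(text):
    """Return (yaml_text, body); yaml_text is '' when there is no front matter."""
    match = FRONT_MATTER.match(text)
    if match is None:
        return "", text
    return match.group(1), text[match.end():]


def load_front_matter(text):
    """Return (config, body), parsing the front matter with PyYAML."""
    # PyYAML is third-party; it is imported here so that the splitter above
    # can be used (and tested) with the standard library alone.
    import yaml

    yaml_text, body = split_front_matter(text)
    # safe_load('') returns None, so fall back to an empty dict.
    config = yaml.safe_load(yaml_text) if yaml_text else {}
    return config, body


sample = "---\nFirst Name: John\nSecond Name: Doe\n---\n\nLorem ipsum dolor sit amet\n"
yaml_text, body = split_front_matter(sample)
print(repr(yaml_text))
print(repr(body))
```

With the file from the question you would call load_front_matter(open("test.yaml").read()) and get the parsed block and the remaining text back as a pair. Files without a leading --- line come back unchanged as the body.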