I was wondering whether there is an easy way to use PyYAML to parse a YAML document consisting of a list of items as a Python generator.
For example, given the file
```yaml
# foobar.yaml
---
- foo: ["bar", "baz", "bah"]
  something_else: blah
- bar: yet_another_thing
```
I'd like to be able to do something like
```python
for item in yaml.load_as_generator(open('foobar.yaml')):  # does not exist
    print(str(item))
```
I know there is yaml.load_all, which achieves similar functionality, but then each record has to be its own document. I'm asking because I have some really big files that I'd like to convert to YAML and then parse with a low memory footprint.
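For comparison, the load_all route requires the data to be written as a stream of documents, one per record. A minimal sketch with PyYAML's `safe_load_all()`, using the sample records as inline text for illustration (an open file object can be passed instead):

```python
import yaml

# Each record is its own YAML document, separated by '---'
records = """\
---
foo: ["bar", "baz", "bah"]
something_else: blah
---
bar: yet_another_thing
"""

# safe_load_all() is a generator: each document is constructed only
# when the loop asks for the next one
for record in yaml.safe_load_all(records):
    print(record)
```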
I took a look at the PyYAML Events API but it scared me =)
I can understand that the Events API scares you, and it would only get you so far. First of all you would need to keep track of depth (because you have your top-level complex sequence items, as well as "bar", "baz", etc.). And, having cut out the low-level sequence element events correctly, you would have to feed them into the composer to create nodes (and eventually Python objects), which is not trivial either.
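To make the depth bookkeeping concrete, here is a minimal sketch of the event route with PyYAML. It is an illustration under assumptions, not the approach of this answer: instead of driving the composer, it re-serialises each item's events with `yaml.emit()` and loads that text with `yaml.safe_load()`. It assumes the root node is a sequence and does not handle aliases that point to anchors in other items; `iter_sequence_items` is a hypothetical helper name.

```python
import yaml
from yaml.events import (CollectionEndEvent, CollectionStartEvent,
                         DocumentEndEvent, DocumentStartEvent,
                         StreamEndEvent, StreamStartEvent)


def _events_to_object(events):
    """Wrap one item's events in a stream/document, emit them, reload."""
    wrapped = [StreamStartEvent(), DocumentStartEvent(explicit=False),
               *events,
               DocumentEndEvent(explicit=False), StreamEndEvent()]
    return yaml.safe_load(yaml.emit(wrapped))


def iter_sequence_items(stream):
    """Yield the items of a top-level sequence one at a time."""
    depth = 0          # current collection nesting depth
    item_events = []   # events of the item being collected
    for event in yaml.parse(stream, Loader=yaml.SafeLoader):
        if isinstance(event, CollectionStartEvent):
            depth += 1
            if depth == 1:
                continue   # start of the top-level sequence itself
        elif isinstance(event, CollectionEndEvent):
            depth -= 1
            if depth == 0:
                continue   # end of the top-level sequence
        elif depth == 0:
            continue       # stream/document events outside the list
        item_events.append(event)
        if depth == 1:     # a scalar item, or a nested collection just closed
            yield _events_to_object(item_events)
            item_events = []
```

Even in this reduced form the depth tracking and the event wrapping are easy to get wrong, which is why the line-based approach is simpler.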
But since YAML uses indentation, even for scalars spanning multiple lines, you can use a simple line-based parser that recognises where each sequence element starts and feed those elements into the normal load() function one at a time:
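```python
#!/usr/bin/env python
from ruamel.yaml import YAML

# a YAML() instance; the module-level ruamel.yaml.load() is deprecated
# and no longer available in recent releases
yaml = YAML(typ='safe')


def list_elements(fp, depth=0):
    """Yield the elements of a block sequence one at a time."""
    buffer = None
    in_header = True
    # a new element starts with '- ' at the given indentation
    list_element_match = ' ' * depth + '- '
    for line in fp:
        # skip everything up to and including the '---' marker
        if line.startswith('---'):
            in_header = False
            continue
        if in_header:
            continue
        if line.startswith(list_element_match):
            if buffer is not None:
                # the buffered element is complete: parse it
                yield yaml.load(buffer)[0]
            buffer = line
        elif buffer is not None:
            # continuation line of the current element
            buffer += line
    # don't forget the last element
    if buffer:
        yield yaml.load(buffer)[0]


with open("foobar.yaml") as fp:
    for element in list_elements(fp):
        print(str(element))
```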
resulting in:
```
{'something_else': 'blah', 'foo': ['bar', 'baz', 'bah']}
{'bar': 'yet_another_thing'}
```
I used ruamel.yaml here, an enhanced version of PyYAML of which I am the author, but PyYAML should work in the same way (use yaml.safe_load(buffer) to parse each element).
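A minimal sketch of the same line-based splitting with plain PyYAML, assuming `yaml.safe_load()` is acceptable for your data. It runs against an in-memory copy of the sample file via `io.StringIO`, so any iterable of lines (such as an open file) works the same way:

```python
import io

import yaml


def list_elements(fp, depth=0):
    """Yield the items of a block sequence one at a time (PyYAML)."""
    buffer = None
    in_header = True
    list_element_match = ' ' * depth + '- '
    for line in fp:
        if line.startswith('---'):
            in_header = False          # body starts after the marker
            continue
        if in_header:
            continue
        if line.startswith(list_element_match):
            if buffer is not None:     # previous item is complete
                yield yaml.safe_load(buffer)[0]
            buffer = line
        elif buffer is not None:
            buffer += line             # continuation of current item
    if buffer:
        yield yaml.safe_load(buffer)[0]


sample = io.StringIO(
    '# foobar.yaml\n'
    '---\n'
    '- foo: ["bar", "baz", "bah"]\n'
    '  something_else: blah\n'
    '- bar: yet_another_thing\n'
)
for element in list_elements(sample):
    print(element)
```

Only one element's text is held in `buffer` at a time, which is what keeps the memory footprint low.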