It does not appear that the Python library pyyaml will allow me to read a multi-document YAML stream and continue past the point of a parsing error. I have two related questions:
Here is an example of a multiple-document YAML stream:
%YAML 1.1
---
# YAML can contain comments like this
name: David
age: 55
---
name: Mei
age: 50 # Including end-of-line
---
name: Juana: ERROR
age: 47
...
---
name: Adebayo
age: 58
...
I would like code similar to the following to skip the bad document by recognizing that, no matter how bad this document is, something new starts after the ... and --- lines:
import yaml

with open('data/multidoc-bad.yaml') as stream:
    # safe_load_all returns a lazy generator that yields one document at a time
    docs = yaml.safe_load_all(stream)
    while True:
        try:
            doc = next(docs)
            print(doc)
        except StopIteration:
            break
        except Exception as err:
            print(err)
I'd like to get:
{'name': 'David', 'age': 55}
{'name': 'Mei', 'age': 50}
mapping values are not allowed here
in "data/multidoc-bad.yaml", line 10, column 12
{'name': 'Adebayo', 'age': 58}
But in reality I do not get that last line for "Adebayo."
I recognize that I could write a small parser myself that reads lines and only looks for ... and --- lines to chunk the stream, then pass only single documents to yaml.load() after my own parsing. But it sure seems like that's what a parser is supposed to do for me.
Am I just missing something, and some other API will support this?
No, PyYAML cannot do this.
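As a workaround, the manual chunking described in the question can be sketched roughly as below. This is only an illustration, not anything PyYAML provides: `split_documents` and `load_all_tolerant` are hypothetical helper names, and the sketch assumes that every line starting with `---` or `...` at column 0 is a document boundary, which I have not checked against every edge case in the YAML specification. The parser is passed in as a callable so that the splitting logic stays independent of any YAML library.

```python
import re
from typing import Any, Callable, Iterator, List, Optional, Tuple

# Document-start ("---") and document-end ("...") markers at column 0,
# followed by whitespace or the end of the line.
_START = re.compile(r"^---(\s|$)")
_END = re.compile(r"^\.\.\.(\s|$)")


def split_documents(text: str) -> Iterator[str]:
    """Yield the raw text of each document in a multi-document stream."""
    chunk: List[str] = []
    has_body = False  # True once the chunk holds a "---" line or real content
    for line in text.splitlines(keepends=True):
        if _END.match(line):
            # "..." closes the current document; the marker itself is dropped.
            if has_body:
                yield "".join(chunk)
            chunk, has_body = [], False
        elif _START.match(line):
            # "---" opens a new document; flush the previous one first.
            if has_body:
                yield "".join(chunk)
                chunk = []
            chunk.append(line)
            has_body = True
        else:
            chunk.append(line)
            stripped = line.strip()
            # Directives (%...) and comments stay attached to the next document.
            if stripped and not stripped.startswith(("#", "%")):
                has_body = True
    if has_body:
        yield "".join(chunk)


def load_all_tolerant(
    text: str, parse: Callable[[str], Any]
) -> Iterator[Tuple[Any, Optional[Exception]]]:
    """Parse each chunk on its own and yield (document, error) pairs."""
    for chunk in split_documents(text):
        try:
            yield parse(chunk), None
        except Exception as err:  # a bad document must not stop the stream
            yield None, err


# Usage with PyYAML (needs `import yaml`):
# with open('data/multidoc-bad.yaml') as stream:
#     for doc, err in load_all_tolerant(stream.read(), yaml.safe_load):
#         print(err if err is not None else doc)
```

Because each chunk is parsed separately, a syntax error in one document cannot affect the next. The trade-off is that any line and column numbers in an error message will be relative to the chunk rather than to the original file.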
Do parsers in other programming languages support this operation? (if so, which)
None that I know of. Most YAML parsers are hand-written, and quite a few are translations of PyYAML. I don't know of a single one that implements error recovery. (I have worked with SnakeYAML, go-yaml, PyYAML, libyaml, and YamlDotNet, and I authored NimYAML and AdaYaml.)
But it sure seems like that's what a parser is supposed to do for me.
I think the reasons why parsers don't support this include