I've been working with the PyYAML parser for a few months now to convert file types as part of a data pipeline. I've found the parser to be quite idiosyncratic at times, and today I seem to have stumbled on another strange behavior. The file I'm currently converting contains the following section:
```yaml
off:
  yes: "Flavor text for yes"
  no: "Flavor text for no"
```
I keep a list of the current nesting path in the dictionary so that I can build a flat document while still saving the nesting, so I can convert back to YAML later on. I got a `TypeError` saying I was trying to concatenate a `str` and a `bool`. I investigated and found that PyYAML is actually taking my section of text above and converting it to the following:
```python
import yaml

with open(filename, "r") as f:  # filename: path of the YAML file being converted
    data = yaml.load(f, Loader=yaml.SafeLoader)
print(data)
# {False: {True: 'Flavor text for yes', False: 'Flavor text for no'}}
```
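To make the failure concrete, here is a minimal sketch of a flattening step that hits the same `TypeError`. The `flatten` helper and its dotted-path keys are hypothetical (an assumption about how such a pipeline might build key paths), not the actual pipeline code:

```python
import yaml


def flatten(tree, prefix=""):
    """Hypothetical flattener: builds dotted key paths by string concatenation."""
    flat = {}
    for key, value in tree.items():
        path = prefix + key  # raises TypeError when key is a bool rather than a str
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


doc = 'off:\n  yes: "Flavor text for yes"\n  no: "Flavor text for no"\n'

error = None
try:
    flatten(yaml.safe_load(doc))
except TypeError as exc:
    error = exc  # "" + False: a str and a bool cannot be concatenated
print(repr(error))
```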
I did a quick check and found that PyYAML does this for `yes`, `no`, `true`, `false`, `on` and `off`, but only when they are unquoted; quoted keys and values come through as plain strings. Looking for solutions, I found this behavior documented here.
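A quick way to reproduce the quoted/unquoted difference with `yaml.safe_load` (any of PyYAML's standard loaders resolves these words the same way):

```python
import yaml

# Unquoted YAML 1.1 boolean words resolve to Python bools;
# the same words in double quotes stay strings.
for word in ["yes", "no", "true", "false", "on", "off"]:
    unquoted = yaml.safe_load(word)
    quoted = yaml.safe_load('"%s"' % word)
    print(word, repr(unquoted), repr(quoted))

# The rule applies to mapping keys and values alike.
print(yaml.safe_load("off: on"))      # {False: True}
print(yaml.safe_load('"off": "on"'))  # {'off': 'on'}
```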
It might help others to know that quoting the keys stops PyYAML from doing this, but that isn't an option for me: I'm not the author of these files, and I've written my code to touch the data as little as possible.
Is there a workaround for this issue, or a way to override the default conversion behavior in PyYAML?
`yaml.load` takes a second argument, a loader class (older releases defaulted to `yaml.loader.Loader`; PyYAML 6.0 and later require it to be passed explicitly). The predefined loader is a mashup of a number of others:
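```python
# Excerpt from PyYAML's yaml/loader.py
from yaml.reader import Reader
from yaml.scanner import Scanner
from yaml.parser import Parser
from yaml.composer import Composer
from yaml.constructor import Constructor
from yaml.resolver import Resolver

class Loader(Reader, Scanner, Parser, Composer, Constructor, Resolver):
    def __init__(self, stream):
        Reader.__init__(self, stream)
        Scanner.__init__(self)
        Parser.__init__(self)
        Composer.__init__(self)
        Constructor.__init__(self)
        Resolver.__init__(self)
```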
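One way to see where the bool comes from is to stop the pipeline just before the constructor stage. `yaml.compose` runs the reader, scanner, parser and composer (which calls the resolver) and returns the node graph, so you can inspect the tag the resolver assigned and the original scalar text it kept. This is a small illustration using `yaml.SafeLoader`; the default `Loader` resolves these scalars the same way:

```python
import yaml

doc = 'off:\n  yes: "Flavor text for yes"\n'

# compose() stops before the Constructor and returns yaml.nodes objects.
root = yaml.compose(doc, Loader=yaml.SafeLoader)
key_node, value_node = root.value[0]  # first (key, value) pair of the mapping

print(key_node.tag)          # tag:yaml.org,2002:bool (implicit tag from the resolver)
print(repr(key_node.value))  # 'off' (the original text is still on the node)

inner_key, inner_value = value_node.value[0]
print(inner_key.tag, inner_value.tag)
# tag:yaml.org,2002:bool tag:yaml.org,2002:str (the quoted scalar stays a str)
```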
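The `Constructor` class is the one that maps tagged YAML nodes to Python objects. (The `Resolver` is what tags unquoted `yes`, `no`, `on`, `off` and friends as `tag:yaml.org,2002:bool` in the first place; the original text stays on the node.) One (kludgy, but fast) way to override the boolean conversion could be: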
```python
from yaml.constructor import Constructor

def add_bool(self, node):
    # Return the scalar's original text (e.g. "off") instead of a Python bool.
    return self.construct_scalar(node)

Constructor.add_constructor('tag:yaml.org,2002:bool', add_bool)
```
This overrides the function that the constructor uses to turn boolean-tagged data into Python booleans; here it just returns the string verbatim.
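This affects all loading done through `Constructor` (including the default `yaml.Loader` and anything derived from it) for the rest of the process, though, because you're modifying a shared class rather than your own copy. A more proper way to do things would be to create a new class derived from `Constructor`, and a new `Loader` class that mixes in your custom constructor.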
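A sketch of that more proper approach, mirroring how PyYAML assembles its own `SafeLoader`. Two assumptions worth flagging: it derives from `SafeConstructor` rather than `Constructor` (safer for files you didn't write; the bool handling is the same), and the names `StrBoolConstructor`, `StrBoolLoader` and `construct_bool_as_str` are made up for illustration:

```python
import yaml
from yaml.reader import Reader
from yaml.scanner import Scanner
from yaml.parser import Parser
from yaml.composer import Composer
from yaml.constructor import SafeConstructor
from yaml.resolver import Resolver


class StrBoolConstructor(SafeConstructor):
    """Hypothetical SafeConstructor variant that leaves bool-tagged scalars as text."""

    def construct_bool_as_str(self, node):
        # node.value is the original scalar text, e.g. "off" or "yes".
        return self.construct_scalar(node)


# Register on the subclass: add_constructor copies the inherited table first,
# so SafeConstructor (and yaml.safe_load) keep their normal behaviour.
# Merely overriding construct_yaml_bool would not work, because the table
# stores SafeConstructor's original function.
StrBoolConstructor.add_constructor(
    "tag:yaml.org,2002:bool", StrBoolConstructor.construct_bool_as_str
)


class StrBoolLoader(Reader, Scanner, Parser, Composer, StrBoolConstructor, Resolver):
    """Same mixin layout as yaml.SafeLoader, with the custom constructor swapped in."""

    def __init__(self, stream):
        Reader.__init__(self, stream)
        Scanner.__init__(self)
        Parser.__init__(self)
        Composer.__init__(self)
        StrBoolConstructor.__init__(self)
        Resolver.__init__(self)


doc = 'off:\n  yes: "Flavor text for yes"\n  no: "Flavor text for no"\n'
print(yaml.load(doc, Loader=StrBoolLoader))
# {'off': {'yes': 'Flavor text for yes', 'no': 'Flavor text for no'}}
print(yaml.safe_load(doc))
# {False: {True: 'Flavor text for yes', False: 'Flavor text for no'}}
```

Since `true` and `false` carry the same bool tag, this loader returns them as strings too; only the files loaded with `Loader=StrBoolLoader` are affected.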