I'm using PyYAML (version 5.1) with Python 2 to parse the YAML body of an incoming POST API request.
The body of the incoming request contains some Unicode objects, along with some string objects.
The solution given in link is used to load the YAML mapping into an OrderedDict, where the stream refers to the incoming POST API request's YAML data body.
However, I have to pass the resulting OrderedDict to a library that only accepts string objects.
I can neither change nor update that library, and I have to use Python 2.
My current solution is to convert the loaded data recursively after the fact. Sample code:
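    from collections import OrderedDict

    def convert(data):
        # Recursively encode every unicode object as a UTF-8 byte string.
        if isinstance(data, unicode):
            return data.encode('utf-8')
        if isinstance(data, list):
            return [convert(item) for item in data]
        if isinstance(data, dict):
            # Rebuild as an OrderedDict so the key order is not lost.
            newData = OrderedDict()
            for key, value in data.iteritems():
                newData[convert(key)] = convert(value)
            return newData
        return data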
Although this works, it is inefficient, because the complete OrderedDict is traversed again after it has been created.
Is there a way to do the conversion before or while the OrderedDict is built, so that this second pass is not needed?
You can provide a custom constructor that will always load YAML !!str scalars as Python unicode strings:
    import yaml
    from yaml.resolver import BaseResolver

    def unicode_constructor(self, node):
        # this will always return a unicode string;
        # the default loader would convert it to ASCII-encoded str if possible.
        return self.construct_scalar(node)

    yaml.add_constructor(BaseResolver.DEFAULT_SCALAR_TAG, unicode_constructor)
Afterwards, yaml.load will always return unicode strings for !!str scalars.
(Code untested as I don't have a Python 2 installation)
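Since the question needs byte strings rather than unicode, the same hook can encode the scalar instead, so no second pass over the result is required. The following is only a minimal sketch: the loader subclass ByteStringOrderedLoader and both constructor functions are hypothetical names, and the OrderedDict part assumes the linked solution follows the common construct_pairs pattern, which may differ from what you actually use.

```python
from collections import OrderedDict

import yaml
from yaml.resolver import BaseResolver


class ByteStringOrderedLoader(yaml.SafeLoader):
    """Hypothetical loader: ordered mappings, UTF-8 byte-string scalars."""


def construct_utf8_str(loader, node):
    # construct_scalar returns text (unicode on Python 2);
    # encode it so every !!str scalar becomes a UTF-8 byte string.
    return loader.construct_scalar(node).encode('utf-8')


def construct_ordered_mapping(loader, node):
    # Expand merge keys (<<), then keep the pairs in document order.
    loader.flatten_mapping(node)
    return OrderedDict(loader.construct_pairs(node))


# Register both constructors on the subclass only, leaving the stock loaders untouched.
ByteStringOrderedLoader.add_constructor(
    BaseResolver.DEFAULT_SCALAR_TAG, construct_utf8_str)
ByteStringOrderedLoader.add_constructor(
    BaseResolver.DEFAULT_MAPPING_TAG, construct_ordered_mapping)

# Illustrative input; in the question this would be the POST request body.
data = yaml.load(u"name: caf\u00e9\nitems: [a, b]\n",
                 Loader=ByteStringOrderedLoader)
```

Keys and values tagged !!str come out already encoded, while integers, booleans and other types are still handled by the loader's default constructors. Registering on a subclass rather than through yaml.add_constructor is a design choice here, so that other code calling yaml.load is not affected.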