I'm having issues loading and dumping YAML files with PyYAML that need to be compatible with both Python 2 and Python 3.
For Python 3 dumping / Python 2 loading, I found a solution:
import yaml
data = {"d": "😋"}
with open(file_path, "w") as f:
    yaml.dump(data, f, allow_unicode=True)
This produces a YAML file with this line:
d: 😋
If I try to load this file with Python 2:
with open(file_path, "r") as f:
    y = yaml.safe_load(f)
print(y["d"])
I get the following output:
😋
But when I try to dump a file with Python 2, I run into trouble. I tried:
data = {"d": u"😋"}
with open(file_name, "w") as f:
    yaml.dump(data, f)
which produces a YAML file:
d: "\uD83D\uDE0B"
I also tried:
data = {"d": u"😋".encode("utf-8")}
with open(file_name, "w") as f:
    yaml.dump(data, f)
which produces a YAML file:
d: !!python/str "\uD83D\uDE0B"
In both cases, if I load with Python 3:
with open(file_path, "r") as f:
    y = yaml.load(f)
then y["d"] is '\ud83d\ude0b', which cannot be used as is.
I found out I could do something like
y["d"].encode("utf-16", "surrogatepass").decode("utf-16")
but that seems like overkill.
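For reference, here is a minimal Python 3 sketch of what that round trip does: the two-character string is a UTF-16 surrogate pair, and encoding with "surrogatepass" followed by decoding merges the pair into the single code point U+1F60B. The helper name merge_surrogate_pair is hypothetical and only used for illustration.

```python
def merge_surrogate_pair(text):
    # "surrogatepass" lets the UTF-16 encoder write lone surrogates as-is;
    # decoding the resulting bytes then combines each high/low pair
    # into one code point.
    return text.encode("utf-16", "surrogatepass").decode("utf-16")


broken = "\ud83d\ude0b"          # what Python 3 gets after loading the file
fixed = merge_surrogate_pair(broken)

print(len(broken), len(fixed))   # 2 1
print(fixed == "\U0001F60B")     # True
```

Strings that contain no surrogates pass through unchanged, so the helper should be safe to apply to every loaded string; a string with an unpaired surrogate would raise UnicodeDecodeError on the decode step.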
So what's the solution for dumping a file with Python 2 that is readable and properly interpreted in Python 3?
I ended up adding a constructor for this.
I add it to a custom loader, so I call self.add_constructor, but it works the same at the yaml module level, which is easier to illustrate:
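def unicode_constructor(loader, node):
    # Build the scalar as usual, then merge any UTF-16 surrogate pairs
    # into the real code points (Python 3 only).
    scalar = loader.construct_scalar(node)
    return scalar.encode("utf-16", "surrogatepass").decode("utf-16")

# Register after the function is defined, otherwise the name is not yet bound.
yaml.add_constructor("tag:yaml.org,2002:python/str", unicode_constructor)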
This works for Python 2 dump / Python 3 load and doesn't affect Python 3 dump / Python 2 or 3 load.
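A self-contained Python 3 sketch of the custom-loader variant might look like the following. It assumes PyYAML's pure-Python SafeLoader as the base class; the names SurrogateFixLoader and merge_surrogates_constructor are hypothetical. Registering the constructor for the plain "tag:yaml.org,2002:str" tag as well is an extra assumption, added here so that the first Python 2 output (without the !!python/str tag) is also covered.

```python
import yaml


class SurrogateFixLoader(yaml.SafeLoader):
    """Hypothetical loader that merges surrogate pairs in string scalars."""


def merge_surrogates_constructor(loader, node):
    # Same idea as the constructor above: build the scalar, then let the
    # UTF-16 round trip combine high/low surrogates into one code point.
    scalar = loader.construct_scalar(node)
    return scalar.encode("utf-16", "surrogatepass").decode("utf-16")


# Tag written by Python 2 for byte strings, plus the plain string tag
# (the second registration is an assumption, not part of the answer above).
for tag in ("tag:yaml.org,2002:python/str", "tag:yaml.org,2002:str"):
    SurrogateFixLoader.add_constructor(tag, merge_surrogates_constructor)

# The two forms produced by the Python 2 dumps, plus an ordinary string.
document = (
    'a: !!python/str "\\uD83D\\uDE0B"\n'
    'b: "\\uD83D\\uDE0B"\n'
    'c: hello\n'
)
loaded = yaml.load(document, Loader=SurrogateFixLoader)

print(loaded["a"] == "\U0001F60B")  # True
print(loaded["b"] == "\U0001F60B")  # True
print(loaded["c"])                  # hello
```

Because the constructors are added to the subclass rather than to the yaml module, other code that uses yaml.safe_load keeps its default behavior.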