Here is my code:
import yaml
yaml.load('foo')
This code leads to the following warning with PyYAML (5.1).
$ pip install pyyaml
$ python3 foo.py
foo.py:2: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
yaml.load('foo')
So I visited https://msg.pyyaml.org/load to see what this is about but I do not understand the need for this warning.
First, the documentation says,
UnsafeLoader (also called Loader for backwards compatibility): The original Loader code that could be easily exploitable by untrusted data input.
Okay, that makes sense. In an earlier version, the original loader was unsafe. Further, it says,
FullLoader: Loads the full YAML language. Avoids arbitrary code execution. This is currently (PyYAML 5.1) the default loader called by yaml.load(input) (after issuing the warning).
So the current version uses FullLoader, which is not unsafe. This is confirmed again in the document:
The load function was also made much safer by disallowing the execution of arbitrary functions by the default loader (FullLoader).
If the current version that uses FullLoader is not unsafe, then why do we need the YAMLLoadWarning at all?
I think this warning is mostly a notification and a piece of guidance that tells users what PyYAML's best practice will be going forward. Recall that explicit is better than implicit.
Before version 5.1 (e.g. 4.1), the yaml.load API used Loader=Loader as its default:
def load(stream, Loader=Loader):
    """
    Parse the first YAML document in a stream
    and produce the corresponding Python object.
    """
    loader = Loader(stream)
    try:
        return loader.get_single_data()
    finally:
        loader.dispose()

def safe_load(stream):
    """
    Parse the first YAML document in a stream
    and produce the corresponding Python object.
    Resolve only basic YAML tags.
    """
    return load(stream, SafeLoader)
At that time, there were only three choices for the Loader class: the limited BaseLoader, SafeLoader, and the unsafe Loader. The default one was unsafe, as the documentation itself states:
PyYAML's load function has been unsafe since the first release in May 2006. It has always been documented that way in bold type: PyYAMLDocumentation. PyYAML has always provided a safe_load function that can load a subset of YAML without exploit.
But many resources and tutorials still call yaml.load(f) directly, so users (especially new users) end up choosing a default Loader class implicitly.
Since PyYAML version 5.1, the yaml.load API has been changed to be more explicit:
def load(stream, Loader=None):
    """
    Parse the first YAML document in a stream
    and produce the corresponding Python object.
    """
    if Loader is None:
        load_warning('load')
        Loader = FullLoader
    loader = Loader(stream)
    try:
        return loader.get_single_data()
    finally:
        loader.dispose()

def safe_load(stream):
    """
    Parse the first YAML document in a stream
    and produce the corresponding Python object.
    Resolve only basic YAML tags. This is known
    to be safe for untrusted input.
    """
    return load(stream, SafeLoader)
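To make the pattern in the 5.1 code easier to see in isolation, here is a minimal standard-library sketch of a "None sentinel plus warning" default. It does not use PyYAML at all: load_sketch, FullLoaderStub, SafeLoaderStub and YAMLLoadWarningSketch are hypothetical names invented for this illustration, and the stubs do no real parsing.

```python
import warnings


class YAMLLoadWarningSketch(RuntimeWarning):
    """Hypothetical stand-in for the warning class; not PyYAML's own."""


class FullLoaderStub:
    """Hypothetical stand-in for a loader class; it does no YAML parsing."""

    def __init__(self, stream):
        self.stream = stream

    def get_single_data(self):
        # A real loader would parse YAML here; the stub returns the text as-is.
        return self.stream

    def dispose(self):
        pass


class SafeLoaderStub(FullLoaderStub):
    """Second stub, so an explicit choice differs from the default."""


def load_sketch(stream, Loader=None):
    # None is a sentinel meaning "the caller did not choose a loader".
    if Loader is None:
        warnings.warn(
            "calling load_sketch() without Loader=... is deprecated",
            YAMLLoadWarningSketch,
            stacklevel=2,
        )
        Loader = FullLoaderStub
    loader = Loader(stream)
    try:
        return loader.get_single_data()
    finally:
        loader.dispose()


# Record warnings instead of printing them, so the effect can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    implicit_result = load_sketch("foo")                  # no Loader given: warns
    explicit_result = load_sketch("foo", SafeLoaderStub)  # explicit Loader: silent

warning_count = len(caught)
```

The point of the sentinel is that the function can tell "the caller picked nothing" apart from "the caller picked the default on purpose", which a default of Loader=FullLoader could not do.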
A new FullLoader has also been added to the Loader classes. As users, we should be aware of these changes and call yaml.load more explicitly:
yaml.load(stream, yaml.SafeLoader)
Recommended for untrusted input. Limitation: Loads a subset of the YAML language.
yaml.load(stream, yaml.FullLoader)
For more trusted input. Still somewhat limited: it avoids arbitrary code execution.
yaml.load(stream, yaml.Loader) (UnsafeLoader is the same as Loader)
Unsafe. But has the full power.
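As a rough illustration of the explicit calls listed above, the following sketch loads the same small document with SafeLoader and FullLoader and then shows SafeLoader refusing a Python-specific tag. It assumes PyYAML 5.1 or later is installed (FullLoader does not exist before that); the sample documents and variable names are made up for the example.

```python
import yaml

# A plain document that uses only basic YAML types.
DOC = "name: example\nports: [80, 443]\n"

# Passing the loader explicitly avoids YAMLLoadWarning.
safe_data = yaml.load(DOC, Loader=yaml.SafeLoader)

# yaml.safe_load is shorthand for the call above.
assert safe_data == yaml.safe_load(DOC)

# FullLoader handles the same basic document identically.
full_data = yaml.load(DOC, Loader=yaml.FullLoader)

# A document carrying a Python-specific tag. SafeLoader has no
# constructor for it, so it raises an error instead of acting on it.
UNTRUSTED = "!!python/object/apply:os.getcwd []\n"
try:
    yaml.load(UNTRUSTED, Loader=yaml.SafeLoader)
    rejected = False
except yaml.YAMLError:
    rejected = True
```

For input that comes from outside your control, the SafeLoader form (or yaml.safe_load) is the one to reach for first.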