I'm working with one script that dumps a pandas series to a yaml file:
with open('ex.py','w') as f:
    yaml.dump(a_series,f)
And then another script that opens the yaml file for the pandas series:
with open('ex.py','r') as f:
    yaml.safe_load(a_series,f)
I'm trying to safe_load the series but I get a constructor error. How can I specify that the pandas series is safe to load?
When you use PyYAML's load, you are asserting that everything in the YAML document you are loading is safe, because it will construct arbitrary Python objects. That is why you need to use yaml.safe_load for data you do not fully trust.
In your case this leads to an error, because safe_load doesn't know how to construct pandas internals that have tags in the YAML document like:
!!python/name:pandas.core.indexes.base.Index
and
!!python/tuple
etc.
You would need to provide constructors for all of these objects, add them to the SafeLoader, and then do a_series = yaml.safe_load(f).
Doing that can be a lot of work, especially since what looks like a small change to the data used in your series might require you to add constructors.
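To give an idea of what adding one such constructor involves, here is a minimal sketch that handles only the !!python/tuple tag. The class name TupleSafeLoader and the function construct_python_tuple are made up for illustration; a real pandas Series dump would need many more constructors than this one.

```python
import yaml


class TupleSafeLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass that additionally accepts !!python/tuple."""


def construct_python_tuple(loader, node):
    # build the tuple from the sequence node, using only the safe constructors
    return tuple(loader.construct_sequence(node))


# register on the subclass so the global SafeLoader stays untouched
TupleSafeLoader.add_constructor(
    'tag:yaml.org,2002:python/tuple', construct_python_tuple)

document = yaml.dump((1, 2, 3))  # yaml.dump tags a tuple with !!python/tuple

try:
    yaml.safe_load(document)
    plain_safe_load_failed = False
except yaml.YAMLError:
    # the unmodified SafeLoader has no constructor for this tag
    plain_safe_load_failed = True

loaded = yaml.load(document, Loader=TupleSafeLoader)
print(plain_safe_load_failed, loaded)
```

Subclassing SafeLoader keeps the extra constructor local to the code that asks for it, instead of changing what yaml.safe_load accepts everywhere in the program.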
You could dump the dict representation of your Series and load that back. Of course some information is lost in this process, and I am not sure if that is acceptable:
import yaml
from pandas import Series

def series_representer(dumper, data):
    # dump the Series as a tagged mapping of index label -> value
    return dumper.represent_mapping(u'!pandas.series', data.to_dict())

yaml.add_representer(Series, series_representer, Dumper=yaml.SafeDumper)

def series_constructor(loader, node):
    # rebuild the Series from the loaded mapping
    d = loader.construct_mapping(node)
    return Series(d)

yaml.add_constructor(u'!pandas.series', series_constructor, Loader=yaml.SafeLoader)

data = Series([1,2,3,4,5], index=['a', 'b', 'c', 'd', 'e'])
with open('ex.yaml', 'w') as f:
    yaml.safe_dump(data, f)

with open('ex.yaml') as f:
    s = yaml.safe_load(f)
print(s)
print(type(s))
which gives:
a    1
b    2
c    3
d    4
e    5
dtype: int64
<class 'pandas.core.series.Series'>
And the ex.yaml file contains:
!pandas.series {a: 1, b: 2, c: 3, d: 4, e: 5}
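The information loss from going through a dict can be made concrete with a small sketch. The series values, the name 'weights', and the variable names are made up for illustration; the point is only that the dtype, the name, and duplicate index labels do not survive to_dict().

```python
from pandas import Series

# a series with a non-default dtype and a name
original = Series([1.5, 2.5], index=['a', 'b'], dtype='float32', name='weights')
rebuilt = Series(original.to_dict())

# dtype and name are not part of the dict, so they fall back to defaults
print(original.dtype, rebuilt.dtype)
print(original.name, rebuilt.name)

# duplicate index labels collapse, because dict keys are unique
duplicated = Series([1, 2, 3], index=['a', 'a', 'b'])
print(len(duplicated), len(duplicated.to_dict()))
```

If any of these properties matter for your data, the representer would have to store them explicitly alongside the mapping.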
There are a few things to note:
YAML documents are normally written to files with a .yaml extension. Using .py is bound to get you confused, or have you overwrite some program source files at some point.
yaml.load() and yaml.safe_load() take a stream as their first parameter, so you use them like:
data = yaml.safe_load(stream)
and not like:
yaml.safe_load(data, stream)
It would be better to have a two-step constructor, which allows you to construct self-referential data structures. However, Series.append() doesn't seem to work for that, since it returns a new Series instead of extending d in place:
def series_constructor(loader, node):
    d = Series()
    yield d
    d.append(Series(loader.construct_mapping(node)))
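For comparison, this is roughly what a working two-step constructor looks like for an object that can be filled in after it has been created. The Box class, the BoxLoader subclass, and the !box tag are hypothetical and only serve to illustrate the pattern; they are not a solution for Series.

```python
import yaml


class Box:
    """Hypothetical container that can be filled in after creation."""

    def __init__(self):
        self.attrs = {}


class BoxLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass that knows the !box tag."""


def box_constructor(loader, node):
    box = Box()
    # step 1: hand the still-empty object to the loader, so aliases can refer to it
    yield box
    # step 2: fill the object in after the loader has registered it
    box.attrs.update(loader.construct_mapping(node))


BoxLoader.add_constructor('!box', box_constructor)

# the anchor &b and the alias *b make the document refer to itself
box = yaml.load('&b !box {name: root, me: *b}', Loader=BoxLoader)
print(box.attrs['name'], box.attrs['me'] is box)
```

The pattern depends on the object being mutable in place, which is exactly what the Series attempt above is missing.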
If dumping the Series via a dictionary is not good enough (because it simplifies the series' data), and if you don't care about the readability of the generated YAML, you can use to_pickle() instead of .to_dict(), but you would have to work with temporary files, as that method is not flexible enough to handle file-like objects and expects a file name string as its argument.
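One way to sidestep the temporary files is sketched below. It assumes that the standard library's pickle.dumps() is an acceptable substitute for to_pickle(), and it base64-encodes the result so that it fits in a YAML scalar. The tag name and the PickleDumper and PickleLoader classes are made up for illustration. Keep in mind that unpickling can execute arbitrary code, so a loader with this constructor is no longer safe for untrusted input.

```python
import base64
import pickle

import yaml
from pandas import Series

PICKLE_TAG = '!pandas.series.pickle'  # hypothetical tag name


class PickleDumper(yaml.SafeDumper):
    """Hypothetical dumper that stores a Series as a pickled blob."""


class PickleLoader(yaml.SafeLoader):
    """Hypothetical loader for PICKLE_TAG; only use it on trusted files."""


def series_pickle_representer(dumper, data):
    # pickle in memory, then base64-encode so the bytes fit in a YAML scalar
    blob = base64.b64encode(pickle.dumps(data)).decode('ascii')
    return dumper.represent_scalar(PICKLE_TAG, blob)


def series_pickle_constructor(loader, node):
    blob = loader.construct_scalar(node)
    # unpickling can run arbitrary code, so the input must be trusted
    return pickle.loads(base64.b64decode(blob))


PickleDumper.add_representer(Series, series_pickle_representer)
PickleLoader.add_constructor(PICKLE_TAG, series_pickle_constructor)

original = Series([1.5, 2.5], index=['a', 'b'], dtype='float32', name='weights')
text = yaml.dump(original, Dumper=PickleDumper)
restored = yaml.load(text, Loader=PickleLoader)
print(restored.equals(original), restored.dtype, restored.name)
```

Unlike the dict-based version, this keeps the dtype and the name of the series, at the cost of a YAML file that is not human-readable.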