TL;DR: How can I use type(self) in the decorator of a member function?
I would like to serialize derived classes in Python and share some of the serialization logic in the base class. Since pickle and simple yaml did not seem to be able to deal with this reliably, I stumbled over camel, which I consider quite a neat solution to the problem (see this link).
Consider two extremely simplified classes, A and B, where B inherits from A. I want to be able to serialize B in my main function like this:
from camel import Camel, CamelRegistry

serializable_types = CamelRegistry()

# ... define A and B with dump and load functions ...

if __name__ == "__main__":
    serialization_interface = Camel([serializable_types])
    b = B(x=3, y=4)
    s = serialization_interface.dump(b)
    print(s)
I came up with two solutions that work:
Version 1: dumping and loading are done in stand-alone functions outside of the class. Problems: it is not very elegant, the function dumpA is not automatically available to the inheriting class in dumpB, the function naming is more cumbersome, and the function scope is bigger than necessary.
# VERSION 1 - dump and load in external functions
class A:
    def __init__(self, x):
        self._x = x

@serializable_types.dumper(A, 'object_A', version=None)
def dumpA(a):
    return {'x': a._x}

@serializable_types.loader('object_A', version=None)
def loadA(data, version):
    return A(data.x)

class B(A):
    def __init__(self, x, y):
        super().__init__(x)
        self._y = y

@serializable_types.dumper(B, 'object_B', version=None)
def dumpB(b):
    b_data = dumpA(b)
    b_data.update({'y': b._y})
    return b_data

@serializable_types.loader('object_B', version=None)
def loadB(data, version):
    return B(data.x)
Version 2: the functions for loading and dumping are defined directly in the constructor. The functions are still not available in the subclass :/
# VERSION 2 - dump and load functions defined in constructor
class A:
    def __init__(self, x):
        self._x = x

        @serializable_types.dumper(A, 'object_A', version=None)
        def dump(a):
            return a.to_dict()

        @serializable_types.loader('object_A', version=None)
        def load(data, version):
            return A(data.x)

    def to_dict(self):
        return {'x': self._x}

class B(A):
    def __init__(self, x, y):
        super().__init__(x)
        self._y = y

        @serializable_types.dumper(B, 'object_B', version=None)
        def dump(b):
            b_data = b.to_dict()
            return b_data

        @serializable_types.loader('object_B', version=None)
        def load(data, version):
            return B(data.x)

    def to_dict(self):
        b_data = super().to_dict()
        b_data.update({'y': self._y})
        return b_data
I would like to achieve an implementation that looks like this:
# VERSION 3 - dump and load functions are member functions
# ERROR: name 'A' is not defined
class A:
    def __init__(self, x):
        self._x = x

    @serializable_types.dumper(A, 'object_A', version=None)
    def dump(a):
        return {'x': a._x}

    @serializable_types.loader('object_A', version=None)
    def load(data, version):
        return A(data.x)

class B(A):
    def __init__(self, x, y):
        super().__init__(x)
        self._y = y

    @serializable_types.dumper(B, 'object_B', version=None)
    def dump(b):
        b_data = super().dump(b)
        b_data.update({'y': b._y})
        return b_data

    @serializable_types.loader('object_B', version=None)
    def load(data, version):
        return B(data.x)
This will not work because A and B are not yet defined at the point where the dump functions are defined. From a software design perspective, however, I consider this to be the cleanest solution, with the fewest lines of code.
Is there a way to get the type definitions of A and B to work in the decorator? Or has anyone solved the problem in a different way?
I came across this but couldn't see a straightforward way of applying it to my use case.
Your version 3 is not going to work because, as you probably noticed, at the time the decorator is called, A is not defined yet.

This becomes clear if you write your decorator the way it had to be written before the @ syntactic sugar was added to Python. This:

def some_decorator(fun):
    return fun

@some_decorator
def xyz():
    pass

is equivalent to:

def some_decorator(fun):
    return fun

def xyz():
    pass

xyz = some_decorator(xyz)

Written that way, it should be immediately clear: the decorator expression is evaluated right where it appears, and inside the class body the class name is not bound yet.
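As a minimal sketch of that timing issue (register is a hypothetical stand-in for a registry's decorator factory, not part of camel), the following shows the class name being unavailable while the class body runs, and available once the class statement has finished:

```python
def register(cls):
    """Hypothetical decorator factory that takes the class as an argument."""
    def decorator(fun):
        return fun
    return decorator


error = None
try:
    class Demo:
        # register(Demo) is evaluated while the class body is still being
        # executed, so the name Demo has not been bound yet.
        @register(Demo)
        def dump(self):
            return {}
except NameError as exc:
    error = exc

print(type(error).__name__)


class Fixed:
    def dump(self):
        return {"kind": "fixed"}


# After the class statement the name exists, so the same call works.
registered = register(Fixed)(Fixed.dump)
```

This is why registering from outside the class body, after the class has been created, avoids the error.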
Your version 2 defers the registration of your loader and dumper routines until an instance of both A and B has been created in some way other than loading, and that has to happen before you can do any loading. That could work if you created instances of both classes and then did a dump, followed by a load, from within one program. But if you only create B and want to dump it, then the functions for A have not been registered and A.dump() is not available. And anyway, if a program both dumps and loads data, it is much more common to do the loading from some persistent storage first and the dumping afterwards, and during loading the registration would not yet have taken place. So you would need some extra registration mechanism for all your classes, plus the creation of at least one instance of each of these classes. Probably not what you want.
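The load-before-dump problem can be reproduced without camel. The sketch below uses a plain dict as a hypothetical registry (loaders, load_tagged and the class Deferred are illustrative names only) and registers the loader inside the constructor, as version 2 does:

```python
loaders = {}  # hypothetical registry: tag -> loader function


class Deferred:
    def __init__(self, x):
        self._x = x

        # Registration is only a side effect of constructing an instance.
        def load(data):
            return Deferred(data["x"])

        loaders["object_Deferred"] = load


def load_tagged(tag, data):
    """Look up the loader for a tag and call it."""
    return loaders[tag](data)


# A program that starts by loading has not constructed anything yet.
try:
    load_tagged("object_Deferred", {"x": 1})
    failed = False
except KeyError:
    failed = True

Deferred(0)  # constructing any instance registers the loader
obj = load_tagged("object_Deferred", {"x": 1})
```

Only after the throwaway Deferred(0) has been created does the lookup succeed, which is the extra bookkeeping described above.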
In version 1, you cannot easily find dumpA while in dumpB. It should be possible to look into the internals of serializable_types and find the parent class of B, but that is non-trivial and ugly. There is a better way: reduce dumpB (and dumpA) to functions that return the value returned by some method of B (resp. A), appropriately named dump:
from camel import CamelRegistry, Camel

serializable_types = CamelRegistry()

# VERSION 1 - dump and load in external functions
class A:
    def __init__(self, x):
        self._x = x

    def dump(self):
        return {'x': self._x}

@serializable_types.dumper(A, 'object_A', version=None)
def dumpA(a):
    return a.dump()

@serializable_types.loader('object_A', version=None)
def loadA(data, version):
    # the loader receives the mapping that the dumper returned
    return A(data['x'])

class B(A):
    def __init__(self, x, y):
        super().__init__(x)
        self._y = y

    def dump(self):
        b_data = A.dump(self)
        b_data.update({'y': self._y})
        return b_data

@serializable_types.dumper(B, 'object_B', version=None)
def dumpB(b):
    return b.dump()

@serializable_types.loader('object_B', version=None)
def loadB(data, version):
    # B's __init__ needs both values
    return B(data['x'], data['y'])

if __name__ == "__main__":
    serialization_interface = Camel([serializable_types])
    b = B(x=3, y=4)
    s = serialization_interface.dump(b)
    print(s)
which gives:
!object_B
x: 3
y: 4
That works because by the time dumpB is called, you have an instance of type B (otherwise you could not get at its attributes), and the methods of class B know about class A.
Please note that doing return B(data.x) is not going to work in any of your versions, as B's __init__ expects two parameters; the loader has to pass both x and y.
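A minimal round-trip sketch of both points, independent of camel: dump is an ordinary method that extends the parent's result, and the loader passes both constructor arguments. The function load_b is a hypothetical loader, and it assumes the data it receives is the plain mapping that dump produced:

```python
class A:
    def __init__(self, x):
        self._x = x

    def dump(self):
        return {"x": self._x}


class B(A):
    def __init__(self, x, y):
        super().__init__(x)
        self._y = y

    def dump(self):
        # Reuse the parent's logic, then add what B contributes.
        data = super().dump()
        data["y"] = self._y
        return data


def load_b(data):
    """Hypothetical loader: B needs both x and y."""
    return B(data["x"], data["y"])


try:
    B(3)  # what a loader passing only x effectively does
    missing_argument = False
except TypeError:
    missing_argument = True

restored = load_b(B(3, 4).dump())
print(restored.dump())
```

Using super().dump() instead of A.dump(self) is a design choice here: it keeps working if another class is later inserted between A and B.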
I find the above rather unreadable.
You indicate that "simple yaml did not seem to be able to deal with this reliably". I am not aware of why this would be true, but there is a lot of misunderstanding about YAML¹.
I recommend you take a look at ruamel.yaml (disclaimer: I am the author of that package).
It requires registration of classes for dumping and loading, uses pre-defined method names for loading and dumping (from_yaml resp. to_yaml), and the "registration office" calls these methods including class information. So there is no need to defer the definition of these methods until you construct an object as in your version 2.
You can either explicitly register a class or decorate the class as soon as the decorator is available (i.e. once you have your YAML instance). Since B inherits from A, you only have to provide to_yaml and from_yaml in A, and you can re-use the dump methods from the previous example:
import sys

class A:
    yaml_tag = u'!object_A'

    def __init__(self, x):
        self._x = x

    @classmethod
    def to_yaml(cls, representer, node):
        return representer.represent_mapping(cls.yaml_tag, cls.dump(node))

    @classmethod
    def from_yaml(cls, constructor, node):
        instance = cls.__new__(cls)
        yield instance
        state = ruamel.yaml.constructor.SafeConstructor.construct_mapping(
            constructor, node, deep=True)
        instance.__dict__.update(state)

    def dump(self):
        return {'x': self._x}

import ruamel.yaml  # delayed import so A cannot be decorated

yaml = ruamel.yaml.YAML()

@yaml.register_class
class B(A):
    yaml_tag = u'!object_B'

    def __init__(self, x, y):
        super().__init__(x)
        self._y = y

    def dump(self):
        b_data = A.dump(self)
        b_data.update({'y': self._y})
        return b_data

yaml.register_class(A)
# B is not registered here, because it is already decorated

b = B(x=3, y=4)
yaml.dump(b, sys.stdout)
print('=' * 20)
b = yaml.load("""\
!object_B
x: 42
y: 196
""")
print('b.x: {.x}, b.y: {.y}'.format(b, b))
which gives:
!object_B
x: 3
y: 4
====================
b.x: 42, b.y: 196
The yield in the above code is necessary to deal with instances that have (indirect) circular references to themselves and for which, obviously, not all arguments can be available at the time of object creation.
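The idea behind that two-step construction can be sketched with a toy loader. This is not ruamel.yaml's actual implementation; Node, construct_node and toy_load are hypothetical names that only illustrate why the empty instance is handed out before its state is filled in:

```python
class Node:
    """Toy object whose state may refer back to the object itself."""


def construct_node(state_factory):
    # Step 1: create an empty instance and hand it out immediately,
    # so that other objects (or the object itself) can refer to it.
    instance = Node.__new__(Node)
    yield instance
    # Step 2: only now build the state, which may contain the instance.
    instance.__dict__.update(state_factory(instance))


def toy_load(generator_function, state_factory):
    """Toy loader: take the yielded object, then finish the generator."""
    gen = generator_function(state_factory)
    instance = next(gen)   # the object exists but is still empty
    for _ in gen:          # run the remainder to fill in the state
        pass
    return instance


# A node that is its own parent: impossible to pass to __init__ up front.
node = toy_load(construct_node, lambda me: {"name": "root", "parent": me})
print(node.parent is node)
```

Without the intermediate yield, the state would have to be complete before the instance exists, which a self-reference makes impossible.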
¹ E.g. one YAML 1.2 reference states that a YAML document begins with ---, where that is actually called a directives-end-marker and not a document-start-marker, for good reasons. It also states that ..., the document-end-marker, can only be followed by directives or ---, whereas the spec clearly indicates that it can be followed by comments and also by bare documents.