In the process of using joblib to parallelize some model-fitting code involving Theano functions, I've stumbled across some behavior that seems odd to me.
Consider this very simplified example:

    from joblib import Parallel, delayed
    import theano
    from theano import tensor as te
    import numpy as np

    class TheanoModel(object):
        def __init__(self):
            X = te.dvector('X')
            Y = (X ** te.log(X ** 2)).sum()
            self.theano_get_Y = theano.function([X], Y)

        def get_Y(self, x):
            return self.theano_get_Y(x)

    def run(niter=100):
        x = np.random.randn(1000)
        model = TheanoModel()
        pool = Parallel(n_jobs=-1, verbose=1, pre_dispatch='all')

        # this fails with `TypeError: can't pickle instancemethod objects`...
        results = pool(delayed(model.get_Y)(x) for _ in xrange(niter))

        # # ... but this works! Why?
        # results = pool(delayed(model.theano_get_Y)(x) for _ in xrange(niter))

    if __name__ == '__main__':
        run()

I understand why the first case fails, since .get_Y() is clearly an instancemethod of TheanoModel. What I don't understand is why the second case works, since X, Y and theano_get_Y() are only declared within the __init__() method of TheanoModel. theano_get_Y() can't be evaluated until the TheanoModel instance has been created. Surely, then, it should also be considered an instancemethod, and should therefore be unpickleable? In fact, it even still works if I explicitly declare X and Y to be attributes of the TheanoModel instance.
Can anyone explain what's going on here?
Just to illustrate why I think this behaviour is particularly weird, here are a few examples of some other callable member objects that don't take self as the first argument:

    from joblib import Parallel, delayed
    import theano
    from theano import tensor as te
    import numpy as np

    class TheanoModel(object):
        def __init__(self):
            X = te.dvector('X')
            Y = (X ** te.log(X ** 2)).sum()
            self.theano_get_Y = theano.function([X], Y)

            def square(x):
                return x ** 2

            self.member_function = square
            self.static_method = staticmethod(square)
            self.lambda_function = lambda x: x ** 2

    def run(niter=100):
        x = np.random.randn(1000)
        model = TheanoModel()
        pool = Parallel(n_jobs=-1, verbose=1, pre_dispatch='all')

        # # not allowed: `TypeError: can't pickle function objects`
        # results = pool(delayed(model.member_function)(x) for _ in xrange(niter))

        # # not allowed: `TypeError: can't pickle function objects`
        # results = pool(delayed(model.lambda_function)(x) for _ in xrange(niter))

        # # also not allowed: `TypeError: can't pickle staticmethod objects`
        # results = pool(delayed(model.static_method)(x) for _ in xrange(niter))

        # but this is totally fine!?
        results = pool(delayed(model.theano_get_Y)(x) for _ in xrange(niter))

    if __name__ == '__main__':
        run()

None of them are pickleable, with the exception of the theano.function!
Theano functions aren't Python functions. Instead, they are Python objects that override __call__. This means that you can call them just like a function, but internally they are really instances of some custom class. As a consequence, you can pickle them.
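The distinction can be seen without Theano at all. The sketch below is only an illustration, not Theano's actual implementation: Squarer is a hypothetical stand-in class that defines __call__, and can_pickle is a hypothetical helper built on the standard pickle module (joblib's own serialization may differ in detail). It shows that an instance of a callable class is not a function object and survives a pickle round trip, while a lambda does not pickle.

```python
import pickle
import types


class Squarer(object):
    """Hypothetical stand-in for a compiled function: an object with __call__."""

    def __init__(self, power):
        self.power = power

    def __call__(self, x):
        return x ** self.power


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


def round_trip(obj):
    """Serialize obj with pickle and load it back."""
    return pickle.loads(pickle.dumps(obj))


callable_object = Squarer(2)
lambda_function = lambda x: x ** 2

# Both are called with the same syntax...
same_result = callable_object(3) == lambda_function(3)

# ...but only the lambda is a real function object.
object_is_function = isinstance(callable_object, types.FunctionType)
lambda_is_function = isinstance(lambda_function, types.FunctionType)

# The instance is pickled as a reference to its class plus its attributes;
# the lambda has no importable name, so pickle refuses it.
object_pickles = can_pickle(callable_object)
lambda_pickles = can_pickle(lambda_function)
```

Here round_trip(callable_object) gives back a new Squarer that can still be called, which is essentially what each worker process needs in order to run the job it receives.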