python oop numpy pickle theano

Why am I allowed to pickle instancemethods that are Theano functions, but not normal instancemethods?


In the process of using joblib to parallelize some model-fitting code involving Theano functions, I've stumbled across some behavior that seems odd to me.

Consider this very simplified example:

from joblib import Parallel, delayed
import theano
from theano import tensor as te
import numpy as np

class TheanoModel(object):
    def __init__(self):
        X = te.dvector('X')
        Y = (X ** te.log(X ** 2)).sum()
        self.theano_get_Y = theano.function([X], Y)

    def get_Y(self, x):
        return self.theano_get_Y(x)

def run(niter=100):
    x = np.random.randn(1000)
    model = TheanoModel()
    pool = Parallel(n_jobs=-1, verbose=1, pre_dispatch='all')

    # this fails with `TypeError: can't pickle instancemethod objects`...
    results = pool(delayed(model.get_Y)(x) for _ in xrange(niter))

    # # ... but this works! Why?
    # results = pool(delayed(model.theano_get_Y)(x) for _ in xrange(niter))

if __name__ == '__main__':
    run()

I understand why the first case fails, since .get_Y() is clearly an instancemethod of TheanoModel. What I don't understand is why the second case works, since X, Y and theano_get_Y() are only declared within the __init__() method of TheanoModel. theano_get_Y() can't be evaluated until the TheanoModel instance has been created. Surely, then, it should also be considered an instancemethod, and should therefore not be picklable? In fact, it even still works if I explicitly declare X and Y to be attributes of the TheanoModel instance.

Can anyone explain what's going on here?


Update

Just to illustrate why I think this behaviour is particularly weird, here are a few examples of some other callable member objects that don't take self as the first argument:

from joblib import Parallel, delayed
import theano
from theano import tensor as te
import numpy as np

class TheanoModel(object):
    def __init__(self):
        X = te.dvector('X')
        Y = (X ** te.log(X ** 2)).sum()
        self.theano_get_Y = theano.function([X], Y)
        def square(x):
            return x ** 2
        self.member_function = square
        self.static_method = staticmethod(square)
        self.lambda_function = lambda x: x ** 2

def run(niter=100):
    x = np.random.randn(1000)
    model = TheanoModel()
    pool = Parallel(n_jobs=-1, verbose=1, pre_dispatch='all')

    # # not allowed: `TypeError: can't pickle function objects`
    # results = pool(delayed(model.member_function)(x) for _ in xrange(niter))

    # # not allowed: `TypeError: can't pickle function objects`
    # results = pool(delayed(model.lambda_function)(x) for _ in xrange(niter))

    # # also not allowed: `TypeError: can't pickle staticmethod objects`
    # results = pool(delayed(model.static_method)(x) for _ in xrange(niter))

    # but this is totally fine!?
    results = pool(delayed(model.theano_get_Y)(x) for _ in xrange(niter))

if __name__ == '__main__':
    run()

None of them are pickleable with the exception of the theano.function!


Solution

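  • Theano functions aren't Python functions. They are instances of a custom class that overrides __call__, so you can call them just like a function, but pickle handles them as class instances rather than as functions. An instance can be pickled as long as its class can be found at module level on unpickling, whereas a plain function is pickled only by reference to a module-level name, which lambdas and functions defined inside __init__() don't have. That is why the Theano function can be pickled while the other callables cannot.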
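
The point above can be sketched without Theano at all. This is only an illustration under stated assumptions: CallableSquare, make_local_function and can_pickle are hypothetical names invented for the example, and CallableSquare is a stand-in for a compiled Theano function, not how Theano actually implements or pickles one.

```python
import pickle


class CallableSquare(object):
    """Hypothetical stand-in for a Theano function: a plain object
    whose module-level class defines __call__."""

    def __init__(self, power):
        self.power = power

    def __call__(self, x):
        return x ** self.power


def make_local_function():
    # A function defined inside another function has no module-level
    # name, so pickle has nothing it can store a reference to.
    def square(x):
        return x ** 2
    return square


def can_pickle(obj):
    """Return True if obj survives a pickle round trip, else False."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


callable_obj = CallableSquare(2)
restored = pickle.loads(pickle.dumps(callable_obj))

print(restored(3))                         # the restored copy is still callable
print(can_pickle(callable_obj))            # instance of a module-level class
print(can_pickle(make_local_function()))   # nested function
print(can_pickle(lambda x: x ** 2))        # lambda
```

Run as a script, this should report that the callable object round-trips through pickle while the nested function and the lambda do not. The same idea suggests one possible workaround for the original problem: give the model class a __call__ method and pass the instance itself to delayed(), rather than a bound method.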