
Serialising a function object with attributes, one attribute missing when loading


I am using dill in Python 3.7, but one of the attributes of a function is lost when I reload it later.

I have a class named Session that I save on program exit and load at start. This object contains, indirectly, Transform instances that have a function attribute referencing a specific function. That function has several attributes set on it.

When I inspect the session in a debugger just before saving it, I can see that the specific attribute is present and set to None. But when I load a saved session, everything is fine except that this one attribute has disappeared.

Here is the saving code:

def save(self):
    print ('\n SAVING SESSION STATE, DO NOT EXIT')
    breakpoint()

    sessionDirectory='__PETL__'
    if not os.path.exists(sessionDirectory):
        os.makedirs(sessionDirectory)
    with open(sessionDirectory+'/'+self.name, 'wb') as f: 
        dill.dump(self,f)
    print ('\nSession Saved, exiting')

Here is the loading code:

def loadSession(self, sessionName):
    if (Session.dontLoad):
        print ('Creating New Session')
        return None
    try:
        with open('__PETL__/'+ sessionName, 'rb') as f:
            session=dill.load(f)
    except FileNotFoundError:
        print ('No session found, creating new one')
        return None

    return session

And here are the debugger outputs:

Saving:

> /home/osboxes/stage/inspireDataBase2/migrations/src/session/session.py(160)save()
-> sessionDirectory='__PETL__'
(Pdb) print( self.transforms[0].transform.function.queryRes)
None
(Pdb) print (dir(self.transforms[0].transform.function)[-9:])
['after', 'args', 'columns', 'fetch', 'indexs', 'query', 'queryRes', 'sameorderasafter', 'transformvar']
(Pdb) dill.dumps(self.transforms[0].transform.function)
b'\x80\x03cuserTransformModulePreparsed\ntransform__constru__buildinggeometry2d\nq\x00.'
(Pdb) c
Session Saved, exiting

Loading:

> /home/osboxes/stage/inspireDataBase2/migrations/src/session/session.py(39)__init__()
-> session.printJobDone()
(Pdb) print( self.transforms[0].transform.function.queryRes)
*** AttributeError: 'function' object has no attribute 'queryRes'
(Pdb) print( session.transforms[0].transform.function.queryRes)
*** AttributeError: 'function' object has no attribute 'queryRes'
(Pdb) print (dir(session.transforms[0].transform.function)[-9:])
['__subclasshook__', 'after', 'args', 'columns', 'fetch', 'indexs', 'query', 'sameorderasafter', 'transformvar']

As you can see, the other attributes work as expected.

As the saving part is the last thing I do in my project, I guess I just don't understand how dill works. This attribute differs from the others because it is set in another class (not in the same module as the function). The other attributes are set directly in the module of the function. That said, the module is obtained by compiling an AST, but I don't see why that would be a problem.

I also see that in the first output, the dill output contains only a reference to the module and name of the function (but I don't know how dill works, so maybe this is normal).


Solution

  • dill does not capture function attributes for functions that can be imported directly. Any attributes you see after loading were added to that function object by other code, perhaps at import time.

    All that dill.dumps() stored was enough information to re-import the same function object; in your debugging session that's userTransformModulePreparsed.transform__constru__buildinggeometry2d. Loading that serialisation only requires importing userTransformModulePreparsed and then looking up the transform__constru__buildinggeometry2d attribute on that module. Functions are treated as singletons in such cases: only one copy needs to exist per Python process. It is assumed that everything else about that object, including any attributes added to the function object, is set up by the normal import process.

    dill can also handle generated function objects, that is, any function object that can't be imported directly; for those it captures all aspects of the function, including attributes. For example, using def inside a function (a nested function) creates a new, separate function object each time you call the parent function. Serialising such objects is handled differently:

    >>> import dill
    >>> def foo():
    ...     def bar(): pass  # nested function
    ...     bar.spam = 'ham'
    ...     return bar
    ...
    >>> foo()
    <function foo.<locals>.bar at 0x110621e50>
    >>> foo() is not foo()  # new calls produce new function objects
    True
    >>> bar = foo()
    >>> vars(bar)   # the resulting function object has attributes
    {'spam': 'ham'}
    >>> bar_restored = dill.loads(dill.dumps(bar))
    >>> vars(bar_restored)  # the attributes are preserved by dill
    {'spam': 'ham'}
    >>> bar.extra = 'additional'
    >>> vars(dill.loads(dill.dumps(bar)))  # this extends to new attributes added later.
    {'spam': 'ham', 'extra': 'additional'}
    

    So you have two options here: either set the function attribute at import time, or generate the function inside another function, so that dill serialises the whole function object, attributes included.
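
To make the by-reference behaviour concrete, here is a minimal sketch that reproduces the symptom from the question. It makes two assumptions. First, it uses the standard-library pickle module rather than dill, because pickle stores importable functions by reference in the same way (module name plus function name). Second, the module name demo_transforms, the function transform and the attribute values are hypothetical stand-ins for the generated module in the question. Re-executing the module source simulates what a fresh Python process does on import.

```python
import pickle
import sys
import types

# Hypothetical source of a generated module; 'query' is set at import time.
SOURCE = '''
def transform():
    return "result"

transform.query = "SELECT 1"
'''


def load_demo_module():
    """Build the hypothetical module and register it so it can be imported by name."""
    module = types.ModuleType("demo_transforms")
    sys.modules["demo_transforms"] = module
    exec(compile(SOURCE, "<demo_transforms>", "exec"), module.__dict__)
    return module


# "Saving" process: other code adds an attribute after import.
original = load_demo_module()
original.transform.queryRes = None
payload = pickle.dumps(original.transform)

# The payload names the module and the function, but none of the attributes.
print(b"demo_transforms" in payload, b"queryRes" in payload)

# "Loading" process: the module is executed again, which recreates the
# function with only its import-time attributes.
fresh = load_demo_module()
restored = pickle.loads(payload)

print(restored is fresh.transform)    # unpickling just looks the function up
print(hasattr(restored, "query"))     # import-time attribute is there
print(hasattr(restored, "queryRes"))  # attribute added later is gone
```

The attribute set inside the module source survives only because the import sets it again, which is why the first option above works.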