Search code examples
pythoninstancepickleshelve

Python instances stored in shelves change after closing it


I think the best way to explain the situation is with an example:

>>> class Person:
...     def __init__(self, brother=None):
...         self.brother = brother
... 
>>> bob = Person()
>>> alice = Person(brother=bob)
>>> import shelve
>>> db = shelve.open('main.db', writeback=True)
>>> db['bob'] = bob
>>> db['alice'] = alice
>>> db['bob'] is db['alice'].brother
True
>>> db['bob'] == db['alice'].brother
True
>>> db.close()
>>> db = shelve.open('main.db',writeback=True)
>>> db['bob'] is db['alice'].brother
False
>>> db['bob'] == db['alice'].brother
False

The expected output for both comparisons is True again. However, pickle (which is used by shelve) seems to be re-instantiating bob and alice.brother separately. How can I "fix" this using shelve/pickle? Is it possible for db['alice'].brother to point to db['bob'] or something similar? Notice I do not want only to compare both, I need both to actually be the same.

As suggested by Blckknght I tried pickling the entire dictionary at once, but the problem persists since it seems to pickle each key separately.


Solution

  • I believe that the issue you're seeing comes from the way the shelve module stores its values. Each value is pickled independently of the other values in the shelf, which means that if the same object is inserted as a value under multiple keys, the identity will not be preserved between the keys. However, if a single value has multiple references to the same object, the identity will be maintained within that single value.

    Here's an example:

    a = object() # an arbitrary object
    db = shelve.open("text.db")
    db['a'] = a
    db['another_a'] = a
    db['two_a_references'] = [a, a]
    db.close()
    
    db = shelve.open("text.db") # reopen the db
    print(db['a'] is db['another_a']) # prints False
    print(db['two_a_references'][0] is db['two_a_references'][1]) # prints True
    

    The first print tries to confirm the identity of two versions of the object a that were inserted in the database, one under the key 'a' directly, and another under 'another_a'. It doesn't work because the separate values are pickled separately, and so the identity between them was lost.

    The second print tests whether the two references to a that were stored under the key 'two_a_references' were maintained. Because the list was pickled in one go, the identity is kept.

    So to address your issue you have a few options. One approach is to avoid testing for identity and rely on an __eq__ method in your various object types to determine if two objects are semantically equal, even if they are not the same object. Another would be to bundle all your data into a single object (e.g. a dictionary) which you'd then save with pickle.dump and restore with pickle.load rather than using shelve (or you could adapt this recipe for a persistent dictionary, which is linked from the shelve docs, and does pretty much that).