I think the best way to explain the situation is with an example:
>>> class Person:
... def __init__(self, brother=None):
... self.brother = brother
...
>>> bob = Person()
>>> alice = Person(brother=bob)
>>> import shelve
>>> db = shelve.open('main.db', writeback=True)
>>> db['bob'] = bob
>>> db['alice'] = alice
>>> db['bob'] is db['alice'].brother
True
>>> db['bob'] == db['alice'].brother
True
>>> db.close()
>>> db = shelve.open('main.db',writeback=True)
>>> db['bob'] is db['alice'].brother
False
>>> db['bob'] == db['alice'].brother
False
The expected output for both comparisons is True
again. However, pickle
(which is used by shelve
) seems to be re-instantiating bob
and alice.brother
separately. How can I "fix" this using shelve
/pickle
? Is it possible for db['alice'].brother
to point to db['bob']
or something similar? Notice I do not want only to compare both, I need both to actually be the same.
As suggested by Blckknght I tried pickling the entire dictionary at once, but the problem persists since it seems to pickle each key separately.
I believe that the issue you're seeing comes from the way the shelve
module stores its values. Each value is pickled independently of the other values in the shelf, which means that if the same object is inserted as a value under multiple keys, the identity will not be preserved between the keys. However, if a single value has multiple references to the same object, the identity will be maintained within that single value.
Here's an example:
a = object() # an arbitrary object
db = shelve.open("text.db")
db['a'] = a
db['another_a'] = a
db['two_a_references'] = [a, a]
db.close()
db = shelve.open("text.db") # reopen the db
print(db['a'] is db['another_a']) # prints False
print(db['two_a_references'][0] is db['two_a_references'][1]) # prints True
The first print tries to confirm the identity of two versions of the object a
that were inserted in the database, one under the key 'a'
directly, and another under 'another_a'
. It doesn't work because the separate values are pickled separately, and so the identity between them was lost.
The second print tests whether the two references to a
that were stored under the key 'two_a_references'
were maintained. Because the list was pickled in one go, the identity is kept.
So to address your issue you have a few options. One approach is to avoid testing for identity and rely on an __eq__
method in your various object types to determine if two objects are semantically equal, even if they are not the same object. Another would be to bundle all your data into a single object (e.g. a dictionary) which you'd then save with pickle.dump
and restore with pickle.load
rather than using shelve
(or you could adapt this recipe for a persistent dictionary, which is linked from the shelve
docs, and does pretty much that).