
Python shelve: the same object becomes two different objects after reopening the shelf


I am seeing this behavior using shelve:

import shelve

my_shelve = shelve.open('/tmp/shelve', writeback=True)
my_shelve['a'] = {'foo': 'bar'}
my_shelve['b'] = my_shelve['a']
id(my_shelve['a'])  # 140421814419392
id(my_shelve['b'])  # 140421814419392
my_shelve['a']['foo'] = 'Hello'
my_shelve['a']['foo']  # 'Hello'
my_shelve['b']['foo']  # 'Hello'
my_shelve.close()

my_shelve = shelve.open('/tmp/shelve', writeback=True)
id(my_shelve['a'])  # 140421774309128
id(my_shelve['b'])  # 140421774307832 -> This is weird.
my_shelve['a']['foo']  # 'Hello'
my_shelve['b']['foo']  # 'Hello'
my_shelve['a']['foo'] = 'foo'
my_shelve['a']['foo']  # 'foo'
my_shelve['b']['foo']  # 'Hello'
my_shelve.close()

As you can see, when the shelf is reopened, the two values that were previously the same object are now two different objects.

  1. Does anybody know what is happening here?
  2. Does anybody know how to avoid this behavior?

I am using Python 3.7.0.


Solution

  • shelve stores pickled representations of objects in the shelf file, and it pickles each value separately. When you store the same object as my_shelve['a'] and my_shelve['b'], shelve writes one pickle of the object for the 'a' key and another, independent pickle of the object for the 'b' key. Within the first session, writeback=True keeps the values in an in-memory cache, which is why both keys return the same object until the shelf is closed.

    When you reopen the shelf, shelve uses the pickled representations to reconstruct the objects. It uses the pickle for 'a' to reconstruct the dict you stored, and it uses the pickle for 'b' to reconstruct a second, independent copy of that dict.

    The pickles do not interact with each other, so unpickling them cannot return the same object for both keys. Nothing in the on-disk representation indicates that my_shelve['a'] and my_shelve['b'] were ever the same object; a shelf produced using separate but equal objects for my_shelve['a'] and my_shelve['b'] could look identical.
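
    The effect can be reproduced without shelve at all. The following is a minimal sketch that uses only the standard pickle module; the variable names (shared, blob_a, blob_b) are made up for illustration, and it only mimics shelve's per-key pickling rather than calling shelve itself.

```python
import pickle

shared = {'foo': 'bar'}

# Pickle the same object twice, independently, one pickle per "key".
blob_a = pickle.dumps(shared)
blob_b = pickle.dumps(shared)

# Each pickle is loaded on its own, so each load builds a new dict.
a = pickle.loads(blob_a)
b = pickle.loads(blob_b)

print(a == b)   # equal contents
print(a is b)   # but not the same object

# Changing one copy leaves the other untouched.
a['foo'] = 'changed'
print(b['foo'])
```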


    If you want to preserve the fact that the two values were the same object, don't store them under separate keys of a shelf. Consider pickling and unpickling a single dict with 'a' and 'b' keys instead of using shelve, so that both references end up in the same pickle.
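
    A minimal sketch of that suggestion, assuming the data fits in one in-memory dict (the names container and restored are made up for illustration). Because both references are written in a single pickle, pickle's memo records that they point at the same object and restores that sharing on load.

```python
import pickle

shared = {'foo': 'bar'}

# One container holding both references, pickled as a single unit.
container = {'a': shared, 'b': shared}
blob = pickle.dumps(container)

restored = pickle.loads(blob)

# Both keys refer to one dict again after unpickling.
print(restored['a'] is restored['b'])

# A change made through one key is visible through the other.
restored['a']['foo'] = 'Hello'
print(restored['b']['foo'])
```

    The same reasoning suggests that storing the whole container under a single shelf key should also keep the sharing, since shelve pickles each value as one unit; the trade-off is that the entire container is then read and written together.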