Tags: python, database, python-3.x, python-3.6, zodb

ZODB python: how to avoid creating a database with only one big entry?


I'm using the ZODB python database module for the first time. The tutorial (http://www.zodb.org/en/latest/tutorial.html) confuses me about one aspect of ZODB's behavior: how can I avoid accidentally creating a database with just a single very big entry? I'll explain step by step my application, my current database approach, and where the confusion comes from.

 

1. The Item-object

The database I want to save consists entirely of Item-objects, defined as follows (a little simplified):

from persistent import Persistent
from persistent.list import PersistentList

class Item(Persistent):
    def __init__(self, name, *args, **kwargs):
        super().__init__(*args, **kwargs)

        # 1. Persistent variables
        # ------------------------
        self.__name = name
        self.__myList = PersistentList()   # <- list can hold other Item-objects

        self.__myVar01 = None
        self.__myVar02 = None
        self.__myVar03 = None

        # 2. Non-persistent variables
        # ----------------------------
        self._v_myVar01 = None
        self._v_myVar02 = None
        self._v_myVar03 = None

Visually represented as follows:

[image not available]

 
The application constructs one such Item-object at startup. While the application runs, this Item-object creates 'children' (which are Item-objects themselves). This process goes on for a while, so that the following object structure ends up in memory:

[image not available]

 
This construct can easily consist of 20,000 Item-objects, so I want to save them in a database.

 

2. How I save this structure to a ZODB-database

To save this structure of Item-objects in a database, I follow these guidelines from the tutorial:

Storing objects
To store an object in the ZODB we simply attach it to any other object that already lives in the database. Hence, the root object functions as a boot-strapping point. The root object is meant to serve as a namespace for top-level objects in your database.
[ Quoted from ZODB Tutorial   http://www.zodb.org/en/latest/tutorial.html ]

The following functions create a new database (starting from the top-level item) and save it to the hard drive:

from ZODB.FileStorage import FileStorage
from ZODB import DB
from persistent import Persistent
import transaction

# Module-level handles, shared by both functions below.
db = None
conn = None

# Call this function to save the database
# to the hard drive and close it.
# ----------------------------------------
def save_database_and_close():
    transaction.commit()
    conn.close()
    db.close()

# Call this function to create a new
# database, starting from a root-item
# ------------------------------------
def create_database(root_item):
    # Without this, db and conn would be locals that
    # save_database_and_close() could not see.
    global db, conn
    storage = FileStorage("C:/mytest/mydb.db")
    db = DB(storage)
    conn = db.open()
    # conn.root (no parentheses) gives the attribute-style
    # root object that the tutorial uses.
    root = conn.root
    root.myRootItem = root_item
    transaction.commit()

 

3. The problem with storing everything on the root

However, as I read on in the tutorial, I get the impression that my current approach is not very good:

(Please note that at this point, the tutorial has covered an example of making Account-objects to be stored in a ZODB database)

We could store Account-objects directly on the root object:

import account

# Probably a bad idea:
root.account1 = account.Account()

But if you’re going to store many objects, you’ll want to use a collection object [3]:

import account, BTrees.OOBTree

root.accounts = BTrees.OOBTree.BTree()
root.accounts['account-1'] = Account()

Footnote 3:
The root object is a fairly simple persistent object that’s stored in a single database record. If you stored many objects in it, its database record would become very large, causing updates to be inefficient and causing memory to be used inefficiently.

The footnote implies that I'm just creating a database with one big entry - which is of course the most inefficient sort of database you can possibly imagine.

 

4. What I am confused about

Okay, I'm pretty sure that the following approach is very bad (and condemned by the warning above):

[image not available]

 
But does this warning (not to store everything on the root) also apply to my case? Like this:

[image not available]

In other words, would my approach create a database with a single large entry (very inefficient)? Or would it create a nice database with one entry per Item-object?


Note:
I'm not sure if it's relevant, but I'll list my system specs here:

  • Windows 10, 64-bit
  • Python 3.6.3
  • ZODB 5.4.0 (the latest version as of 21 May 2018)

Solution

  • No, this is not inefficient. The ZODB creates separate entries for each Persistent instance. Those entries are later loaded on demand as you access them.

    From the very same tutorial you linked to:

    Subclassing Persistent provides a number of features:

    [...]

    • Data will be saved in its own database record.

    This is entirely transparent to your application. Your first transaction will be large, but subsequent transactions will only write out the changes to individual items as you change them.