Counting len() of BTree saved in ZODB takes much time


I use ZODB together with BTrees to store a large amount of data (millions of keys). I'd like to get the exact number of entries in my root dictionary (which is a BTree). I noticed that calling len() on the result of .keys() takes a very long time (tens of minutes at least; honestly, once the data set grew larger I never waited for it to finish).

import ZODB
from BTrees.OOBTree import BTree

connection = ZODB.connection('database.fs')
dbroot = connection.root()

if not hasattr(dbroot, 'dictionary'):
    dbroot.dictionary = BTree()

# much data is added and transactions are committed

number_of_items = len(dbroot.dictionary.keys()) # takes very long time

I pack the DB regularly.

I don't think it's relevant to the question, but the values stored in dbroot.dictionary are themselves BTrees.


Solution

  • You are calling the .keys() method, and taking len() of its result has to load every bucket of the tree from the database. That takes a lot of time.

    You could ask for the length of the BTree itself:

    number_of_items = len(dbroot.dictionary)
    

    This still needs to load all the buckets (blocks of keys) to ask each one for its length, so it still has to load a lot of data.

    We've always avoided asking for a direct length; the BTrees.Length.Length object is better suited to keeping track of a length 'manually'. It implements ZODB conflict resolution, so concurrent transactions that change the count are merged instead of raising a ConflictError. Each time you add keys to dbroot.dictionary, add the number of new keys to the Length object (and subtract when you delete keys), and let it keep count:

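    from BTrees.OOBTree import BTree
    from BTrees.Length import Length
    
    if not hasattr(dbroot, 'dictionary'):
        dbroot.dictionary = BTree()
        dbroot.dict_length = Length()
    
    # Adding objects to the dictionary? Add to the length as well.
    # `keys` and `values` stand for your own data.
    added = 0
    for key, value in zip(keys, values):
        if key not in dbroot.dictionary:
            added += 1  # overwriting an existing key doesn't change the length
        dbroot.dictionary[key] = value
    dbroot.dict_length.change(added)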
    

    Then read the length by calling the object; this doesn't touch the BTree's buckets at all:

    length = dbroot.dict_length()