I use ZODB coupled with BTrees to store a large amount of data (millions of keys). I'd like to get the exact number of entries in my root dictionary (which is a BTree). As I noticed, len() called on the result of .keys() takes a very long time (tens of minutes at least; honestly, I've never waited for it to finish once the data set grew larger).
import ZODB
from BTrees.OOBTree import BTree
connection = ZODB.connection('database.fs')
dbroot = connection.root()
if not hasattr(dbroot, 'dictionary'):
    dbroot.dictionary = BTree()
# much data is added and transactions are committed
number_of_items = len(dbroot.dictionary.keys()) # takes very long time
I pack the DB regularly.
I don't think it's relevant to the question, but dbroot.dictionary contains other BTrees inside as values.
You are calling the .keys() method, which must load and produce a full list of all the keys. That takes a lot of time.
You could ask the length of the BTree itself:
number_of_items = len(dbroot.dictionary)
This still has to load every bucket (a block of keys) and ask each one for its length, so a lot of data is still read; it just avoids producing the list.
We've always avoided trying to get a direct length; the BTrees.Length.Length object is better suited for keeping track of a length 'manually'. The object is fully ZODB conflict-resolving. Each time you add elements to dbroot.dictionary, add the number of new elements to the Length object and have it keep count:
from BTrees.OOBTree import BTree
from BTrees.Length import Length
if not hasattr(dbroot, 'dictionary'):
    dbroot.dictionary = BTree()
    dbroot.dict_length = Length()
# add objects into the dictionary? Add to the length as well
# (keys, values and count stand in for your own data; the keys are assumed to be new):
for i in range(count):
    dbroot.dictionary[keys[i]] = values[i]
dbroot.dict_length.change(count)
Then read out the length by calling the object:
length = dbroot.dict_length()