I have a ZODB installation in which I have to organize several million objects of a handful of different types. I have a generic container class, Table, which contains BTrees that index objects by attributes or combinations of attributes. Data consistency is essential, so I want to enforce that the indices are updated automatically whenever I write to any attribute covered by the indexing. A simple obj.a = x should be sufficient to calculate all new dependent index entries, check for collisions, and finally write the indices and the value.
In general, I'd be happy to use a library for that, so I looked at repoze.catalog and IndexedCatalog, but was not really happy with either. IndexedCatalog seems to have been dead for quite a while and does not provide this kind of consistency when objects change. repoze.catalog seems to be more widely used and more active, but as far as I understand it does not provide this kind of consistency either. If I missed something here, I'd love to hear about it; I'd prefer reusing over reinventing.
So, as I see it, short of finding a library for the problem, I'd have to intercept write access to the data object attributes with descriptors and let the Table class do the magic of changing the indices. For that, the descriptor instances have to know which Table instances they have to talk to. The current implementation goes something like this:
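from persistent import Persistent

class Property(object):
    ...
    def __set__(self, obj, val):
        # __set__ receives only the instance and the value; the attribute
        # name has to be stored on the descriptor itself (here: self.name).
        val = self.check_value(val)
        setattr(obj, '_' + self.name, val)

class DatabaseElement(Persistent):
    name = Property(constant_parameters)
    ...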
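The descriptor protocol calls __set__ with the instance and the value only, so the descriptor has to learn its attribute name some other way. A minimal sketch of one option, assuming Python 3.6+ where __set_name__ is available; the check_value hook and the plain Element class are hypothetical stand-ins, and no ZODB is involved:

```python
class Property(object):
    """Data descriptor that validates a value and stores it under '_<name>'."""

    def __init__(self, check_value=lambda val: val):
        # check_value is a hypothetical validator hook used for illustration.
        self.check_value = check_value
        self.name = None

    def __set_name__(self, owner, name):
        # Called once while the owning class is being created (Python 3.6+).
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # class-level access returns the descriptor itself
        return getattr(obj, '_' + self.name)

    def __set__(self, obj, val):
        val = self.check_value(val)
        setattr(obj, '_' + self.name, val)


class Element(object):
    # Plain stand-in for DatabaseElement.
    name = Property(check_value=str.strip)


elem = Element()
elem.name = '  alice  '
print(elem.name)        # validated value, read back through the descriptor
print(elem.__dict__)    # the value is stored under the hidden '_name' key
```

On older Python versions the name would have to be passed to the constructor or assigned by a metaclass instead.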
When these DatabaseElement classes are generated, the database and the objects within it do not exist yet. So, as mentioned in this nice answer, I'd probably have to create some singleton lookup mechanism to find the Table objects without handing them to Property as an instantiation argument. Is there a more elegant way? Persisting the descriptors themselves? Any suggestions and best-practice examples are welcome!
So I finally figured it out myself. The solution comes in three parts, and no ugly singleton is required. Table provides the logic to check for collisions, DatabaseElement gets the ability to look up the responsible Table without ugly workarounds, and Property makes sure that the indices are updated before any indexed value is written. Here are some snippets; the key trick is the table lookup in DatabaseElement, which finds the Table through the object's ZODB connection (_p_jar) and its class name. I didn't see that documented anywhere. Nice extra: it not only verifies writes to single values, I can also check changes to several indexed values in one go.
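from persistent.mapping import PersistentMapping

class Table(PersistentMapping):
    ...
    def update_indices(self, inst, updated_values_dict):
        # _check_collision (not shown) returns the (index, new_key) pairs that change.
        changed_indices_keys = self._check_collision(inst, updated_values_dict)
        # Look up the old keys before anything is modified.
        original_keys = [inst.key(index) for index, tmp_key in changed_indices_keys]
        for (index, tmp_key), key in zip(changed_indices_keys, original_keys):
            self[index][tmp_key] = inst   # add the entry under the new key
            try:
                del self[index][key]      # drop the entry under the old key
            except KeyError:
                pass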
from persistent import Persistent

class DatabaseElement(Persistent):
    ...
    @property
    def _table(self):
        # _p_jar is the ZODB connection this object belongs to; it is None
        # as long as the object has not been added to a database.
        return self._p_jar and self._p_jar.root()[self.__class__.__name__]

    def _update_indices(self, update_dict, verify=True):
        if verify:
            update_dict = dict((key, getattr(type(self), key).verify(val))
                               for key, val in update_dict.items()
                               if key in self._key_properties)
        if not update_dict:
            return
        table = self._table
        if table:
            table.update_indices(self, update_dict)
class Property(object):
    ...
    def __set__(self, obj, val):
        validated_val = self.validator(obj, self.name, val)
        if self.indexed:
            # Index the validated value; verify=False skips re-validation.
            obj._update_indices({self.name: validated_val}, verify=False)
        setattr(obj, self.hidden_name, validated_val)
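Since _check_collision and key are not shown, here is a self-contained sketch of how the pieces could fit together. It rests on several assumptions: plain dicts replace the BTrees, a table attribute set by add replaces the _p_jar lookup, each index is named by a tuple of attribute names, and CollisionError, MiniTable, Element, and update are hypothetical names. It is meant to show the check-then-write order, not the original implementation.

```python
class CollisionError(KeyError):
    """Raised when an update would overwrite another object's index entry."""


class MiniTable(object):
    def __init__(self, index_fields):
        # index_fields, e.g. [('a',), ('a', 'b')]; one plain dict per index.
        self.indices = {fields: {} for fields in index_fields}

    def add(self, inst):
        for fields, index in self.indices.items():
            if inst.key(fields) in index:
                raise CollisionError(inst.key(fields))
        for fields, index in self.indices.items():
            index[inst.key(fields)] = inst
        inst.table = self

    def update_indices(self, inst, updated):
        # Phase 1: compute every affected key and check for collisions.
        changes = []
        for fields, index in self.indices.items():
            if not set(fields) & set(updated):
                continue
            old_key = inst.key(fields)
            new_key = tuple(updated.get(f, getattr(inst, f)) for f in fields)
            if new_key == old_key:
                continue
            if new_key in index:
                raise CollisionError(new_key)
            changes.append((index, old_key, new_key))
        # Phase 2: nothing collided, so the indices can be rewritten.
        for index, old_key, new_key in changes:
            index[new_key] = inst
            index.pop(old_key, None)


class Property(object):
    def __init__(self, indexed=False):
        self.indexed = indexed

    def __set_name__(self, owner, name):
        self.name = name
        self.hidden_name = '_' + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.hidden_name, None)

    def __set__(self, obj, val):
        if self.indexed:
            obj._update_indices({self.name: val})  # may raise before the write
        setattr(obj, self.hidden_name, val)


class Element(object):
    a = Property(indexed=True)
    b = Property(indexed=True)
    table = None  # set by MiniTable.add; None means "not indexed yet"

    def __init__(self, a, b):
        self.a = a
        self.b = b

    def key(self, fields):
        return tuple(getattr(self, f) for f in fields)

    def _update_indices(self, updated):
        if self.table is not None:
            self.table.update_indices(self, updated)

    def update(self, **values):
        # Check and re-index several attributes in one go, then store them.
        self._update_indices(values)
        for name, val in values.items():
            setattr(self, '_' + name, val)


table = MiniTable([('a',), ('a', 'b')])
x, y = Element(1, 'p'), Element(2, 'q')
table.add(x)
table.add(y)

x.a = 3                      # re-indexes both ('a',) and ('a', 'b')
try:
    x.a = 2                  # collides with y in the ('a',) index
except CollisionError:
    pass                     # neither the indices nor x.a were changed
x.update(a=5, b='r')         # several indexed values checked together
```

Because every collision check happens before the first index write, a rejected assignment leaves both the indices and the stored attribute untouched.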