Tags: python, indexing, consistency, zodb

Consistent indexing for objects with variable attributes in ZODB


I have a ZODB installation in which I have to organize several million objects of a handful of different types. I have a generic container class, Table, which contains BTrees that index objects by attributes or combinations of attributes. Data consistency is essential, so I want to enforce that the indices are updated automatically whenever I write to any attribute covered by an index. A simple obj.a = x should be sufficient to calculate all new dependent index entries, check for collisions, and finally write the indices and the value.

In general, I'd be happy to use a library for this, so I looked at repoze.catalog and IndexedCatalog, but neither really convinced me. IndexedCatalog seems to have been dead for quite a while and does not provide this kind of consistency when objects change. repoze.catalog seems to be more widely used and more active, but as far as I understand it does not provide this kind of consistency either. If I missed something here, I'd love to hear about it; I prefer reusing over reinventing.

As I see it, short of finding a library for the problem, I'd have to intercept write access to the data object attributes with descriptors and let the Table class do the work of updating the indices. For that, the descriptor instances have to know which Table instances to talk to. The current implementation goes something like this:

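class Property(object):
    ...
    def __set__(self, obj, val):
        # The descriptor protocol passes only the instance and the value,
        # so the attribute name has to be stored on the descriptor itself.
        val = self.check_value(val)
        setattr(obj, '_' + self.name, val)

# Property must be defined before the class body that instantiates it.
class DatabaseElement(Persistent):
    name = Property(constant_parameters)
    ...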
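
For reference, here is a minimal sketch of the descriptor mechanics involved, without ZODB. It assumes Python 3.6+ (for `__set_name__`) and uses a plain class in place of a Persistent subclass; the `check_value` argument and the `Element` class are hypothetical names chosen for illustration, not part of the code above.

```python
class Property:
    """Data descriptor that validates a value and stores it under '_<name>'."""

    def __init__(self, check_value=lambda v: v):
        # Hypothetical validator hook: takes the raw value, returns the stored one.
        self.check_value = check_value

    def __set_name__(self, owner, name):
        # Called once when the owning class is created (Python 3.6+),
        # so the descriptor learns its attribute name without being told.
        self.name = name
        self.hidden_name = '_' + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # class-level access returns the descriptor itself
        return getattr(obj, self.hidden_name)

    def __set__(self, obj, val):
        # Signature is (instance, value); the name comes from __set_name__.
        setattr(obj, self.hidden_name, self.check_value(val))


class Element:  # stands in for a Persistent subclass in this sketch
    name = Property(check_value=str.strip)


e = Element()
e.name = '  alice  '
```

The descriptor itself lives on the class and holds no per-database state, which is why it cannot simply be handed a Table instance when the class is defined.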

When these DatabaseElement classes are defined, the database and the objects within it do not exist yet. So, as mentioned in this nice answer, I'd probably have to create some singleton lookup mechanism to find the Table objects without handing them to Property as an instantiation argument. Is there a more elegant way? Persisting the descriptors themselves? Any suggestions and best-practice examples are welcome!


Solution

  • I finally figured it out myself. The solution comes in three parts, and no ugly singleton is required: Table provides the logic to check for collisions, DatabaseElement gets the ability to look up the responsible Table without ugly workarounds, and Property makes sure the indices are updated before any indexed value is written. Here are some snippets; the key is the table lookup in DatabaseElement, which I didn't see documented anywhere. A nice extra: it not only verifies writes to single values, it can also check changes to several indexed values in one go.

    class Table(PersistentMapping):
        ....
        def update_indices(self, inst, updated_values_dict):
            changed_indices_keys = self._check_collision(inst, updated_values_dict)
            original_keys = [inst.key(index) for index, tmp_key in changed_indices_keys]
            for (index, tmp_key), key in zip(changed_indices_keys, original_keys):
                self[index][tmp_key] = inst
                try:
                    del self[index][key]
                except KeyError:
                    pass
    
    
    class DatabaseElement(Persistent):
        ....
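        @property
        def _table(self):
            # _p_jar is the ZODB connection (None until the object is attached);
            # the tables live in the root object under the class name.
            return self._p_jar and self._p_jar.root()[self.__class__.__name__]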
    
        def _update_indices(self, update_dict, verify=True):
            if verify:
                update_dict = dict((key, getattr(type(self), key).verify(val)) 
                                    for key, val in update_dict.items()
                                    if key in self._key_properties)
            if not update_dict:
                return
            table = self._table
            # Test against None explicitly: an empty mapping is falsy as well.
            if table is not None:
                table.update_indices(self, update_dict)
    
    
    class Property(object):
        ....
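        def __set__(self, obj, val):
            validated_val = self.validator(obj, self.name, val)
            if self.indexed:
                # Index the validated value (verify=False skips re-validation);
                # this may raise on a collision before anything is written.
                obj._update_indices({self.name: validated_val}, verify=False)
            setattr(obj, self.hidden_name, validated_val)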
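
The three parts above can be tried out without a database using the following rough sketch. Everything ZODB-specific is replaced by a stand-in, so treat it as an illustration of the control flow only: plain dicts take the place of BTrees, `FakeJar` is a hypothetical substitute for a connection offering `root()`, and `CollisionError`, the body of `Table.update_indices`, and `Element.update` are assumptions made here, since the original collision check is not shown.

```python
class CollisionError(KeyError):
    """Raised when a new index key already belongs to another object."""


class Table(dict):
    """Maps index name -> {key: object}; plain dicts stand in for BTrees."""

    def __init__(self, indexed_attrs):
        super().__init__((attr, {}) for attr in indexed_attrs)

    def update_indices(self, inst, updated_values):
        # Phase 1: check every affected index before modifying any of them.
        moves = []
        for attr, new_key in updated_values.items():
            holder = self[attr].get(new_key)
            if holder is not None and holder is not inst:
                raise CollisionError((attr, new_key))
            moves.append((attr, getattr(inst, '_' + attr, None), new_key))
        # Phase 2: all checks passed, so move the index entries.
        for attr, old_key, new_key in moves:
            if old_key != new_key:
                self[attr].pop(old_key, None)
            self[attr][new_key] = inst


class FakeJar:
    """Hypothetical stand-in for a ZODB connection; only provides root()."""

    def __init__(self):
        self._root = {}

    def root(self):
        return self._root


class Property:
    def __set_name__(self, owner, name):
        self.name, self.hidden_name = name, '_' + name

    def __get__(self, obj, objtype=None):
        return self if obj is None else getattr(obj, self.hidden_name)

    def __set__(self, obj, val):
        obj._update_indices({self.name: val})  # may raise; nothing written yet
        setattr(obj, self.hidden_name, val)


class Element:
    _p_jar = None  # in ZODB this is set when the object joins a connection
    a = Property()
    b = Property()

    @property
    def _table(self):
        jar = self._p_jar
        return None if jar is None else jar.root()[type(self).__name__]

    def _update_indices(self, update_dict):
        table = self._table
        if table is not None:
            table.update_indices(self, update_dict)

    def update(self, **values):
        # Change several indexed values in one go: index first, then write.
        self._update_indices(values)
        for name, val in values.items():
            setattr(self, '_' + name, val)


jar = FakeJar()
jar.root()['Element'] = Table(['a', 'b'])
table = jar.root()['Element']

x, y = Element(), Element()
x._p_jar = y._p_jar = jar  # set by hand here only because FakeJar is a stand-in
x.a = 1
y.a = 2

try:
    y.a = 1  # key 1 is already held by x
    collided = False
except CollisionError:
    collided = True

x.update(a=3, b=7)

try:
    y.update(a=5, b=7)  # b=7 collides, so a=5 must not be indexed either
    multi_collided = False
except CollisionError:
    multi_collided = True
```

Because the index update runs before `setattr`, a rejected write leaves both the object and the indices unchanged. Note that in this sketch an object that is assigned values before it has a `_p_jar` is silently left unindexed, so it would still need to be indexed when it is attached.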