I would like to search the Zope catalog for objects with missing index key values. Is that possible?
For example, consider the following code:
```python
from Products.CMFCore.utils import getToolByName
catalog = getToolByName(context, 'portal_catalog')
results = catalog.searchResults({'portal_type': 'Event', 'review_state': 'pending'})
```
What should I do if I'm interested in objects for which a certain index, rather than `portal_type` or `review_state`, has no value at all?
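There are two different cases here: an index that stored `None` for an object, and an index that stored nothing at all for it (a `MissingValue`). You can search for both, but searching for `MissingValue` entries requires custom handling of the catalog's internal data structures.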
Indexes take a value from an object and index it. If retrieving that value raises an `AttributeError` or similar, the index stores nothing for that object. If the same field is also one of the returned metadata columns, its value will be `MissingValue` to indicate that nothing was stored for that field.
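To make that behaviour concrete, here is a small plain-Python model (not Zope code) of an index and a metadata column processing two objects, one of which lacks the attribute. The `MISSING` sentinel is a hypothetical stand-in for Zope's `Missing.Value`; `index_objects` and `Doc` are illustrative names only.

```python
# Simplified, hypothetical model of how an index and a metadata column react
# when the indexed attribute is missing. This is an illustration, not Zope code.

class _MissingValue:
    """Hypothetical stand-in for Zope's Missing.Value sentinel."""
    def __repr__(self):
        return "MissingValue"

MISSING = _MissingValue()

def index_objects(objects, attr):
    """Index `attr` for each {rid: obj}; return (index, metadata) dicts."""
    index = {}     # rid -> value, only for objects that have the attribute
    metadata = {}  # rid -> value, or MISSING when the attribute is absent
    for rid, obj in objects.items():
        try:
            value = getattr(obj, attr)
        except AttributeError:
            metadata[rid] = MISSING  # the column exists but holds the sentinel
            continue                 # the index stores nothing for this rid
        index[rid] = value
        metadata[rid] = value
    return index, metadata

class Doc:
    """Illustrative content object whose attributes come from keyword args."""
    def __init__(self, **attrs):
        self.__dict__.update(attrs)

docs = {1: Doc(subject="zope"), 2: Doc()}
index, metadata = index_objects(docs, "subject")
print(index)     # {1: 'zope'}
print(metadata)  # {1: 'zope', 2: MissingValue}
```

Object 2 never reaches the index at all, which is why an ordinary query cannot find it.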
In the following examples I assume you have a variable `catalog` that points to the site's `portal_catalog` tool, e.g. the result of `getToolByName(context, 'portal_catalog')` or similar.
You can pass `None` as a query value to an index just fine:

```python
catalog(myKeywordIndex=None)
```
The problem is that most index types ignore `None` as a value when indexing, so such a query often finds nothing. Searching for `None` will fail on Date and Path indexes, which ignore `None` when indexing, and on Boolean indexes, which turn `None` into `False` when indexing.
Keyword indexes ignore `None` as well, unless it is part of a sequence: if the indexed method returns `[None]`, it will happily be indexed, but `None` on its own won't be.
Field indexes do store `None` in the index.
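The rules above can be summarised as a toy function. This only sketches the behaviour described here: `stored_keys` is a hypothetical helper, not part of the PluginIndexes API, and real indexes do much more (date conversion, path splitting, and so on).

```python
# Toy model (not Zope's code) of which keys each index type stores for a value,
# following the None-handling rules described above.

def stored_keys(index_type, value):
    """Return the list of keys an index of `index_type` would store for `value`."""
    if index_type in ("DateIndex", "PathIndex"):
        return [] if value is None else [value]  # None is ignored
    if index_type == "BooleanIndex":
        return [bool(value)]                     # None becomes False
    if index_type == "KeywordIndex":
        if value is None:
            return []                            # a bare None is ignored
        if isinstance(value, (list, tuple)):
            return list(value)                   # [None] keeps None as a keyword
        return [value]
    if index_type == "FieldIndex":
        return [value]                           # None is stored as-is
    raise ValueError(f"unknown index type: {index_type}")

kinds = ("DateIndex", "PathIndex", "BooleanIndex", "KeywordIndex", "FieldIndex")
matches_none = {kind: None in stored_keys(kind, None) for kind in kinds}
print(matches_none)  # only FieldIndex maps to True
```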
Note that each index can list its unique values, so you can check whether any `None` values are stored for a given index by calling:

```python
catalog.uniqueValuesFor(indexname)
```
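A small wrapper around that call might look like this. `index_has_none` and `_StubCatalog` are hypothetical names. The stub only exists so the sketch runs outside Zope; on a real site you would pass the `portal_catalog` tool instead.

```python
def index_has_none(catalog, index_name):
    """Return True if `index_name` stores None for at least one object.

    `catalog` is anything with a uniqueValuesFor(name) method, such as the
    portal_catalog tool. This helper is a hypothetical convenience wrapper.
    """
    return None in catalog.uniqueValuesFor(index_name)

class _StubCatalog:
    """Minimal stand-in used only to demonstrate the helper outside Zope."""
    def __init__(self, values):
        self._values = values

    def uniqueValuesFor(self, name):
        return tuple(self._values.get(name, ()))

stub = _StubCatalog({"Subject": ["plone", None], "getEventType": ["talk"]})
print(index_has_none(stub, "Subject"))       # True
print(index_has_none(stub, "getEventType"))  # False
```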
Searching for objects that have no entry at all in an index is a little trickier. Each index keeps track of which objects it has indexed, so that it can, for example, remove their data when an object is removed. At the same time, the catalog keeps track of all the objects it has indexed as a whole.
Thus, we can calculate the difference between these two sets of information. This is what the catalog does all the time when you call the published APIs, but for this trick there is no such public API. We'll need to reach into the catalog internals and grab these sets for ourselves.
Luckily, these are all BTree sets, and the operations are thus relatively efficient. Here is how I'd do it:
```python
from BTrees.IIBTree import IISet, difference

def missing_entries_for_index(catalog, index_name):
    """Return (rids, count) for catalog records with no entry in index_name."""
    index = catalog._catalog.getIndex(index_name)
    # Record ids the index holds data for; works with any UnIndex-based index
    referenced = IISet(index.referencedObjects())
    # Record ids of every object in the catalog (the keys of the paths BTree)
    all_rids = IISet(catalog._catalog.paths.keys())
    missing = difference(all_rids, referenced)
    return missing, len(missing)
```
The `missing_entries_for_index` function returns an `IISet` of catalog record ids and its length; each id points to a catalog record for which the named index has no entry. You can then use `catalog.getpath(rid)` to turn an id into the full path of an object, `catalog.getMetadataForRID(rid)` to get a dictionary of its metadata values, `catalog.getobject(rid)` to get the original object itself, or `catalog._catalog[rid]` to get a catalog brain.
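The set arithmetic can be illustrated without Zope. In this plain-Python model, ordinary `set`s and a `dict` stand in for the catalog's `IISet` and paths BTree; `missing_rids` is a hypothetical helper, and the record ids and paths are invented example data.

```python
# Plain-Python model of the difference between catalog ids and index ids.
# Python sets/dicts stand in for IISet and the catalog's paths BTree.

def missing_rids(catalog_paths, index_rids):
    """Return sorted rids present in the catalog but absent from the index."""
    return sorted(set(catalog_paths) - set(index_rids))

catalog_paths = {  # rid -> path, playing the role of catalog._catalog.paths
    101: "/plone/events/launch",
    102: "/plone/events/meetup",
    103: "/plone/news/release",
}
index_rids = {101, 103}  # rids the index has entries for (cf. referencedObjects())

rids = missing_rids(catalog_paths, index_rids)
paths = [catalog_paths[rid] for rid in rids]  # what catalog.getpath(rid) would return
print(rids)   # [102]
print(paths)  # ['/plone/events/meetup']
```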
The following method will give you a catalog result set, just like you would get from a regular catalog search:
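```python
from Products.ZCatalog.Lazy import LazyMap

def not_indexed_results(catalog, index_name):
    """Return catalog brains for records that have no entry in index_name."""
    rs, length = missing_entries_for_index(catalog, index_name)
    # Build brains lazily, one per record id, as regular searches do
    return LazyMap(catalog._catalog.__getitem__, rs.keys(), length)
```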
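`LazyMap` matters because a result set may be large: brains are only built when an item is actually accessed. The class below is a simplified illustration of that idea, not Zope's implementation; `LazyMapSketch` and `make_brain` are hypothetical names.

```python
# Simplified illustration of lazy mapping: items are built on first access
# and cached. This is NOT Zope's LazyMap, only a sketch of the same idea.

class LazyMapSketch:
    def __init__(self, func, seq, length=None):
        self._func = func
        self._seq = list(seq)
        self._len = len(self._seq) if length is None else length
        self._cache = {}

    def __len__(self):
        return self._len

    def __getitem__(self, i):
        if i not in self._cache:
            # self._seq[i] raises IndexError past the end, which ends iteration
            self._cache[i] = self._func(self._seq[i])
        return self._cache[i]

calls = []

def make_brain(rid):
    """Stand-in for catalog._catalog.__getitem__; records each call."""
    calls.append(rid)
    return {"rid": rid}

results = LazyMapSketch(make_brain, [7, 8, 9])
print(len(results))  # 3
print(results[1])    # {'rid': 8}
print(calls)         # [8], only the accessed item was built
```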