database-design db4o object-oriented-database

What object databases allow indexing of everything in the database?


Currently, db4o does not allow indexing on the contents of collections. What object databases do allow indexing of any individual field in the database?

Example:

class RootClass
{
   string thisIsIndexed; // Field can be indexed for quick searching.
   IList contentsNotIndexed = new ArrayList(); // Holds SubClass instances (needs System.Collections); creates a 1-to-many relationship.
}

class SubClass
{
   string thisIsNotIndexed; // Field cannot be indexed.
}

For db4o to search by the field "thisIsNotIndexed", it has to load each complete object into memory and then use LINQ-to-Objects to scan through the field. This is slow, because a single search can potentially load the entire database into RAM. The workaround is to put all of the fields you want to search by in the root object, but this seems like an artificial limitation.
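To make the cost concrete, here is a minimal sketch of that kind of scan, written as plain LINQ-to-Objects over objects that are already in memory. It does not use db4o at all; the helper FindBySubField and the field name contents are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class SubClass
{
    public string thisIsNotIndexed;
}

public class RootClass
{
    public string thisIsIndexed;
    public List<SubClass> contents = new List<SubClass>();
}

public static class ScanExample
{
    // Linear scan: every RootClass and every item in its list is visited,
    // so all of them must already have been loaded into memory.
    public static List<RootClass> FindBySubField(IEnumerable<RootClass> allRoots, string wanted)
    {
        return allRoots
            .Where(r => r.contents.Any(s => s.thisIsNotIndexed == wanted))
            .ToList();
    }
}
```

The work grows with the total number of sub-objects, which is why an index on the sub-object field would matter.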

Are there any object databases that do not suffer from this limitation, and allow indexing of any string in a sub-object?

Update

Answer #1:

I found a method which gives the best of both worlds: ease of use (with a hierarchical structure), and blindingly fast native queries using full indexing on the whole tree. It involves a bit of a trick, and a method that caches the contents of parent nodes:

  1. Create the nested hierarchy as normal.
  2. For each sub-node, create a reverse reference to the node's parent.
  3. You can now query the leaf nodes. We are halfway there: we can query, but it's slow, because it has to do a join to navigate up the tree nodes if you want to search by some parameter in a parent node.
  4. To speed it up, create a "cache" member in the leaf that mirrors the search terms of the parent node. It is initially set to null; the first time it's called it does an expensive join and mirrors the field, and from that point on the search is extremely quick.
  5. This works well for data that never changes, e.g. temperature samples over time. If the data is going to change, you need some way of clearing the cached values when the value in the root node changes, perhaps by setting a "dirty" flag in each leaf node.
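The steps above can be sketched roughly as follows. This is an illustration of the back-reference and cached-field idea only, in plain C# with no db4o calls; the members AddChild, SetIndexedValue, parent, cachedParentValue, ParentValue and MarkDirty are hypothetical names, not taken from the answer.

```csharp
using System;
using System.Collections.Generic;

public class RootClass
{
    public string thisIsIndexed;
    public List<SubClass> children = new List<SubClass>();

    // Steps 1-2: attach the child and give it a reverse reference to its parent.
    public SubClass AddChild(string value)
    {
        var child = new SubClass { thisIsNotIndexed = value, parent = this };
        children.Add(child);
        return child;
    }

    // Step 5: changing the root value marks every leaf's cached copy as stale.
    public void SetIndexedValue(string value)
    {
        thisIsIndexed = value;
        foreach (var child in children) child.MarkDirty();
    }
}

public class SubClass
{
    public string thisIsNotIndexed;
    public RootClass parent;          // reverse reference (step 2)
    public string cachedParentValue;  // mirrored parent field (step 4), initially null
    private bool dirty = true;

    // The first call walks up to the parent; later calls reuse the mirrored value.
    public string ParentValue()
    {
        if (dirty)
        {
            cachedParentValue = parent.thisIsIndexed; // the expensive hop up the tree
            dirty = false;
        }
        return cachedParentValue;
    }

    public void MarkDirty() { dirty = true; }
}
```

With a real object database, the mirrored field on the leaf is presumably what you would index and query, which means it has to be populated and stored again before a query can rely on it; that part is an assumption and is not shown here.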

Answer #2:

If you use an array instead of a List, you can descend into the child node using SODA. SODA doesn't support descending into a List, so you simply can't query its contents with SODA (or with anything else that depends on SODA, such as LINQ, QBE, or native queries).


Solution

  • I'm basing this on my experience with db4o under Scala & Java, but hopefully this is still valid: the field 'contentsNotIndexed' holds ArrayList instances, so indexing that field should only assist you in querying those ArrayList instances. If you want to query the contents of those lists efficiently, you would have to define an index on the objects you expect to find inside the lists and descend your query into the ArrayList under the 'contentsNotIndexed' field. I don't know the internals of ArrayList well enough to suggest where that might descend, though.

    Depending on your needs, you can also design your class to use an array instead of an ArrayList in some cases to achieve the effect you want.