android architecture lucene db4o

How do I combine usage of db4o to store data and Lucene to index data for fast search?


I'm new to both db4o and Lucene.

Currently I'm using db4o to persist my data in an Android app. I need to be able to perform quick searches and to offer suggestions to the user (e.g., autocomplete suggestions).

An SO poster mentioned using Lucene to index data and db4o to store it.

Has anyone implemented this approach? If so, I would appreciate it if you could share the overall design. What are the alternatives?


Solution

  • I used Lucene to extract keywords from the items to be stored in the database, and stored what I call 'keyword extension' objects that point to the corresponding domain objects. This made the domain objects findable by keyword (also allowing for stemming) and kept the keyword concerns separate from the domain objects. The database was built from a large static dataset (the USDA food nutrient database), so I didn't need to worry about changes at runtime. Thus this solution is limited in its current form ...

    The first part of the solution was to write a small chunk of code that takes some text and extracts both the keywords and the corresponding stems (using Lucene's 'Snowball' stemming) into a map. You use this to extract the keywords/stems from the domain objects that you are storing in the database. I kept the original keywords around so that I could later gather statistics on the searches made.

    The second part was to construct objects I called 'keyword extensions' that store the stems as one array and the corresponding keywords as another array, and that hold a pointer to the domain object the keywords came from (I used arrays because they work more easily with db4o). I also subclassed my KeywordExtension class to match the particular domain object's type, so for example I was storing a 'Nutrient' domain object and a corresponding 'NutrientKeywordExtension' object.

    The third part is to collect the user's entered search text, again use the stemmer to extract the stems, and search for the NutrientKeywordExtension objects with those stems. You can then grab the Nutrient objects that those extensions point to, and finally present them as search results.

    As I said, my database was static: it's created the first time the application runs. In a dynamic database, you would need to worry about keeping the nutrients and the corresponding keyword extensions in sync. One solution would be to merge the nutrient and the nutrient keyword extension into one class, if you don't mind having that stuff inside your domain objects (I don't like this). Otherwise, you need to account for the keyword extensions every time you create, edit, or delete your domain objects.

    I hope this limited example helps.
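
To make the three parts above more concrete, here is a minimal sketch in plain Java. It is not the original code. Several things are assumptions made only so the example runs on its own: the `stem` method is a toy suffix stripper standing in for Lucene's Snowball stemmer, an in-memory `List` stands in for the db4o object container, and the names `KeywordSearchSketch`, `extractKeywords`, `hasAllStems`, and `search` are hypothetical. The rule that a result must contain every stem in the query is also an assumption; the answer does not say how multiple stems are combined.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class KeywordSearchSketch {

    // ASSUMPTION: toy suffix stripper standing in for Lucene's Snowball stemmer.
    static String stem(String word) {
        if (word.length() > 4 && word.endsWith("ies")) {
            return word.substring(0, word.length() - 3) + "y";
        }
        if (word.length() > 3 && word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Part 1: map each keyword found in the text to its stem.
    static Map<String, String> extractKeywords(String text) {
        Map<String, String> keywordToStem = new LinkedHashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                keywordToStem.put(token, stem(token));
            }
        }
        return keywordToStem;
    }

    // Hypothetical minimal domain object.
    static class Nutrient {
        final String name;
        Nutrient(String name) { this.name = name; }
    }

    // Part 2: keywords and stems kept as parallel arrays.
    static class KeywordExtension {
        final String[] keywords;
        final String[] stems;

        KeywordExtension(Map<String, String> keywordToStem) {
            // LinkedHashMap iterates keys and values in the same order.
            keywords = keywordToStem.keySet().toArray(new String[0]);
            stems = keywordToStem.values().toArray(new String[0]);
        }

        // ASSUMPTION: a match requires every requested stem to be present.
        boolean hasAllStems(Collection<String> wanted) {
            return Arrays.asList(stems).containsAll(wanted);
        }
    }

    // Type-specific extension that points back to its domain object.
    static class NutrientKeywordExtension extends KeywordExtension {
        final Nutrient nutrient;

        NutrientKeywordExtension(Nutrient nutrient) {
            super(extractKeywords(nutrient.name));
            this.nutrient = nutrient;
        }
    }

    // Part 3: stem the query, find matching extensions, return their nutrients.
    // The List here stands in for a query against the db4o container.
    static List<Nutrient> search(List<NutrientKeywordExtension> store, String query) {
        Collection<String> queryStems = extractKeywords(query).values();
        List<Nutrient> hits = new ArrayList<>();
        if (queryStems.isEmpty()) {
            return hits; // nothing searchable in the query
        }
        for (NutrientKeywordExtension ext : store) {
            if (ext.hasAllStems(queryStems)) {
                hits.add(ext.nutrient);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<NutrientKeywordExtension> store = new ArrayList<>();
        store.add(new NutrientKeywordExtension(new Nutrient("Raw apples with skin")));
        store.add(new NutrientKeywordExtension(new Nutrient("Blueberries, frozen")));
        for (Nutrient n : search(store, "apple")) {
            System.out.println(n.name);
        }
    }
}
```

In a real app, the extensions would be stored in and queried from db4o rather than scanned in a list, and the stemming would come from Lucene. Because the extension holds a reference to its domain object, a dynamic database would need to create, update, or delete the extension whenever the domain object changes, as noted above.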