
Zend: index generation and the pros and cons of Zend_Search_Lucene


I've never come across an app/class like Zend Search Lucene before, as I've always queried my database directly.

Zend_Search_Lucene operates with documents as atomic objects for indexing. A document is divided into named fields, and fields have content that can be searched.

A document is represented by the Zend_Search_Lucene_Document class, and objects of this class contain instances of Zend_Search_Lucene_Field that represent the fields on the document.

It is important to note that any information can be added to the index. Application-specific information or metadata can be stored in the document fields, and later retrieved with the document during search.

So this is basically saying that I can apply it to anything, including databases. The key thing is building indexes for searching.

What I'm trying to grasp is where exactly I should store the indexes in my application. Say, for example, we have phones stored in a database, with manufacturers and models: how should I categorize the indexes?

If I'm indexing users along with, say, their addresses, I obviously wouldn't want them to be publicly viewable. I'm just confused about how it all fits together, and whether there are known disadvantages or gotchas I should know about when using it.


Solution

  • A Lucene index is stored outside the database. I'd store it in a "data" directory as a sister to your controllers, models, and views. But you can store it anywhere; you just need to specify the path when you open the index for querying.

    It's basically a redundant copy of the documents stored in your database, and you have to keep them in sync yourself. That's one of the disadvantages: you have to write code to populate the Lucene index based on results of a query against your database. As you add data to the database, you have to update your Lucene index as well.

    An advantage of using an external full-text index solution is that you can reduce the workload on your RDBMS. To find a document, you execute a search using the Lucene API. Each indexed document should include a field containing the primary key value (stored as part of the document, though it does not need to be analyzed for full-text search). You get this field back when you do a Lucene search, so you can look up the respective row in the database.

    Does that help answer your question?

    I gave a presentation recently for MySQL University comparing full-text search solutions: http://forge.mysql.com/wiki/Practical_Full-Text_Search_in_MySQL

    I also publish my slides at http://www.SlideShare.net/billkarwin.
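
As a rough sketch of the workflow described above, assuming Zend Framework 1 is on the include path: the `$rows` array stands in for the result of a database query, and the `phone_id`, `manufacturer`, `model`, and `description` names and the index path are hypothetical, chosen only to match the phone example in the question. The primary key goes into a Keyword field, which is stored and indexed but not analyzed, and it is read back from each hit after a search.

```php
<?php
// Assumes Zend Framework 1 is available on the include_path.
require_once 'Zend/Search/Lucene.php';

// Hypothetical rows; in practice these would come from a query against your database.
$rows = array(
    array('phone_id' => 1, 'manufacturer' => 'Nokia', 'model' => 'N95',
          'description' => 'Slider phone with a camera'),
    array('phone_id' => 2, 'manufacturer' => 'Apple', 'model' => 'iPhone',
          'description' => 'Touchscreen phone'),
);

// Hypothetical location; any path works as long as the same one is used when opening the index.
$indexPath = dirname(__FILE__) . '/data/phone-index';

// Build the index from the database rows.
$index = Zend_Search_Lucene::create($indexPath);
foreach ($rows as $row) {
    $doc = new Zend_Search_Lucene_Document();
    // Keyword: stored and indexed as a single term, not tokenized. Holds the primary key.
    // Named phone_id rather than id, because hit objects already expose their own id and score.
    $doc->addField(Zend_Search_Lucene_Field::Keyword('phone_id', (string) $row['phone_id']));
    // Text: stored, indexed, and tokenized.
    $doc->addField(Zend_Search_Lucene_Field::Text('manufacturer', $row['manufacturer']));
    $doc->addField(Zend_Search_Lucene_Field::Text('model', $row['model']));
    // UnStored: searchable, but the text itself is not kept in the index.
    $doc->addField(Zend_Search_Lucene_Field::UnStored('description', $row['description']));
    $index->addDocument($doc);
}
$index->commit();

// Later, for example in a search action: open the index and collect primary keys.
$index = Zend_Search_Lucene::open($indexPath);
$ids = array();
foreach ($index->find('nokia') as $hit) {
    $ids[] = (int) $hit->phone_id;
}
// $ids can now drive a lookup such as SELECT ... WHERE phone_id IN (...).
```

Keeping the index in sync means repeating the add step whenever a row changes. Zend_Search_Lucene has no in-place update, so a changed row is handled by deleting its old document and adding a new one.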