php algorithm zend-search-lucene popularity scoring

What is the best way to properly integrate a "popularity" factor with zend-search lucene?


I've read this and I'm still a bit confused on how to exactly go about it.

I have an unindexed field that is counting the number of votes for a set of playlists that are being searched. The main search works fine, but I also want to include the voting field as part of the algorithm and I'm not sure how to include the non-indexed field as part of it. Can anyone offer any guidance or an example?


Solution

  • You do not necessarily have to adapt the scoring algorithm (which implements tf-idf, btw).

    If you just want to integrate the number of votes into the scoring calculation, you can "boost" the search document before adding it to the index, e.g.:

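    $doc = new Zend_Search_Lucene_Document();
    // Scale factor applied to the raw vote count.
    $boostFactor = 0.1;
    // A higher boost ranks this document higher among matching results.
    $doc->boost = (float)$numberOfVotes * $boostFactor;
    // ..
    $index->addDocument($doc);
    // Write the pending changes to the index.
    $index->commit();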
    

    The boost factor in this example is not really relevant, since you only have one boosting criterion: it scales the boost of every document by the same amount. If you want to boost non-linearly, you could also apply exp or sqrt to $numberOfVotes.
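
    A minimal sketch of a non-linear boost, using sqrt so that very popular playlists do not swamp text relevance. The helper voteBoost() is hypothetical and not part of Zend_Search_Lucene. The added baseline of 1.0 is my own assumption, chosen so that a playlist with no votes gets a neutral boost rather than a boost of 0; with a baseline in place, the factor does start to matter, because it sets how much votes weigh against that baseline.

```php
<?php
// Hypothetical helper (not part of Zend_Search_Lucene): maps a vote count
// to a document boost that grows with the square root of the votes.
function voteBoost(int $votes, float $factor = 0.1): float
{
    // Clamp negative counts to zero, then dampen large counts with sqrt().
    // Assumption: the baseline of 1.0 keeps unvoted playlists at a neutral
    // boost instead of 0.
    return 1.0 + $factor * sqrt(max(0, $votes));
}

// Show how the boost flattens out as the vote count grows.
foreach ([0, 4, 100, 10000] as $votes) {
    printf("%5d votes -> boost %.2f\n", $votes, voteBoost($votes));
}
```

    It would be used in place of the linear expression above, i.e. $doc->boost = voteBoost($numberOfVotes); before calling $index->addDocument($doc).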

    But another question:

    Why not use ElasticSearch (or another performant search engine) in the first place?

    ElasticSearch, for example, is far more powerful and faster than the PHP implementation of Zend Lucene. It is also really easy to hook into the scoring mechanism, e.g. with the custom score query: http://www.elasticsearch.org/guide/reference/query-dsl/custom-score-query.html You can use a PHP client such as Elastica along with it.
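
    As a rough illustration of the custom score query linked above, the request body could be built as a plain PHP array and sent as JSON. This is only a sketch: the field names title and votes and the script weights are assumptions, and custom_score belongs to the older query DSL (later ElasticSearch versions replaced it with function_score), so check it against the version you run.

```php
<?php
// Field names "title" and "votes" are hypothetical; adapt them to your mapping.
$query = [
    'query' => [
        'custom_score' => [
            // The regular full-text query; its relevance is exposed as _score.
            'query' => [
                'match' => ['title' => 'road trip'],
            ],
            // Script that combines text relevance with the stored vote count.
            // The weights 1 and 0.1 are illustrative assumptions.
            'script' => "_score * (1 + 0.1 * doc['votes'].value)",
        ],
    ],
];

// Print the JSON body that would be sent to the index's _search endpoint.
echo json_encode($query, JSON_PRETTY_PRINT), "\n";
```

    Unlike an index-time document boost, the vote count here is read from the field at query time, so the ranking formula can be changed without rebuilding the index.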