Tags: hadoop, elasticsearch, mahout, mahout-recommender

Elasticsearch integration with Mahout


I would like to use Mahout to run predictive analysis on data stored in Elasticsearch, either to find similar documents or to recommend other records based on records that have been tagged with certain criteria.

I plan to create a Mahout cluster. Does Elasticsearch have to sit within a Hadoop cluster to provide this functionality? Would I need to run es-hadoop, or is there another way for Mahout to see the data in Elasticsearch?

Would running es-hadoop have any impact on speed compared to running Elasticsearch alone?


Solution

  • Mahout does not need to run on the same machines as Elasticsearch, though it can. Current Mahout releases still include legacy implementations of row and item similarity based on Hadoop MapReduce, but these will eventually be deprecated in favor of the newer Spark implementations, which have been in the codebase since Mahout 0.10.0 (the current release is 0.11.0).

    There is a full-blown recommender integration of Mahout's Spark code with Elasticsearch in PredictionIO's Universal Recommender. See the Mahout docs at http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html and the PIO template at https://github.com/PredictionIO/template-scala-parallel-universal-recommendation

    As for Elasticsearch's es-hadoop: the Universal Recommender uses its Spark integration, and I'd say it is best to do the same because it's optimized for distributed calculations. However, there is no requirement to use it.
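
To make the "item similarity" mentioned above more concrete, here is a minimal single-machine sketch of cooccurrence scoring with a log-likelihood ratio (LLR), which is the kind of test Mahout's cooccurrence approach is built around. This is not Mahout's code and does not touch Elasticsearch or Spark. The function names (`llr`, `item_similarity`) and the sample `interactions` data are hypothetical, chosen only for illustration. It assumes the interactions have already been exported from Elasticsearch as a mapping from user to the set of documents that user tagged or viewed.

```python
from collections import defaultdict
from itertools import combinations
from math import log


def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return x * log(x) if x > 0 else 0.0


def entropy(*counts):
    # Unnormalized entropy of a list of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    # k11: users with both items, k12: only item A,
    # k21: only item B,          k22: neither item.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def item_similarity(interactions):
    # interactions: dict mapping user -> set of item ids (hypothetical input shape).
    # Returns a dict mapping (item_a, item_b) with item_a < item_b -> LLR score.
    n_users = len(interactions)
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for items in interactions.values():
        for item in items:
            item_count[item] += 1
        for a, b in combinations(sorted(items), 2):
            pair_count[(a, b)] += 1

    scores = {}
    for (a, b), k11 in pair_count.items():
        k12 = item_count[a] - k11
        k21 = item_count[b] - k11
        k22 = n_users - k11 - k12 - k21
        scores[(a, b)] = llr(k11, k12, k21, k22)
    return scores


# Made-up sample data: which user interacted with which document.
interactions = {
    "u1": {"doc_a", "doc_b"},
    "u2": {"doc_a", "doc_b"},
    "u3": {"doc_a", "doc_b", "doc_c"},
    "u4": {"doc_c", "doc_d"},
    "u5": {"doc_d"},
    "u6": {"doc_c"},
}
scores = item_similarity(interactions)

# Print pairs from strongest to weakest association.
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

In this toy data `doc_a` and `doc_b` always appear together, so that pair gets the highest score. Note that LLR measures how surprising the cooccurrence count is, so it can also be large for items that appear together less often than chance; a real pipeline would typically keep only the top-scoring pairs per item. At real data sizes the same counting would be done in a distributed way, which is where the Spark implementations come in.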