
Apache Mahout with Ruby on Rails architecture


I'm trying to build a recommendation engine using Rails with Apache Mahout, but I'm having trouble figuring out where to start.

I have a simple Rails 4.2.1 app with a Postgres database, accessed through Active Record and hosted on Heroku.

Reading up on Mahout, it seems that I can use the JDBCDataModel interface to get data for my recommendation engine, which means that I probably need to

  • change my Ruby on Rails project to a JRuby on Rails project,
  • use the activerecord-jdbc-adapter to communicate with the Mahout library, which I would have to include in my Rails project.

Assuming that I get all these pieces working, I will then

  • write my recommender using Mahout's API in a JRuby script,
  • run this script as a background job using Resque, which will keep calculating recommendations based on user actions.
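
To make the second half of the plan concrete, here is a minimal sketch of the shape such a job could take. It is plain Ruby and does not call Mahout: `SimpleRecommender` is a hypothetical stand-in (user-based collaborative filtering with cosine similarity) for whatever a Mahout-backed recommender would compute, and `RecommendationJob` only follows Resque's job convention of a `@queue` instance variable plus a class-level `perform`. All class and method names here are assumptions for illustration.

```ruby
# Hypothetical stand-in for a Mahout-backed recommender: plain-Ruby
# user-based collaborative filtering with cosine similarity.
class SimpleRecommender
  # prefs: { user_id => { item_id => rating } }
  def initialize(prefs)
    @prefs = prefs
  end

  # Cosine similarity between two users' rating vectors.
  def similarity(a, b)
    shared = @prefs[a].keys & @prefs[b].keys
    return 0.0 if shared.empty?
    dot = shared.map { |i| @prefs[a][i] * @prefs[b][i] }.reduce(0.0, :+)
    norm = ->(u) { Math.sqrt(@prefs[u].values.map { |r| r * r }.reduce(0.0, :+)) }
    dot / (norm.(a) * norm.(b))
  end

  # Items the user has not rated, ranked by similarity-weighted ratings.
  def recommend(user, how_many = 3)
    scores = Hash.new(0.0)
    @prefs.each_key do |other|
      next if other == user
      sim = similarity(user, other)
      next if sim <= 0
      @prefs[other].each do |item, rating|
        scores[item] += sim * rating unless @prefs[user].key?(item)
      end
    end
    scores.sort_by { |_, score| -score }.first(how_many).map(&:first)
  end
end

# Follows Resque's job convention: a @queue ivar and a class-level perform.
# Assumption: prefs is passed in to keep the sketch self-contained. A real
# job would take only the user id and load preferences from the database,
# because Resque serializes job arguments as JSON.
class RecommendationJob
  @queue = :recommendations

  def self.perform(user_id, prefs)
    SimpleRecommender.new(prefs).recommend(user_id)
  end
end
```

In the real app the job would be enqueued with something like `Resque.enqueue(RecommendationJob, user.id)` after a user action, and the body of `perform` would delegate to the Mahout recommender instead of the stand-in.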

Does this architecture seem sound? Or should I just move from Rails to a Java servlet?

I'm extremely comfortable in Rails and have only used Java to build simple Android apps, with Rails/Node as the backend.


Solution

  • Have you looked at PredictionIO? They have several recommenders built on Spark and provide a Ruby SDK. Check their template gallery and their Ruby SDK.

    Mahout has a new recommender component that is meant to work with a search engine. If you want something extremely flexible, you can check it out on the Mahout site. How well it fits will depend on how you integrate a search engine, since the Mahout model needs to be indexed with one and recs are returned with a search query.

    I believe the Mahout version is also a work in progress at PredictionIO.

    BTW these are all built on Spark and don't use the older Hadoop MapReduce versions from Mahout. They all allow you to use your own application-specific user and item IDs, whereas the older Mahout recommenders require you to maintain a mapping into and out of Mahout IDs. These also allow realtime served recs from realtime gathered usage data.
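
To illustrate the ID-mapping burden mentioned above, here is a minimal sketch in plain Ruby of the kind of bookkeeping you would have to maintain yourself with the older recommenders, which key users and items by integer IDs. `IdMapper` and its methods are hypothetical names, not part of Mahout, and a real app would persist the mapping (for example in a database table) rather than keep it in memory.

```ruby
# Hypothetical two-way map between application IDs (strings, UUIDs, ...)
# and the sequential integer IDs an integer-keyed recommender expects.
class IdMapper
  def initialize
    @to_long = {} # application ID => integer ID
    @to_app = {}  # integer ID => application ID
  end

  # Returns the integer ID for app_id, assigning the next free one if new.
  def to_long(app_id)
    @to_long.fetch(app_id) do
      id = @to_long.size
      @to_app[id] = app_id
      @to_long[app_id] = id
    end
  end

  # Returns the application ID for a previously assigned integer ID.
  # Raises KeyError if the integer ID was never handed out.
  def to_app(long_id)
    @to_app.fetch(long_id)
  end
end
```

You would call `to_long` on every user and item ID before handing data to the recommender, and `to_app` on every ID in the results before showing them to the user. The Spark-based recommenders described above remove this step.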