hadoop, mahout, recommendation-engine

Using Mahout: difference between running a .jar and typing console instructions


I am a little confused about Mahout. I have the impression that there are two ways to use it:

  • executing a .jar, using the Taste recommender
  • using the command line, e.g. mahout recommenditembased --input input/recommend_data.csv --output output/recommendation --similarityClassname SIMILARITY_PEARSON_CORRELATION as shown here.

Is that correct, or are these the same thing?

My problem is this: I have a CSV input file with the format user_id, item_id, rating. It has 100,000 lines, and I need to compute recommendations daily for all my users. I've read that this should be OK without Hadoop, but it isn't: the .jar I created works for small batches but not for the entire input file.

The command-line method runs in 5 minutes, which is OK, but it's not as flexible as the .jar project (above all for interfacing with the MySQL database).

Is it possible to use a .jar and still benefit from Hadoop? Since I am not distributing any computation (Hadoop runs on one server), is it normal to see such a difference between the .jar-without-Hadoop method and the command-line-with-Hadoop method?

Many thanks for your help!


Solution

  • 100,000 lines is not a lot of data. I believe you don't need the distributed version of the recommendation algorithm, even if it's running in pseudo-distributed mode (only one machine).

    You could easily use the API to build your own non-distributed recommender. Here's an example from the Mahout in Action book (which I recommend reading) [link]. In that example they use a recommender based on similar users, and from what I see in your question you are using the one based on similar items.

    To make one using item similarity, you would need to use an ItemSimilarity instead of the UserSimilarity. Likewise, instead of a GenericUserBasedRecommender you would use a GenericItemBasedRecommender. Then, of course, you would iterate through all the users and ask for recommendations for each one of them.

    Hope this helps.
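
To make the shape of that loop concrete, here is a minimal, library-free Java sketch of non-distributed item-based recommendation over user_id, item_id, rating lines. It deliberately does not use Mahout, and it is not Mahout's implementation: the class ItemBasedSketch, its methods, and the toy data are all hypothetical, and the scoring rule (a similarity-weighted average that only uses positively correlated items) is one common simplification chosen for illustration. In Mahout, the similarity and recommender roles would be played by the ItemSimilarity and GenericItemBasedRecommender classes mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of Mahout and not its algorithm.
class ItemBasedSketch {

    // Parses "user_id,item_id,rating" lines into user -> (item -> rating).
    static Map<Long, Map<Long, Double>> load(List<String> csvLines) {
        Map<Long, Map<Long, Double>> ratings = new HashMap<>();
        for (String line : csvLines) {
            String[] f = line.split(",");
            long user = Long.parseLong(f[0].trim());
            long item = Long.parseLong(f[1].trim());
            double rating = Double.parseDouble(f[2].trim());
            ratings.computeIfAbsent(user, u -> new HashMap<>()).put(item, rating);
        }
        return ratings;
    }

    // Pearson correlation of two items over the users who rated both.
    // Returns NaN when undefined (fewer than 2 common users, or zero variance).
    static double pearson(Map<Long, Map<Long, Double>> ratings, long a, long b) {
        int n = 0;
        double sumA = 0, sumB = 0, sumAA = 0, sumBB = 0, sumAB = 0;
        for (Map<Long, Double> prefs : ratings.values()) {
            Double ra = prefs.get(a);
            Double rb = prefs.get(b);
            if (ra == null || rb == null) continue;
            n++;
            sumA += ra;
            sumB += rb;
            sumAA += ra * ra;
            sumBB += rb * rb;
            sumAB += ra * rb;
        }
        if (n < 2) return Double.NaN;
        double cov = sumAB - sumA * sumB / n;
        double varA = sumAA - sumA * sumA / n;
        double varB = sumBB - sumB * sumB / n;
        double denom = Math.sqrt(varA * varB);
        return denom > 0 ? cov / denom : Double.NaN;
    }

    // Ranks items the user has not rated by a similarity-weighted average
    // of the user's own ratings, and returns the top howMany item IDs.
    static List<Long> recommend(Map<Long, Map<Long, Double>> ratings, long user, int howMany) {
        Map<Long, Double> seen = ratings.getOrDefault(user, Collections.emptyMap());
        Set<Long> candidates = new HashSet<>();
        for (Map<Long, Double> prefs : ratings.values()) candidates.addAll(prefs.keySet());
        candidates.removeAll(seen.keySet());

        Map<Long, Double> scores = new HashMap<>();
        for (long candidate : candidates) {
            double num = 0, den = 0;
            for (Map.Entry<Long, Double> e : seen.entrySet()) {
                double sim = pearson(ratings, candidate, e.getKey());
                if (sim > 0) { // also skips NaN, since NaN > 0 is false
                    num += sim * e.getValue();
                    den += sim;
                }
            }
            if (den > 0) scores.put(candidate, num / den);
        }
        List<Long> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return ranked.subList(0, Math.min(howMany, ranked.size()));
    }

    public static void main(String[] args) {
        // Toy data standing in for the real CSV file.
        Map<Long, Map<Long, Double>> ratings = load(Arrays.asList(
                "1,10,5", "1,20,1", "1,30,5",
                "2,10,4", "2,20,2", "2,30,4",
                "3,10,1", "3,20,5", "3,30,2",
                "4,10,5", "4,20,1"));
        // The daily batch: iterate over every user and ask for recommendations.
        for (long user : ratings.keySet()) {
            System.out.println(user + " -> " + recommend(ratings, user, 3));
        }
    }
}
```

With this toy data, user 4 has not rated item 30 and is recommended it, because item 30 correlates positively with item 10, which user 4 rated highly. The sketch recomputes similarities on every call, so it is meant to show the logic rather than to be efficient. A real job would cache item similarities and write the results to the database instead of printing them.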