Tags: jpa, hadoop, ejb, mahout, recommendation-engine

Apache Mahout, to use or not to use


I am implementing a simple recommendation system for a collection of user-created components.

I was planning on doing this with JPA and a few dedicated EJBs. My entities would have a couple of extra lists containing the most up-to-date recommendations, and an EJB would crawl the data set and update these lists periodically. The model is based on the relationships between components and does not depend on past user behavior. I expect the data set to remain relatively small, probably no more than half a million items.

I have a pretty good idea of how to do this with JPA and EJB, and I think for my particular use case, this would be very effective.
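
A minimal sketch of that periodic update, with the JPA and EJB plumbing left out. Everything named here is an assumption for illustration: `RelatedComponentsJob`, the `relations` map (component ID to the IDs of the things it is related to), and the choice to score two components by the Jaccard overlap of their relationship sets. The actual model might weigh relationships quite differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical batch job: recomputes the top-N related components for every component. */
final class RelatedComponentsJob {

    /** Jaccard overlap of two relationship sets: |a ∩ b| / |a ∪ b|, or 0 if both are empty. */
    static double overlap(Set<Long> a, Set<Long> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<Long> common = new HashSet<>(a);
        common.retainAll(b);
        int union = a.size() + b.size() - common.size();
        return (double) common.size() / union;
    }

    /**
     * Returns, for each component ID in relations, up to topN other component
     * IDs ordered by descending overlap. Pairs with zero overlap are skipped.
     */
    static Map<Long, List<Long>> recompute(Map<Long, Set<Long>> relations, int topN) {
        Map<Long, List<Long>> result = new HashMap<>();
        for (Map.Entry<Long, Set<Long>> current : relations.entrySet()) {
            Map<Long, Double> scores = new HashMap<>();
            for (Map.Entry<Long, Set<Long>> other : relations.entrySet()) {
                if (other.getKey().equals(current.getKey())) {
                    continue; // never recommend a component to itself
                }
                double score = overlap(current.getValue(), other.getValue());
                if (score > 0.0) {
                    scores.put(other.getKey(), score);
                }
            }
            List<Long> ranked = new ArrayList<>(scores.keySet());
            // Highest score first; ties broken by ID so the result is deterministic.
            ranked.sort((x, y) -> {
                int byScore = Double.compare(scores.get(y), scores.get(x));
                return byScore != 0 ? byScore : Long.compare(x, y);
            });
            int limit = Math.min(topN, ranked.size());
            result.put(current.getKey(), new ArrayList<>(ranked.subList(0, limit)));
        }
        return result;
    }
}
```

The periodic EJB would call something like `recompute(relations, 10)` and copy each list onto the matching entity. Note that this compares every pair of components, which is quadratic; at half a million items it would be worth restricting candidates to components that share at least one relationship.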

Should I spend the time to learn and implement Mahout? I do have a bit of experience with Hadoop, although I don't think my data set will be nearly large enough to justify bringing in the elephant.

Also, can anyone point me to a good primer on implementing recommendation systems with Mahout?

Thanks a lot.


Solution

  • If you are implementing a recommender engine, be aware that this part of Mahout comes in two quite separate implementations: one based on Hadoop and one that is not. That's good, because Hadoop is not the sort of thing you would hook up directly to anything EJB-based, and you don't have huge scale problems. So you don't need to worry about Hadoop.

    You want to look at the code in org.apache.mahout.cf.taste.impl, as opposed to the .hadoop package; it's all pure Java, so you can embed it in an EJB. I think you want to look at the Recommender API, then wrap that in your session bean and expose it however you like.

    (Do you really want to use EJBs these days? Separate question...)

    In fact, the previous release, 0.4, still had an EJB binding example implemented as a stateless session bean. You could dig that wrapper out and reuse it.

    The best web resource for this part of the code is: https://cwiki.apache.org/MAHOUT/recommender-documentation.html

    Our book, Mahout in Action, is obviously not free but is certainly the best and only reference for the project. I wrote both the code in question and the part of the book that covers it, so it's pretty direct from the source.
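
The wrapping suggested above, putting the recommender behind a session bean, might look roughly like the sketch below. It is written against the JDK alone, so `SimilarItemsSource` is a hypothetical stand-in for the recommender rather than a Mahout type, and `RecommendationService` is an illustrative name for the bean, not the wrapper that shipped with 0.4.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical stand-in for the one recommender call the bean needs.
 * With Mahout on the classpath, an implementation could delegate to
 * ItemBasedRecommender.mostSimilarItems(itemID, howMany), map each
 * RecommendedItem to its getItemID(), and handle the checked TasteException.
 */
interface SimilarItemsSource {
    List<Long> mostSimilarItemIDs(long itemID, int howMany);
}

/**
 * Plays the role of the session bean. In a container this class would be
 * annotated @Stateless and obtain its source differently; both are left
 * out so the sketch compiles without any EJB or Mahout dependency.
 */
final class RecommendationService {
    private final SimilarItemsSource source;

    RecommendationService(SimilarItemsSource source) {
        this.source = source;
    }

    /** Returns up to howMany related component IDs as a read-only list. */
    List<Long> relatedComponents(long componentID, int howMany) {
        if (howMany < 1) {
            throw new IllegalArgumentException("howMany must be at least 1");
        }
        List<Long> ids = source.mostSimilarItemIDs(componentID, howMany);
        // Copy defensively so callers never hold a reference into the recommender.
        return Collections.unmodifiableList(new ArrayList<>(ids));
    }
}
```

Because callers only ever see plain IDs, the recommender behind the interface can be swapped (Mahout, or a hand-rolled JPA-based one) without touching the rest of the application.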