Writing a basic recommendation engine


I'm looking to write a basic recommendation engine that will take and store a list of numeric IDs (which relate to books), compare those to other users with a high volume of identical IDs and recommend additional books based on those finds.

After a bit of Googling, I've found this article, which discusses an implementation of a Slope One algorithm, but it seems to rely on users rating the items being compared. Ideally, I'd like to achieve this without the need for users to provide ratings. I'm assuming that if a user has a book in their collection, they are fond of it.

While it strikes me that I could default a rating of 10 for each book, I'm wondering if there's a more efficient algorithm I could be using. Ideally I'd like to calculate these recommendations on the fly (avoiding batch calculation). Any suggestions would be appreciated.


Solution

  • A basic algorithm for your task is a collaborative memory-based recommender system. It's quite easy to implement, especially when your items (in your case books) just have IDs and no other features.

    As you already noted, though, you do need some kind of rating from the users for the items. Don't think of it as a 1-to-5-star rating; think of it as a binary choice, such as 0 (book not read) and 1 (book read), or interested versus not interested.

    Then use an appropriate distance measure to calculate the difference between yourself (or whoever the active user is) and every other user, based on your sets of items. Select the n users most similar to yourself and pick out those of their items that you haven't rated (or considered, i.e. choice 0).

    I think a good distance measure in this case would be the 1-norm distance, also called the Manhattan distance. With binary ratings, it simply counts the books that exactly one of the two users has. This is a point where you have to experiment with your dataset to get the best results.

    A nice introduction to this topic is the paper by Breese et al., Empirical Analysis of Predictive Algorithms for Collaborative Filtering. Available here (PDF). For a research paper, it's an easy read.
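
The steps described above can be sketched in Python roughly as follows. This is only a minimal illustration under a few assumptions, not a definitive implementation: each user's collection is held in memory as a set of book IDs (which plays the role of the 0/1 rating), and the names `manhattan_distance`, `recommend`, `n_neighbours`, `top_k` and the sample data are all made up for this example. Ranking candidate books by how many neighbours own them is one possible choice; the answer itself only says to pick out the neighbours' items.

```python
from collections import Counter


def manhattan_distance(a, b):
    """1-norm distance between two binary ownership vectors stored as sets.

    For 0/1 vectors this is the number of books that exactly one of the
    two users owns, i.e. the size of the symmetric difference.
    """
    return len(a ^ b)


def recommend(active_user, collections, n_neighbours=2, top_k=5):
    """Suggest books owned by the users most similar to active_user."""
    mine = collections[active_user]
    # Rank every other user by distance to the active user (closest first).
    others = [(manhattan_distance(mine, books), user)
              for user, books in collections.items() if user != active_user]
    neighbours = [user for _, user in sorted(others)[:n_neighbours]]
    # Count how many neighbours own each book the active user lacks.
    votes = Counter(book for user in neighbours
                    for book in collections[user] - mine)
    # Most frequently owned first; ties broken by book ID for determinism.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [book for book, _ in ranked[:top_k]]


# Hypothetical sample data: user name -> set of numeric book IDs.
collections = {
    "alice": {1, 2, 3},
    "bob": {1, 2, 3, 4},
    "carol": {1, 2, 4, 5},
    "dave": {7, 8, 9},
}
print(recommend("alice", collections))  # prints [4, 5]
```

Each call scans all users once, so recommendations are computed on the fly without a batch step. Whether that stays fast enough depends on how many users you have, and `n_neighbours` is something to tune against your own data.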