database, recommendation-engine, collective-intelligence

How to create my own recommendation engine?


I have become interested in recommendation engines lately and want to improve my skills in this area. I am currently reading "Programming Collective Intelligence" from O'Reilly, which I think is the best book on the subject. However, I have no idea how to implement an engine; by "no idea" I mean that I don't know where to start. I have a project like Last.fm in mind.

  1. Where should I start building a recommendation engine: on the database side or on the backend side?
  2. What level of database knowledge is needed?
  3. Are there any open source engines or other resources that could help?
  4. What should my first steps be?

Solution

  • I built one for a video portal myself. The main idea was to collect data about everything:

    • Who uploaded a video?
    • Who commented on a video?
    • Which tags were created?
    • Who visited the video? (also tracking anonymous visitors)
    • Who favorited a video?
    • Who rated a video?
    • Which channels was the video assigned to?
    • Text streams of title, description, tags, channels and comments are collected by a fulltext indexer which puts weight on each of the data sources.

    Next I created functions that each return a list of (id, weight) tuples for one of the points above. Some consider only a limited number of videos (e.g. the last 50); some adjust the weight by, e.g., rating or tag count (the more often a tag is used, the less expressive it is). These functions return the following lists:

    • Similar videos by fulltext search
    • Videos uploaded by the same user
    • Other videos the users from these comments also commented on
    • Other videos the users from these favorites also favorited
    • Other videos the raters from these ratings also rated (weighted)
    • Other videos in the same channels
    • Other videos with the same tags (weighted by "expressiveness" of tags)
    • Other videos played by people who played this video (XY latest plays)
    • Similar videos by comments fulltext
    • Similar videos by title fulltext
    • Similar videos by description fulltext
    • Similar videos by tags fulltext
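
As an illustration of one such list function, here is a minimal sketch of "other videos the users from these favorites also favorited". It is not the original implementation: the in-memory `FAVORITES` hash, the `FAVORITE_WEIGHT` constant, and the method name `related_by_favorites` are all assumptions made for the example, standing in for whatever the real database queries look like.

```ruby
# Hypothetical in-memory data: user id => ids of videos that user favorited.
FAVORITES = {
  1 => [10, 11, 12],
  2 => [10, 12],
  3 => [11, 13]
}.freeze

FAVORITE_WEIGHT = 1.0 # assumed base weight for this signal

# Returns [video_id, weight] pairs for videos favorited by users who also
# favorited `video_id`. One pair is emitted per co-occurrence, so a video
# shared by several users appears several times.
def related_by_favorites(video_id, favorites = FAVORITES)
  favorites.each_value.flat_map do |video_ids|
    next [] unless video_ids.include?(video_id)

    (video_ids - [video_id]).map { |other| [other, FAVORITE_WEIGHT] }
  end
end

pairs = related_by_favorites(10)
```

Emitting one pair per co-occurrence means that a video favorited by many of the same users gains weight once the pairs are summed per video id.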

    All of these are combined into a single list by summing the weights per video id and then sorting by weight. This works pretty well for around 1000 videos so far, but you need background processing or aggressive caching to make it fast.
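
The combining step can be sketched roughly as follows. The method name `combine_recommendations` and the sample lists are hypothetical; the only thing taken from the description is the idea of summing weights per video id and sorting by the total.

```ruby
# Sums the weights of [video_id, weight] pairs from several lists and
# returns [video_id, total_weight] pairs sorted by total weight, highest
# first. Ties are broken by video id to keep the order deterministic.
def combine_recommendations(*lists)
  totals = Hash.new(0.0)
  lists.each do |list|
    list.each { |video_id, weight| totals[video_id] += weight }
  end
  totals.sort_by { |video_id, weight| [-weight, video_id] }
end

# Hypothetical outputs of two of the list functions.
by_tags      = [[11, 2.0], [12, 0.5]]
by_favorites = [[12, 1.0], [12, 1.0], [13, 0.25]]

ranked = combine_recommendations(by_tags, by_favorites)
```

In this sample, video 12 ends up ahead of video 11 because it collects weight from both lists. In a real application the result would presumably be cached or computed in a background job, as noted above.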

    I'm hoping to reduce this to a generic recommendation engine or similarity calculator soon and release it as a Rails/ActiveRecord plugin. For now it is still tightly integrated into my project.

    To give a small hint, in Ruby code it looks like this:

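    # Returns [video_id, weight] pairs for videos sharing a tag with this one.
    # Tags used on many videos are less expressive, so their weight shrinks
    # with log2 of the number of videos carrying the tag.
    def related_by_tags
      tag_names.find(:all, :include => :videos).inject([]) { |result, t|
        result + t.video_ids.map { |v|
          [v, TAG_WEIGHT / (0.1 + Math.log(t.video_ids.length) / Math.log(2))]
        }
      }
    end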
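
To see how the weight formula in `related_by_tags` behaves, the following standalone sketch evaluates it for a few tag frequencies. The value of `TAG_WEIGHT` is not given above, so `1.0` is an assumption, and `tag_weight` is a hypothetical helper name.

```ruby
TAG_WEIGHT = 1.0 # assumed value; the original constant is not shown

# Weight contributed by a tag that is attached to `video_count` videos,
# using the same expression as related_by_tags.
def tag_weight(video_count)
  TAG_WEIGHT / (0.1 + Math.log(video_count) / Math.log(2))
end

[1, 2, 4, 64].each do |n|
  puts format("tag used on %3d videos -> weight %.3f", n, tag_weight(n))
end
```

With these inputs the weight falls from about 10 for a tag used on a single video to roughly 0.16 for a tag used on 64 videos, which is the "more often tagged = less expressive" effect. The `0.1` term keeps the denominator from being zero when a tag appears on only one video.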
    

    I would be interested in how other people solve such problems.