algorithm recommendation-engine collaborative-filtering

Collaborative or structured recommendation?

I am building a Rails app that recommends tutors to students and vise versa. I need to match them based on multiple dimensions, such as their majors (Math, Biology etc.), experience (junior etc.), class (Math 201 etc.), preference (self-described keywords) and ratings.

I checked out some Rails collaborative recommendation engines (recommendable, recommendify) and Mahout. It seems that collaborative recommendation is not the best choice in my case, since I have much more structured data, which allows a more structured query. For example, I can have a recommendation logic for a student like:

if student looks for a Math tutor in Math 201:
  if there's a tutor in Math major offering tutoring in Math 201 then return
  else if there's a tutor in Math major then sort by experience then return
  else if there's a tutor in quantitative major then sort by experience then return
  ...

My questions are:

What are the benefits of a collaborative recommendation algorithm given that my recommendation system will be preference-based?
If it does provide significant benefits, how I can combine it with a preference-based recommendation as mentioned above?
Since my approach will involve querying multiple tables, it might not be efficient. What should I do about this?

Thanks a lot.

Solution

It sounds like your measurement of compatibility could be profitably reformulated as a metric. What you should do is try to interpret your `columns' as being different components of the dimension of your data. The idea is that you ultimately should produce a binary function which returns a measurement of compatibility between students and tutors (and also students/students and tutors/tutors). The motivation for extending this metric to all types of data is that you can then use this idea to reformulate your matching criteria as a nearest-neighbor search:

http://en.wikipedia.org/wiki/Nearest_neighbor_search

There are plenty of data structures and solutions to this problem as it has been very well studied. For example, you could try out the following library which is often used with point cloud data:

http://www.cs.umd.edu/~mount/ANN/

To optimize things a bit, you could also try prefiltering your data by running principal component analysis on your data set. This would let you reduce the dimension of the space in which you do nearest neighbor searches, and usually has the added benefit of reducing some amount of noise.

http://en.wikipedia.org/wiki/Principal_component_analysis

Good luck!