machine-learning azure-machine-learning-service

Recommendations without ratings (Azure ML)

I'm trying to build an experiment that creates recommendations (using the Movie Ratings sample database) without using the rating values. I simply assume that if a user has rated certain movies, they would be interested in other movies rated by users who have also rated those movies.

I can consider, for instance, that ratings are 1 (exists in the database) or 0 (does not exist), but in that case, how do I transform the initial data to reflect this?

I couldn't find any kind of examples or tutorials about this kind of scenario, and I don't really know how to proceed. Should I transform the data before injecting it into an algorithm? And/or is there any kind of specific algorithm that I should use?

Solution

If you're hoping to use the Matchbox Recommender in AML, you're correct that you need to identify some user-movie pairs that are not present in the raw dataset, and add these in with a rating of zero. (I'll assume that you have already set all of the real user-movie pairs to have a rating of one, as you described above.)
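
The step assumed in parentheses above, setting every observed user-movie pair to a rating of one, could be sketched as follows. This is plain Python rather than R (the Execute Python Script module is an alternative to the R one), and the row layout `(user, movie, rating)` and the function name `binarize_ratings` are assumptions made for illustration, not names taken from your dataset.

```python
def binarize_ratings(rows):
    """Collapse (user, movie, rating) rows to unique (user, movie, 1) rows."""
    seen = set()
    positives = []
    for user, movie, _rating in rows:
        # Keep each user-movie pair once, in first-seen order,
        # and discard the original rating value.
        if (user, movie) not in seen:
            seen.add((user, movie))
            positives.append((user, movie, 1))
    return positives


# Hypothetical toy data: the last row repeats the first user-movie pair.
ratings = [("u1", "m1", 4), ("u1", "m2", 9), ("u2", "m1", 7), ("u1", "m1", 4)]
positives = binarize_ratings(ratings)
```

Here `positives` holds three rows, each with a rating of 1; the repeated pair is kept only once.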

I would recommend generating some random candidate pairs in an Execute R (or Python) Script module and keeping only those that are absent from the training data. I don't know the names of your dataset's features, so the column names movie, user, and rating below are assumptions; here is a sketch in R:

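library(dplyr)

# Input: observed user-movie pairs. Assumes columns 'movie', 'user', and 'rating' (already set to 1).
df <- maml.mapInputPort(1)
all_movies <- unique(df[['movie']])
all_users <- unique(df[['user']])
n <- 30  # number of random candidate pairs to draw; fewer may survive the filtering below

# Draw random user-movie candidates and label them as negatives (rating 0)
negative_observations <- data.frame(movie = sample(all_movies, n, replace = TRUE),
                                    user = sample(all_users, n, replace = TRUE),
                                    rating = rep(0, n))

# Drop duplicate candidates and any pair that already appears in the observed data
acceptable_negative_observations <- anti_join(unique(negative_observations), df,
                                              by = c('movie', 'user'))

# Append the negatives to the observed pairs. rbind() expects df to have exactly these three columns.
df <- rbind(df, acceptable_negative_observations)
maml.mapOutputPort("df")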
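
Because candidates that collide with observed pairs are discarded, the approach above can return fewer than n negatives. If you want a specific number of zeros, one option is to keep sampling until the target is reached. The following is a minimal Python sketch of that variant using only the standard library; the function name `sample_negatives` and the parameters `seed` and `max_tries` are hypothetical, and the input is assumed to be a list of `(user, movie, rating)` tuples with sortable ids.

```python
import random


def sample_negatives(positives, n, seed=0, max_tries=10_000):
    """Draw up to n distinct (user, movie, 0) rows that are absent from positives."""
    observed = {(user, movie) for user, movie, _ in positives}
    users = sorted({user for user, _ in observed})
    movies = sorted({movie for _, movie in observed})

    # There cannot be more negatives than unobserved user-movie pairs.
    n = min(n, len(users) * len(movies) - len(observed))

    rng = random.Random(seed)  # fixed seed so repeated runs give the same sample
    negatives = set()
    tries = 0
    while len(negatives) < n and tries < max_tries:
        pair = (rng.choice(users), rng.choice(movies))
        if pair not in observed:  # reject pairs the user actually rated
            negatives.add(pair)
        tries += 1
    return [(user, movie, 0) for user, movie in sorted(negatives)]


# Hypothetical toy data: 3 users x 3 movies, 4 observed pairs, 5 unobserved.
positives = [("u1", "m1", 1), ("u1", "m2", 1), ("u2", "m1", 1), ("u3", "m3", 1)]
negatives = sample_negatives(positives, 3)
training_rows = positives + negatives
```

The `max_tries` cap keeps the loop from running for a long time when almost every possible pair is already observed; in that case the function simply returns the negatives it found.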

Alternatively, you could try a method such as association rule learning, which does not require you to add fake zero ratings. Martin Machac has posted a nice example of how to do this in R/AML in the Cortana Intelligence Gallery.
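
To give a feel for why association rules need no zero ratings, here is a deliberately simplified Python sketch that only mines pairwise rules of the form "rated A, so probably interested in B" from co-occurrence counts. It is not the gallery example mentioned above and not a full Apriori implementation; the function name `pairwise_rules` and the `min_support` threshold (an absolute user count) are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_rules(pairs, min_support=2):
    """Return {antecedent: [(consequent, confidence), ...]} from (user, movie) pairs."""
    movies_by_user = defaultdict(set)
    for user, movie in pairs:
        movies_by_user[user].add(movie)

    movie_count = defaultdict(int)  # number of users who rated each movie
    pair_count = defaultdict(int)   # number of users who rated both movies
    for movies in movies_by_user.values():
        for movie in movies:
            movie_count[movie] += 1
        for a, b in combinations(sorted(movies), 2):
            pair_count[(a, b)] += 1

    rules = defaultdict(list)
    for (a, b), both in pair_count.items():
        if both >= min_support:
            # confidence(A -> B) = users who rated both / users who rated A
            rules[a].append((b, both / movie_count[a]))
            rules[b].append((a, both / movie_count[b]))
    for consequents in rules.values():
        # Strongest rules first; ties broken by movie id for stable output.
        consequents.sort(key=lambda item: (-item[1], item[0]))
    return dict(rules)


# Hypothetical toy data: only observed pairs are needed, no ratings at all.
pairs = [("u1", "m1"), ("u1", "m2"), ("u2", "m1"),
         ("u2", "m2"), ("u3", "m1"), ("u3", "m3")]
rules = pairwise_rules(pairs)
```

With this toy data, every user who rated m2 also rated m1, so the rule m2 -> m1 has confidence 1.0, while m1 -> m2 has confidence 2/3. The pair (m1, m3) is seen for only one user and falls below `min_support`, so it produces no rule.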