Tags: r, statistics, collaborative-filtering, cross-product, recommenderlab

Recommenderlab: Predict by UBCF binary rating matrix


In the recommenderlab R package, when predicting with UBCF on a binary rating matrix, why does the code compute crossprod between the knn (k nearest neighbors) similarities and the new input binary ratings for the items? I'm writing a study and I'm wondering why this is a good approach.

The prediction results were very good for market basket recommendation, and I'm confused about why crossprod is useful here.


Solution

  • As described here, in UBCF the missing ratings are predicted as aggregate ratings of the similar (neighboring) users.

    Once the users in the neighborhood are found, their ratings are aggregated to form the predicted rating for the active user u_a (as shown below).

    • The easiest form is to just average the ratings in the neighborhood.
    • A better version is to compute a weighted average of the neighborhood ratings, where the weights are the similarities between each neighboring user and the active user.
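
    The two aggregation rules above can be written out as follows. This is a sketch in assumed notation, not taken from the original post: N_j(a) denotes the neighbors of the active user u_a that have rated item j, r_ij is the rating of neighbor i for item j, and s_ai is the similarity between u_a and neighbor i.

    ```latex
    % Simple average over the neighbors that rated item j
    \hat{r}_{aj} = \frac{1}{|\mathcal{N}_j(a)|} \sum_{i \in \mathcal{N}_j(a)} r_{ij}

    % Similarity-weighted average over the same neighbors
    \hat{r}_{aj} = \frac{\sum_{i \in \mathcal{N}_j(a)} s_{ai} \, r_{ij}}{\sum_{i \in \mathcal{N}_j(a)} s_{ai}}
    ```

    Setting every s_ai to 1 in the second formula gives back the first one.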

    [Image not available: aggregation of the neighborhood ratings]

    Now, crossprod() is used to compute the weighted average (it can compute the simple average too, when all weights are equal). Given matrices x and y, crossprod(x, y) computes the matrix cross-product t(x) %*% y, and tcrossprod(x, y) computes x %*% t(y) (from the documentation).
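
    To make that concrete, here is a minimal check in base R that crossprod(x, y) gives the same result as t(x) %*% y, and tcrossprod(x, y) the same as x %*% t(y). The matrices x and y are made-up values used only for illustration.

    ```r
    # Hypothetical small matrices, used only to illustrate the identities
    x <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)     # 3 x 2
    y <- matrix(c(7, 8, 9, 10, 11, 12), nrow = 3)  # 3 x 2

    cp  <- crossprod(x, y)   # 2 x 2, computed as t(x) %*% y
    tcp <- tcrossprod(x, y)  # 3 x 3, computed as x %*% t(y)

    # Both comparisons should report TRUE
    print(all.equal(cp, t(x) %*% y))
    print(all.equal(tcp, x %*% t(y)))
    ```

    When x is a neighbors-by-items rating matrix and y is a single column of weights, crossprod(x, y) therefore returns, for each item, the sum of weight times rating over the neighbors, which is the numerator of the weighted average.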

    Take the following example from the documentation, as shown in the next figure:

    [Image not available: example ratings of the neighboring users, from the documentation]

    Here, u_1, u_2 and u_4 are the neighboring users of the active user u_a, whose ratings for 4 items are missing. Let's see how crossprod() can be used to compute the missing ratings as the simple and the weighted average of the neighbors' ratings, respectively (using code similar to the original implementation).

    # ratings of the 3 neighbors (rows) for the 8 items (columns); NA = not rated
    r_neighbors <- matrix(c(NA, 4.0, 4.0, 2.0, 1.0, 2.0, NA, NA,
                           3.0, NA, NA, NA, 5.0, 1.0, NA, NA,
                           4.0, NA, NA, 2.0, 1.0, 1.0, 2.0, 4.0), nrow=3, byrow=TRUE)
    
    u_a <- matrix(c(NA,NA,4.0,3.0,NA,1.0,NA,5.0), nrow=1)
    
    # simple average of neighbor ratings, with all weights equal to 1
    s_uk <- matrix(rep(1, 3), ncol=1)
    r_a <- as(crossprod(replace(r_neighbors, is.na(r_neighbors), 0), s_uk), "matrix") /
              as(crossprod(!is.na(r_neighbors), s_uk), "matrix")
    u_a[is.na(u_a)] <- r_a[is.na(u_a)]
    u_a
    #      [,1] [,2] [,3] [,4]     [,5] [,6] [,7] [,8]
    # [1,]  3.5    4    4    3 2.333333    1    2    5
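
    As a sanity check on the equal-weights case, the crossprod() ratio should coincide with plain per-item means that skip missing ratings. The sketch below assumes only base R; the variable names num, den and item_means are introduced here for illustration.

    ```r
    # Same neighbor ratings as above: 3 neighbors (rows) x 8 items (columns)
    r_neighbors <- matrix(c(NA, 4.0, 4.0, 2.0, 1.0, 2.0, NA, NA,
                           3.0, NA, NA, NA, 5.0, 1.0, NA, NA,
                           4.0, NA, NA, 2.0, 1.0, 1.0, 2.0, 4.0), nrow = 3, byrow = TRUE)
    s_uk <- matrix(rep(1, 3), ncol = 1)  # all weights equal to 1

    # Numerator: per-item sum of ratings, with NA treated as 0
    num <- crossprod(replace(r_neighbors, is.na(r_neighbors), 0), s_uk)
    # Denominator: per-item count of neighbors that actually rated the item
    den <- crossprod(!is.na(r_neighbors), s_uk)
    r_a <- drop(num / den)

    # Per-item mean over the available ratings only
    item_means <- colMeans(r_neighbors, na.rm = TRUE)
    print(all.equal(r_a, item_means))
    ```

    The denominator is what keeps a missing rating from dragging the average down: an NA contributes 0 to the numerator and 0 to the count.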
    

    The ratings above match the ones computed in the figure exactly. You can also reproduce the same predictions for the new user u_a with recommenderlab's predict(), as shown below:

    library(recommenderlab)
    u_a <- matrix(c(NA,NA,4.0,3.0,NA,1.0,NA,5.0), nrow=1)
    rec <- Recommender(as(r_neighbors, "realRatingMatrix"), method = "UBCF", 
                       param=list(nn=3, normalize=NULL, weighted=FALSE))
    pred <- as(predict(rec, newdata=as(u_a, "realRatingMatrix"), type="ratings"), "matrix")
    u_a[is.na(u_a)] <- pred[is.na(u_a)]
    u_a
    #      [,1] [,2] [,3] [,4]     [,5] [,6] [,7] [,8]
    # [1,]  3.5    4    4    3 2.333333    1    2    5
    

    If you want to use weights based on user similarity, the same code does the job, this time with similarity weights:

    u_a <- matrix(c(NA,NA,4.0,3.0,NA,1.0,NA,5.0), nrow=1)
    s_uk <- matrix(c(0.3, 1.0, 0.3), ncol=1)
    r_a <- as(crossprod(replace(r_neighbors, is.na(r_neighbors), 0), s_uk), "matrix") /
              as(crossprod(!is.na(r_neighbors), s_uk), "matrix")
    u_a[is.na(u_a)] <- r_a[is.na(u_a)]
    u_a
    #          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
    # [1,] 3.230769    4    4    3  3.5    1    2    5
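
    The same idea carries over to the binary case from the question. The sketch below is an illustration with made-up data, not recommenderlab's own code: b_neighbors is a hypothetical 0/1 matrix (1 = the neighbor bought the item) and s_uk holds hypothetical similarities. It assumes that 0 means "not bought" rather than "missing", so the denominator is simply the sum of all weights.

    ```r
    # Hypothetical 0/1 purchase matrix: 3 neighbors (rows) x 4 items (columns)
    b_neighbors <- matrix(c(1, 0, 1, 0,
                            1, 1, 0, 0,
                            0, 1, 1, 0), nrow = 3, byrow = TRUE)
    # Hypothetical similarities between the active user and each neighbor
    s_uk <- matrix(c(0.5, 0.3, 0.2), ncol = 1)

    # Per item: sum of the similarities of the neighbors that bought it,
    # scaled by the total similarity so the score lies between 0 and 1
    score <- drop(crossprod(b_neighbors, s_uk)) / sum(s_uk)
    print(score)

    # Items ranked from highest to lowest score
    ranking <- order(score, decreasing = TRUE)
    print(ranking)
    ```

    Each score is a similarity-weighted vote: an item bought by many similar neighbors ranks high, which is one plausible reason this works well for market basket data.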