# Recommenderlab: Predict by UBCF binary rating matrix

In the recommenderlab R package, when predicting with `UBCF` on a binary rating matrix, why does the code take the `crossprod` of the kNN (`k` nearest neighbors) similarities and the new input binary ratings for the items? I'm writing a study and I'm wondering why this is a good approach.

The prediction results were very good for market basket recommendation, and I'm confused about why `crossprod` is useful here.

Solution

In `UBCF`, the missing ratings are predicted as aggregated ratings of the similar (neighboring) users.

Once the users in the neighborhood are found, their ratings are aggregated to form the predicted rating for the active user `u_a` (as shown below).

- The easiest form is to just average the ratings in the neighborhood.
- A better version is to compute a weighted average of the neighborhood ratings, where each weight is the similarity of a neighboring user to the active user.
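The two aggregation rules can be written out as follows. The notation is an assumption made for illustration: $\mathcal{N}(a)$ is the neighborhood of the active user $u_a$, $r_{ij}$ is the rating of neighbor $i$ for item $j$, and $s_{ai}$ is the similarity between $u_a$ and neighbor $i$. When some neighbors have not rated item $j$, the sums are taken only over the neighbors that did.

```latex
\hat{r}_{aj} = \frac{1}{|\mathcal{N}(a)|} \sum_{i \in \mathcal{N}(a)} r_{ij}
\qquad \text{(simple average)}

\hat{r}_{aj} = \frac{\sum_{i \in \mathcal{N}(a)} s_{ai}\, r_{ij}}{\sum_{i \in \mathcal{N}(a)} s_{ai}}
\qquad \text{(weighted average)}
```

Setting every $s_{ai} = 1$ in the weighted form gives back the simple average.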

Now, `crossprod()` is used for computing the weighted average (it can compute the simple average too, when all weights are equal). Given matrices `x` and `y`, `crossprod(x, y)` computes the matrix cross-product `t(x) %*% y`, while the companion function `tcrossprod(x, y)` computes `x %*% t(y)` (from the documentation).
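As a minimal sketch of why this is convenient, assume the neighbors' ratings are stored with one row per neighbor and one column per item, and the weights form a one-column matrix. The names `ratings` and `weights` and all values are made up for illustration. A single `crossprod()` call then returns the weighted sum of ratings for every item at once:

```r
# Hypothetical data: 3 neighbors (rows) x 2 items (columns)
ratings <- matrix(c(4, 2,
                    3, 5,
                    1, 1), nrow = 3, byrow = TRUE)
# Hypothetical weights, one per neighbor
weights <- matrix(c(0.3, 1.0, 0.3), ncol = 1)

# One weighted sum per item (2 x 1 matrix)
weighted_sums <- crossprod(ratings, weights)

# Same result via the explicit transpose and matrix product
explicit <- t(ratings) %*% weights
isTRUE(all.equal(weighted_sums, explicit))
```

Dividing `weighted_sums` by the sum of the weights turns the weighted sums into weighted averages.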

Take the following example from the documentation, as shown in the next figure:

Here, u_1, u_2 and u_4 are the neighboring users of the active user u_a, who has missing ratings for 4 items. Let's see how `crossprod()` can be used to compute the missing ratings with simple and weighted averages of the neighbors' ratings, respectively (using code similar to the original implementation).

```
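# ratings of the three neighbors (rows) for the eight items (columns)
r_neighbors <- matrix(c( NA, 4.0, 4.0, 2.0, 1.0, 2.0,  NA,  NA,
                        3.0,  NA,  NA,  NA, 5.0, 1.0,  NA,  NA,
                        4.0,  NA,  NA, 2.0, 1.0, 1.0, 2.0, 4.0), nrow=3, byrow=TRUE)
# known ratings of the active user; NA marks the ratings to predict
u_a <- matrix(c(NA,NA,4.0,3.0,NA,1.0,NA,5.0), nrow=1)
# simple average of neighbor ratings, with all weights equal to 1
s_uk <- matrix(rep(1, 3), ncol=1)
# numerator: weighted sum of ratings per item (NA counted as 0)
# denominator: sum of the weights of the neighbors who rated the item
r_a <- as(crossprod(replace(r_neighbors, is.na(r_neighbors), 0), s_uk), "matrix") /
       as(crossprod(!is.na(r_neighbors), s_uk), "matrix")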
u_a[is.na(u_a)] <- r_a[is.na(u_a)]
u_a
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
# [1,] 3.5 4 4 3 2.333333 1 2 5
```

The above ratings match exactly the ones computed in the figure. You can also reproduce the same prediction results for the new user `u_a` with `recommenderlab`'s `predict()`, as shown below:

```
library(recommenderlab)
u_a <- matrix(c(NA,NA,4.0,3.0,NA,1.0,NA,5.0), nrow=1)
rec <- Recommender(as(r_neighbors, "realRatingMatrix"), method = "UBCF",
                   param=list(nn=3, normalize=NULL, weighted=FALSE))
pred <- as(predict(rec, newdata=as(u_a, "realRatingMatrix"), type="ratings"), "matrix")
u_a[is.na(u_a)] <- pred[is.na(u_a)]
u_a
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
# [1,] 3.5 4 4 3 2.333333 1 2 5
```

If you want to use weights based on user-user similarity, the same code does the job, this time with similarity weights:

```
u_a <- matrix(c(NA,NA,4.0,3.0,NA,1.0,NA,5.0), nrow=1)
s_uk <- matrix(c(0.3, 1.0, 0.3), ncol=1)
r_a <- as(crossprod(replace(r_neighbors, is.na(r_neighbors), 0), s_uk), "matrix") /
       as(crossprod(!is.na(r_neighbors), s_uk), "matrix")
u_a[is.na(u_a)] <- r_a[is.na(u_a)]
u_a
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
# [1,] 3.230769 4 4 3 3.5 1 2 5
```
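The same idea carries over to a binary rating matrix. What follows is only a sketch, not the package's exact code: it assumes the neighbors' baskets are stored as a 0/1 matrix (rows are neighbors, columns are items), and the names `b_neighbors`, `b_a`, `w`, and all values are hypothetical. Because a 0 simply means "not in the basket" rather than "missing", the cross-product of the baskets with the similarity weights directly gives a similarity-weighted vote for each item:

```r
# Hypothetical 0/1 baskets: 3 neighbors (rows) x 4 items (columns)
b_neighbors <- matrix(c(1, 0, 1, 0,
                        1, 1, 0, 0,
                        0, 1, 1, 0), nrow = 3, byrow = TRUE)
# Hypothetical similarities of each neighbor to the active user
w <- matrix(c(0.5, 0.8, 0.2), ncol = 1)

# Similarity-weighted share of neighbors whose basket contains each item
scores <- drop(crossprod(b_neighbors, w)) / sum(w)

# Hypothetical basket of the active user; items already in it are not recommended
b_a <- c(1, 0, 0, 0)
scores[b_a == 1] <- NA

# Remaining items ranked from highest to lowest score (NA entries dropped)
top_items <- order(scores, decreasing = TRUE, na.last = NA)
top_items
```

With all weights equal to 1, each score reduces to the fraction of neighbors that have the item, so one matrix product scores every item in a single step.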
