Tags: r, tidyr, collaborative-filtering

Creating a User-Item Matrix for Collaborative Filtering


I am attempting to run a Collaborative Filtering (CF) algorithm on "User-Item-Rating" data. My data is in long format, i.e. each row holds one User's rating of one specific item. I need to convert this into a "User-Item" matrix before I can apply a CF algorithm to it.

I am using the spread function from the tidyr package for this task. But since I have more than 50k unique items, the resulting data frame would have more than 50k columns. R cannot build it on my local machine and fails with the "cannot allocate vector of size" error.

What's the best way to deal with this? I explored a couple of options but could not get either to work:

  • Is there a way to return the output of the spread call as a sparse matrix?
  • I also checked whether packages that implement CF, such as recommenderlab, have an option to deal with this, but I could not find one.

Any help will be greatly appreciated.

Thanks!


Solution

  • Since your data is (probably) sparse, go with a sparse matrix. Here's an example with 50,000 simulated ratings:

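    library(stringi)
    library(Matrix)
    set.seed(1)
    # Simulate 50,000 ratings in long format: one row per (user, item, rating).
    # stringsAsFactors = TRUE is required on R >= 4.0, where strings stay character
    # by default and as.integer()/levels() on them would give NA/NULL.
    df <- data.frame(item = stri_rand_strings(50000, 4), stringsAsFactors = TRUE)
    df$user <- as.factor(1:nrow(df))
    df$rating <- sample(1:10, nrow(df), replace = TRUE)
    # The factor codes serve as row/column indices; the levels become the dimnames
    m <- sparseMatrix(
      i = as.integer(df$user),
      j = as.integer(df$item),
      x = df$rating,
      dimnames = list(levels(df$user), levels(df$item))
    )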
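
To apply the same idea to ratings you already have in long format, something along these lines should work. This is only a minimal sketch: the data frame `ratings` and its column names `user`, `item` and `rating` are made up for illustration, so substitute your own. The id columns are turned into factors first so that their integer codes can be used as row and column indices.

```r
library(Matrix)

# Hypothetical long-format ratings; the data and column names are assumptions
ratings <- data.frame(
  user   = c("u1", "u1", "u2", "u3", "u3"),
  item   = c("a",  "b",  "b",  "a",  "c"),
  rating = c(5, 3, 4, 2, 1)
)

# Convert the id columns to factors so their integer codes can act as indices
ratings$user <- factor(ratings$user)
ratings$item <- factor(ratings$item)

# Build the user x item matrix; only the observed ratings are stored
ui <- sparseMatrix(
  i = as.integer(ratings$user),
  j = as.integer(ratings$item),
  x = ratings$rating,
  dims = c(nlevels(ratings$user), nlevels(ratings$item)),
  dimnames = list(levels(ratings$user), levels(ratings$item))
)

dim(ui)          # users x items
ui["u3", "c"]    # rating that user u3 gave item c
length(ui@x)     # number of stored ratings
```

Two things to keep in mind. First, sparseMatrix adds up the values of repeated (i, j) pairs, so if a user can rate the same item more than once, deduplicate or aggregate the rows beforehand. Second, unrated cells read back as 0, which is indistinguishable from a genuine rating of 0. As far as I know, recommenderlab documents a coercion from this matrix class, so `as(ui, "realRatingMatrix")` should get you an object its recommenders accept, but check the package documentation for your version.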