r, dataframe, recommendation-engine

Given a data frame of users and the movies they watched, how can I group all the movies each user watched?


I have a data frame with userid and movieid columns, where each row represents a user and a movie they watched. Something like:

userid    movieid
882359    81
882359    926
882359    1349
881235    27

And what I want is

userid     movieid
882359     c(81,926,1349)
881235     c(27)

How can I accomplish this? The data set is quite large (8 million rows), and in the end I would like to convert it to a binaryRatingMatrix. Any help is appreciated.


Solution

  • You can use data.table:

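    library(data.table)
    # Convert df to a data.table by reference (no copy is made)
    setDT(df)
    # Collapse each user's movie ids into one comma-separated string
    df[, .(films = paste(movieid, collapse = ",")), by = "userid"]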
    
       userid       films
    1: 882359 81,926,1349
    2: 881235          27
    

    If you prefer to store each user's movies in a list column rather than as a single character string:

    df[, .(films = list(movieid)), by = "userid"]
       userid          films
    1: 882359   81, 926,1349
    2: 881235             27
    

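    (The printed output looks almost the same, but the column types differ: the first films column is a character vector, while the second is a list holding one vector of movie ids per user.)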
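
To make the difference between the two results concrete, here is a minimal sketch that rebuilds the toy data from the question and inspects both columns. The object names as_string, as_list, first_user_films and n_films are made up for illustration, and the ids are assumed to be numeric.

```r
library(data.table)

# Toy data copied from the question (ids assumed to be numeric)
df <- data.table(
  userid  = c(882359, 882359, 882359, 881235),
  movieid = c(81, 926, 1349, 27)
)

# One comma-separated string per user
as_string <- df[, .(films = paste(movieid, collapse = ",")), by = "userid"]

# One vector of movie ids per user, stored in a list column
as_list <- df[, .(films = list(movieid)), by = "userid"]

class(as_string$films)  # character
class(as_list$films)    # list

# The list column keeps the ids as numbers, so they can be used directly
first_user_films <- as_list$films[[1]]  # movie ids of the first user
n_films <- lengths(as_list$films)       # number of movies per user
```

The list column is usually the more convenient starting point for further processing, since the string version would have to be split and converted back to numbers first.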