I have a data frame with userid and movieid columns, where each row represents a user and a movie they watched. Something like:
userid movieid
882359 81
882359 926
882359 1349
881235 27
What I want is:
userid movieid
882359 c(81,926,1349)
881235 c(27)
How can I accomplish this? The data set is quite large (8 million rows), and in the end I would like to convert it to a binaryRatingMatrix. Any help is appreciated.
You can use data.table:
library(data.table)
setDT(df)
df[, .(films = paste(movieid, collapse = ",")), by = "userid"]
userid films
1: 882359 81,926,1349
2: 881235 27
If you prefer to store the movie ids in a list column rather than as a single character string per user:
df[, .(films = list(movieid)), by = "userid"]
userid films
1: 882359 81, 926,1349
2: 881235 27
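The two outputs look almost the same, but the column types differ: the first films column is a character column (one comma-separated string per user), while the second is a list column that holds a vector of movie ids per user.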
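To make the type difference concrete, here is a minimal, self-contained sketch. It rebuilds the four example rows from the question (the names as_string and as_list are made up for illustration) and inspects the resulting columns:

```r
library(data.table)

# Toy data matching the example in the question
df <- data.table(
  userid  = c(882359, 882359, 882359, 881235),
  movieid = c(81, 926, 1349, 27)
)

# One comma-separated string per user
as_string <- df[, .(films = paste(movieid, collapse = ",")), by = "userid"]

# One vector of movie ids per user, stored in a list column
as_list <- df[, .(films = list(movieid)), by = "userid"]

class(as_string$films)  # "character"
class(as_list$films)    # "list"

# The list column keeps the ids as numbers, so they can be used directly
as_list$films[[1]]      # movie ids of the first user
lengths(as_list$films)  # number of movies per user
```

The list version is probably the more convenient one if the ids are going to be processed further, since nothing has to be parsed back out of a string.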