I have a dataset containing a million observations, and from it I'm taking the first 10,000 observations. Here is a link to the dataset file: dataset file link
itemRatingData = itemRatingData[1:10000,]
#V2 is user ID, V1 is item ID, V3 is item rating from user
library(plyr)
countUser = count(itemRatingData, vars = "V2")
#count the total observations per user in the dataset
list_of_total_Users = as.list(countUser$V2)
#take the distinct user IDs out as a list
Next, I want to extract the observations of users who have rated at least 10 items, and I did that successfully. Some of these users have rated 50, 100, or 1000+ items, but I only need 10 observations from each user who has rated 10 or more items. Here is what I tried to get the desired result:
for (i in 1:length(list_of_total_Users)) {
  occurencePerID = subset(itemRatingData,
                          itemRatingData$V2 %in% list_of_total_Users[[i]])
  countOccurencePerID = count(occurencePerID, vars = "V2")
  if (countOccurencePerID$freq >= 10) {
    newItemRatingData = occurencePerID[1:10, ]
  }
}
In this code I subset all observations for each user ID and then count them. If the user ID's frequency is >= 10, I extract the first 10 observations. The problem I'm facing is that every time the loop iterates, it overwrites newItemRatingData.
I have resolved my issue, and the solution is:
#start with an empty data frame so the loop has something to append to
newItemRatingData = data.frame("V2" = numeric(0), "V1" = numeric(0), "V3" = integer(0))
for (i in 1:length(list_of_total_Users)) {
  occurencePerID = subset(itemRatingData, itemRatingData$V2 %in% list_of_total_Users[[i]])
  countOccurencePerID = count(occurencePerID, vars = "V2")
  if (countOccurencePerID$freq >= 10) {
    #append this user's first 10 rows instead of overwriting the result
    newItemRatingData = rbind(newItemRatingData, occurencePerID[1:10, ])
  }
}
As for @fino's answer, it was binding the data frames column-wise. The solution I found binds the data frames row-wise.
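
For anyone reading later, the same selection can probably also be done without a loop. The following is only a sketch using base R's ave(), run on a small made-up data frame that reuses the column names V1, V2 and V3 from the post. The toy values and the helper names ratingsPerUser and rankWithinUser are my own assumptions, not part of the original dataset.

```r
# Toy stand-in for itemRatingData (made-up values, same column names).
itemRatingData <- data.frame(
  V1 = 1:25,                                   # item ID
  V2 = c(rep(1, 12), rep(2, 3), rep(3, 10)),   # user ID
  V3 = rep(1:5, 5)                             # rating
)

rowIndex <- seq_len(nrow(itemRatingData))

# For every row, the total number of rows belonging to that row's user.
ratingsPerUser <- ave(rowIndex, itemRatingData$V2, FUN = length)

# For every row, its position within that user's rows (1, 2, 3, ...).
rankWithinUser <- ave(rowIndex, itemRatingData$V2, FUN = seq_along)

# Keep the first 10 rows of each user who has at least 10 rows.
newItemRatingData <- itemRatingData[ratingsPerUser >= 10 & rankWithinUser <= 10, ]
```

Note that this keeps the rows in their original order, whereas the loop groups the result by user ID, so the row order can differ if one user's rows are not contiguous in the data. It also avoids growing a data frame with rbind on every iteration, which tends to get slow as the number of users increases.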