Tags: r, dataframe, data-science, recommendation-engine

Data frame overwritten within a for loop in R


I have a dataset containing a million observations, from which I am taking the first 10,000. Here is a link to the dataset file: dataset file link

itemRatingData = itemRatingData[1:10000,]
#V2 is user ID, V1 is item ID, V3 is the item rating from the user

library(plyr)
countUser = count(itemRatingData, vars = "V2")
#count the total observations per user in the dataset

list_of_total_Users = as.list(countUser$V2)
#extract the distinct user IDs as a list

Next, I want to extract the observations of users who have rated at least 10 items, and I did that successfully. Some of these users have rated 50, 100, or even 1000+ items, but I only need 10 observations from each user who has rated at least 10 items. This is what I tried in order to get the desired result:

for (i in 1:length(list_of_total_Users)) {
    occurencePerID = subset(itemRatingData, 
    itemRatingData$V2%in%list_of_total_Users[[i]])

    countOccurencePerID = count(occurencePerID, vars = "V2")
    if(countOccurencePerID$freq >= 10){
       newItemRatingData = occurencePerID[1:10,]
    }
}

In this code I subset the observations for each user ID and count them. If the user's frequency is >= 10, I extract that user's first 10 observations. The problem is that every iteration of the loop overwrites newItemRatingData, so only the last qualifying user's rows remain.


Solution

  • I have resolved my issue. The solution is:

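    # start with an empty data frame so each user's rows can be appended to it
    newItemRatingData = data.frame("V2" = numeric(0), "V1" = numeric(0), "V3" = integer(0))
    
    for (i in seq_along(list_of_total_Users)) {
      # all observations belonging to the i-th user
      occurencePerID = subset(itemRatingData, itemRatingData$V2 %in% list_of_total_Users[[i]])
    
      countOccurencePerID = count(occurencePerID, vars = "V2")
      if (countOccurencePerID$freq >= 10) {
        # append (row-wise) the first 10 observations instead of overwriting
        newItemRatingData = rbind(newItemRatingData, occurencePerID[1:10, ])
      }
    }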
    

    As for @fino's answer: it bound the data frames column-wise, whereas the solution I found binds them row-wise.
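
For comparison, the same result can probably be reached without growing the data frame inside a loop. The following is only a minimal sketch in base R, not the method used above: it splits the data by user, keeps the users with at least 10 rows, and binds the first 10 rows of each in a single rbind call. The toy itemRatingData below is made up for illustration (the real file is not reproduced here); only the column names V1, V2, V3 are taken from the question.

```r
# Toy stand-in for itemRatingData (hypothetical values, for illustration only):
# V1 = item ID, V2 = user ID, V3 = item rating
set.seed(1)
itemRatingData <- data.frame(
  V1 = seq_len(40),
  V2 = c(rep(1, 25), rep(2, 3), rep(3, 12)),
  V3 = sample(1:5, 40, replace = TRUE)
)

# One data frame per user ID
perUser <- split(itemRatingData, itemRatingData$V2)

# Keep only users who have rated at least 10 items
perUser <- Filter(function(d) nrow(d) >= 10, perUser)

# First 10 rows of each remaining user, bound row-wise in one step
newItemRatingData <- do.call(rbind, lapply(perUser, head, n = 10))
rownames(newItemRatingData) <- NULL
```

With the toy data, users 1 and 3 qualify and user 2 (3 ratings) is dropped, so the result has 20 rows. Note that if no user qualifies, do.call(rbind, ...) on the empty list returns NULL rather than an empty data frame, so that case would need separate handling.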