
Replace missing values with the column mean using a for loop in R


I have a data set containing fifty questions (Q1 to Q50). Each question holds Likert-scale values from 1 to 5. Some values are missing, so I want to replace each missing value with the mean of its column. Here is sample code for a single column:

demodata$Q1 = ifelse(is.na(demodata$Q1),
                 ave(demodata$Q1, FUN = function(x)mean(x, na.rm = TRUE)),
                 demodata$Q1)

The problem is that my data set has around 50 questions, so repeating the same operation for every column is tedious. How can I handle this with a for loop or some other simple technique?


Solution

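  • Consider sapply to reassign all columns at once. Assigning into demo_data[] keeps the data frame structure, and the function replaces each column's NA values with that column's mean: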

    demo_data[] <- sapply(demo_data, function(col) {
      # REPLACE MISSING VALUES WITH THE MEAN OF THE NON-MISSING VALUES
      col[is.na(col)] <- mean(col, na.rm=TRUE)

      return(col)
    })
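Since the question asks specifically about a for loop, the same idea can be sketched that way too. This is a minimal sketch, not part of the original answer: the data frame `toy` and its values are hypothetical stand-ins for the real questionnaire data.

```r
# Hypothetical toy data standing in for the questionnaire (not from the original post)
toy <- data.frame(Q1 = c(1, NA, 3),
                  Q2 = c(NA, 4, 5),
                  Q3 = c(2, 2, NA))

# Loop over the column names and fill each column's NAs with that column's mean
for (q in names(toy)) {
  missing <- is.na(toy[[q]])
  toy[[q]][missing] <- mean(toy[[q]], na.rm = TRUE)
}

print(toy)
```

The loop and the sapply call should give the same result; the loop is mostly a matter of taste, or convenient if only some columns need the treatment (for example, by looping over `paste0("Q", 1:50)` instead of `names(toy)`).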
    

    Test data (randomized and seeded)

    # BUILD DATAFRAME OF 50 VARS AND 50 OBS
    set.seed(5152018)
    demo_data <- setNames(data.frame(replicate(50, replicate(50, sample(1:5, 1, replace=TRUE)))),
                          paste0("Q", 1:50))
    
    # RANDOMLY ASSIGN NAs TO 5 ROWS PER COLUMN (SAME sapply PATTERN AS THE SOLUTION ABOVE)
    demo_data[] <- sapply(demo_data, function(col) {
      # SAMPLE ROW POSITIONS WITHOUT REPLACEMENT SO EXACTLY 5 VALUES BECOME NA
      col[sample(seq_along(col), 5)] <- NA

      return(col)
    })
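To confirm that the imputation did what was intended, a quick check along these lines may help. It is only a sketch under stated assumptions: `impute_mean`, `check`, `before`, and `filled` are hypothetical names introduced here, and the function uses lapply rather than sapply, which is a swapped-in variant of the approach above.

```r
# Hypothetical helper (not in the original answer): mean-impute every column
impute_mean <- function(df) {
  df[] <- lapply(df, function(col) {
    col[is.na(col)] <- mean(col, na.rm = TRUE)
    col
  })
  df
}

# Small made-up data frame with one missing value per column
check <- data.frame(Q1 = c(5, NA, 1),
                    Q2 = c(2, 2, NA))

before <- colMeans(check, na.rm = TRUE)  # column means ignoring NAs
filled <- impute_mean(check)

anyNA(filled)                            # no missing values should remain
all.equal(before, colMeans(filled))      # mean imputation leaves column means unchanged
```

Comparing the column means before and after is a cheap sanity test, because filling gaps with the mean cannot shift the mean itself.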