I have a data set containing fifty questions (Q1 to Q50). The values are Likert-scale responses from 1 to 5. Some values are missing, and I want to replace each missing value with the mean of its column. Here is sample code for a single column:
demodata$Q1 = ifelse(is.na(demodata$Q1),
                     ave(demodata$Q1, FUN = function(x) mean(x, na.rm = TRUE)),
                     demodata$Q1)
The problem is that I have around 50 questions in my data set, so repeating the same operation for every column is tedious. How can I do this with a for loop or some simpler technique?
Consider sapply to reassign all columns at once (assigning into demo_data[] keeps the data frame structure):
demo_data[] <- sapply(demo_data, function(col) {
  col[is.na(col)] <- mean(col, na.rm=TRUE)
  return(col)
})
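If the data frame also holds non-question columns (an ID, say), sapply would coerce everything into a single matrix type, so a variant with lapply on just the question columns may be safer. The following is a minimal sketch under assumptions: the `survey` data frame, its `id` column, and the "Q followed by digits" naming pattern are hypothetical and only there for illustration.

```r
# Hypothetical toy data: an id column plus three Likert items with gaps
survey <- data.frame(
  id = c("a", "b", "c", "d"),
  Q1 = c(1, NA, 3, 5),
  Q2 = c(NA, 2, 2, 5),
  Q3 = c(4, 4, NA, NA),
  stringsAsFactors = FALSE
)

# Select only the question columns by name (assumes they are named Q<number>)
q_cols <- grep("^Q[0-9]+$", names(survey), value = TRUE)

# lapply returns a list, so each column keeps its own type when reassigned
survey[q_cols] <- lapply(survey[q_cols], function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
})

# Check that no missing values remain in the question columns
anyNA(survey[q_cols])
```

Note that a column mean is usually not a whole number, so the imputed values may fall between Likert categories; wrapping the mean in round() is one option if whole-number responses are needed.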
Test data (randomized and seeded)
# BUILD DATAFRAME OF 50 VARS AND 50 OBS
set.seed(5152018)
demo_data <- setNames(data.frame(replicate(50, replicate(50, sample(1:5, 1, replace=TRUE)))),
                      paste0("Q", 1:50))
# RANDOMLY ASSIGN NAs TO 5 ROWS PER COLUMN (SIMILARLY USED FOR ABOVE SOLUTION)
demo_data[] <- sapply(demo_data, function(col) {
  # SAMPLE ROW POSITIONS OF THIS COLUMN (WITHOUT REPLACEMENT, SO EXACTLY 5 ROWS)
  col[sample(seq_along(col), 5)] <- NA
  return(col)
})