I'm trying to write a function that replaces missing values in each column with the median, and I'd like it to work for factor/character columns as well as numeric ones.
library(dplyr)
test = data.frame(a=1:6,b=c("a","b",NA,NA,NA,"c"),c=c(1,1,1,1,2,NA),d=c("a","a","c",NA,NA,"b"))
fun_rep_na = function(df){
  for(i in colnames(df)){
    j<-sym(i)
    df = df %>% mutate(!!j=if_else(is.na(!!j),median(!!j, na.rm=TRUE),!!j))
  }
}
I see that tidyr has a function called replace_na, but I'm not sure how to use this either. Anyway, a custom function is what I would like.
The code above gives me an error.
We can use mutate_if with median, as median works only on numeric columns
test %>%
  mutate_if(is.numeric, list(~ replace(., is.na(.), median(., na.rm = TRUE))))
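In dplyr 1.0.0 and later, the scoped verbs such as mutate_if are superseded by across(). A minimal sketch of the same median replacement in that style, assuming a recent dplyr is installed (the name test_num is only for illustration):

```r
library(dplyr)

test <- data.frame(a = 1:6,
                   b = c("a", "b", NA, NA, NA, "c"),
                   c = c(1, 1, 1, 1, 2, NA),
                   d = c("a", "a", "c", NA, NA, "b"))

# Replace NA with the column median, touching numeric columns only
test_num <- test %>%
  mutate(across(where(is.numeric),
                ~ replace(.x, is.na(.x), median(.x, na.rm = TRUE))))
```

The character columns b and d are left untouched here, so their NA values remain.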
If we want the most frequent value instead, which also works for character and factor columns, then we need a Mode function
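Mode <- function(x) {
  x <- x[!is.na(x)]   # drop missing values before counting
  ux <- unique(x)     # candidate values
  # count occurrences of each candidate and return the most frequent one
  ux[which.max(tabulate(match(x, ux)))]
}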
The Mode function was first updated here
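As a quick check of how Mode behaves, here is a small sketch that calls it on two of the columns from the question. Because which.max returns the first maximum, ties are resolved in favour of the value that appears first.

```r
Mode <- function(x) {
  x <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# "a", "b" and "c" each occur once, so the first one wins
mode_b <- Mode(c("a", "b", NA, NA, NA, "c"))  # "a"

# 1 occurs four times, 2 occurs once
mode_c <- Mode(c(1, 1, 1, 1, 2, NA))          # 1
```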
test %>%
  mutate_all(list(~ replace(., is.na(.), Mode(.))))
#  a b c d
#1 1 a 1 a
#2 2 b 1 a
#3 3 a 1 c
#4 4 a 1 a
#5 5 a 2 a
#6 6 c 1 b
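Since a custom function was asked for, the two ideas can be combined into one. The loop in the question likely fails for a few reasons: `!!j = ...` is not valid syntax (rlang needs `:=` when the left-hand side is unquoted), median is not defined for character data, and the function never returns df. The following is only a sketch of one way to write it, assuming dplyr 1.0.0 or later; it uses the median for numeric columns and Mode for everything else.

```r
library(dplyr)

Mode <- function(x) {
  x <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Sketch: median for numeric columns, most frequent value otherwise
fun_rep_na <- function(df) {
  df %>%
    mutate(across(everything(), function(x) {
      fill <- if (is.numeric(x)) median(x, na.rm = TRUE) else Mode(x)
      replace(x, is.na(x), fill)
    }))
}

test <- data.frame(a = 1:6,
                   b = c("a", "b", NA, NA, NA, "c"),
                   c = c(1, 1, 1, 1, 2, NA),
                   d = c("a", "a", "c", NA, NA, "b"))

res <- fun_rep_na(test)
```

On this data the median and the mode of column c happen to coincide (both are 1), so res matches the mutate_all result shown above.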