My dataset has columns and values like this. The column names all start with a common string, Col_a_**
ID Col_a_01 Col_a_02 Col_a_03
1 1 2 1
2 1 NA 0
3 NA 0 2
4 1 0 1
5 0 0 2
My goal is to replace the missing values with the mode values for that column.
The expected dataset to be like this
ID Col_a_01 Col_a_02 Col_a_03
1 1 2 1
2 1 0** 0
3 1** 0 2
4 1 0 1
5 0 0 2
The NA in the first column is replaced by 1 because the mode of the 1st column is 1. The NA in the second column is replaced by 0 because the mode for the 2nd column is 0.
I can do this like this below
getmode <- function(v) {
uniqv <- unique(v)
uniqv[which.max(tabulate(match(v, uniqv)))]
}
df$Col_a_01[is.na(Col_a_01)==TRUE] <- getmode(df$Col_a_01)
df$Col_a_03[is.na(Col_a_02)==TRUE] <- getmode(df$Col_a_02)
df$Col_a_03[is.na(Col_a_03)==TRUE] <- getmode(df$Col_a_03)
But this becomes unwieldy if I have 100 columns starting with the similar names ending in 1,2,3..100. I am curious if there is an easier and more elegant way to accomplish this. Thanks in advance.
We can use na.aggregate
with FUN
specified as getmode
library(zoo)
library(dplyr)
df1 <- df1 %>%
mutate(across(starts_with('Col_a'), na.aggregate, FUN = getmode))
-output
df1
ID Col_a_01 Col_a_02 Col_a_03
1 1 1 2 1
2 2 1 0 0
3 3 1 0 2
4 4 1 0 1
5 5 0 0 2
Or it can be simply
na.aggregate(df1, FUN = getmode)
ID Col_a_01 Col_a_02 Col_a_03
1 1 1 2 1
2 2 1 0 0
3 3 1 0 2
4 4 1 0 1
5 5 0 0 2