Below is the code i used to do a mode imputation for the column status group of the dataset tan1.
How do I rewrite the same using pipes? the unique()
function does not seem to work in pipes.
NA_stat <- unique(tan1$status_group[!is.na(tan1$status_group)])
mode <- NA_stat[which.max(tabulate(match(tan1$status_group, NA_stat)))]
tan1$status_group[is.na(tan1$status_group)] <- mode
Also, how do I apply this same process for multiple columns?
Here are some examples of determining and imputing the mode in a pipe.
Functions to calculate mode:
library(tidyverse)
# Single mode (returns only the first mode if there's more than one)
# https://stackoverflow.com/a/8189441/496488
# Modified to remove NA
Mode <- function(x) {
ux <- na.omit(unique(x))
ux[which.max(tabulate(match(x, ux)))]
}
# Return all modes if there's more than one
# https://stackoverflow.com/a/8189441/496488
# Modified to remove NA
Modes <- function(x) {
ux <- na.omit(unique(x))
tab <- tabulate(match(x, ux))
ux[tab == max(tab)]
}
Apply the functions to a data frame:
iris %>%
summarise(across(everything(), Mode))
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5 3 1.4 0.2 setosa
iris %>% map(Modes)
#> $Sepal.Length
#> [1] 5
#>
#> $Sepal.Width
#> [1] 3
#>
#> $Petal.Length
#> [1] 1.4 1.5
#>
#> $Petal.Width
#> [1] 0.2
#>
#> $Species
#> [1] setosa versicolor virginica
#> Levels: setosa versicolor virginica
Impute missing data using the mode. But note that we use Mode
, which returns only the first mode in cases where there are multiple modes. You may need to adjust your method if you have multiple modes.
# Create missing data
d = iris
d[1, ] = rep(NA, ncol(iris))
head(d)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 NA NA NA NA <NA>
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5.0 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
# Replace missing values with the mode
d = d %>%
mutate(across(everything(), ~coalesce(., Mode(.))))
head(d)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.0 3.0 1.5 0.2 versicolor
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5.0 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa