I need to drop a column when it has more missingness than the other column. Here is a sample dataset.
df <- data.frame(id = c(1,2,3,4,5),
col.x = c(44,55,66,NA,NA),
col.y = c(44,NA,66,NA,NA))
> df
id col.x col.y
1 1 44 44
2 2 55 NA
3 3 66 66
4 4 NA NA
5 5 NA NA
In this case, col.y
has more missing values so I need to drop col.y
. How can I get the desired dataset below?
> df
id col.x
1 1 44
2 2 55
3 3 66
4 4 NA
5 5 NA
If the columns only differ with respect to their missing values, we can use coalesce()
:
library(tidyverse)
df <- tibble(
id = c(1, 2, 3, 4, 5),
col.x = c(44, 55, 66, NA, NA),
col.y = c(44, NA, 66, NA, NA)
)
df |>
mutate(col = coalesce(col.x, col.y), .keep = "unused")
#> # A tibble: 5 × 2
#> id col
#> <dbl> <dbl>
#> 1 1 44
#> 2 2 55
#> 3 3 66
#> 4 4 NA
#> 5 5 NA
Created on 2023-03-22 with reprex v2.0.2