Search code examples
rdplyrmissing-data

Replace missing values if fewer than x columns missing


I would like to replace missing values in the columns var1:var6 with zero for only those rows with fewer than 2 missing values in the var1:var6 columns. I would then like to recalculate the sum column (I can happily use rowwise() as per my reprex for that).

I've tried a few things using across(), or rowwise() and c_across() but am struggling to find a solution.

library(tidyverse)

# Generate data
set.seed(40)
dat <- tibble(
  id = 1:6,
  var1 = sample(c(0:4, NA), 6, replace = TRUE),
  var2 = sample(c(0:4, NA), 6, replace = TRUE),
  var3 = sample(c(0:4, NA), 6, replace = TRUE),
  var4 = sample(c(0:4, NA), 6, replace = TRUE),
  var5 = sample(c(0:4, NA), 6, replace = TRUE),
  var6 = sample(c(0:4, NA), 6, replace = TRUE),
)

dat %>%
  rowwise() %>%
  mutate(sum = sum(c_across(var1:var6))) %>%
  ungroup()

This is the current tibble:

> dat
# A tibble: 6 × 8
     id  var1  var2  var3  var4  var5  var6   sum
  <int> <int> <int> <int> <int> <int> <int> <int>
1     1     3     4     4    NA    NA     2    NA
2     2    NA    NA     4     3     4     2    NA
3     3     4     4     1     1     4     1    15
4     4     1     2     4     4     4    NA    NA
5     5     2     1     4     4    NA     2    NA
6     6     1     3     1     0     0     4     9

I would like the output to look like this:

> new_dat
# A tibble: 6 × 8
     id  var1  var2  var3  var4  var5  var6   sum
  <int> <int> <int> <int> <int> <int> <int> <int>
1     1     3     4     4    NA    NA     2    NA
2     2    NA    NA     4     3     4     2    NA
3     3     4     4     1     1     4     1    15
4     4     1     2     4     4     4     0    15
5     5     2     1     4     4     0     2    13
6     6     1     3     1     0     0     4     9

Solution

  • You can use across like so:

    dat %>% 
      mutate(across(var1:var6, ~ replace(.x, is.na(.x) & rowSums(is.na(across(var1:var6))) < 2, 0)),
             sum = rowSums(across(var1:var6)))
    
    # # A tibble: 6 × 8
    #      id  var1  var2  var3  var4  var5  var6   sum
    #   <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    # 1     1     3     4     4    NA    NA     2    NA
    # 2     2    NA    NA     4     3     4     2    NA
    # 3     3     4     4     1     1     4     1    15
    # 4     4     1     2     4     4     4     0    15
    # 5     5     2     1     4     4     0     2    13
    # 6     6     1     3     1     0     0     4     9