I would like to replace missing values in the columns var1:var6
with zero for only those rows with fewer than 2 missing values in the var1:var6
columns. I would then like to recalculate the sum column (I can happily use rowwise()
as per my reprex for that).
I've tried a few things using across()
, or rowwise()
and c_across()
but am struggling to find a solution.
library(tidyverse)
# Generate data
set.seed(40)
dat <- tibble(
id = 1:6,
var1 = sample(c(0:4, NA), 6, replace = TRUE),
var2 = sample(c(0:4, NA), 6, replace = TRUE),
var3 = sample(c(0:4, NA), 6, replace = TRUE),
var4 = sample(c(0:4, NA), 6, replace = TRUE),
var5 = sample(c(0:4, NA), 6, replace = TRUE),
var6 = sample(c(0:4, NA), 6, replace = TRUE),
)
dat %>%
rowwise() %>%
mutate(sum = sum(c_across(var1:var6))) %>%
ungroup()
This is the current tibble:
> dat
# A tibble: 6 × 8
id var1 var2 var3 var4 var5 var6 sum
<int> <int> <int> <int> <int> <int> <int> <int>
1 1 3 4 4 NA NA 2 NA
2 2 NA NA 4 3 4 2 NA
3 3 4 4 1 1 4 1 15
4 4 1 2 4 4 4 NA NA
5 5 2 1 4 4 NA 2 NA
6 6 1 3 1 0 0 4 9
I would like the output to look like this:
> new_dat
# A tibble: 6 × 8
id var1 var2 var3 var4 var5 var6 sum
<int> <int> <int> <int> <int> <int> <int> <int>
1 1 3 4 4 NA NA 2 NA
2 2 NA NA 4 3 4 2 NA
3 3 4 4 1 1 4 1 15
4 4 1 2 4 4 4 0 15
5 5 2 1 4 4 0 2 13
6 6 1 3 1 0 0 4 9
You can use across
like so:
dat %>%
mutate(across(var1:var6, ~ replace(.x, is.na(.x) & rowSums(is.na(across(var1:var6))) < 2, 0)),
sum = rowSums(across(var1:var6)))
# # A tibble: 6 × 8
# id var1 var2 var3 var4 var5 var6 sum
# <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 3 4 4 NA NA 2 NA
# 2 2 NA NA 4 3 4 2 NA
# 3 3 4 4 1 1 4 1 15
# 4 4 1 2 4 4 4 0 15
# 5 5 2 1 4 4 0 2 13
# 6 6 1 3 1 0 0 4 9