Here is the data for my example:
#### Create Data ####
library(dplyr)  # provides %>% and as_tibble()

df <- data.frame(X1 = c(NA,1,1,1,0),
                 X2 = c(1,1,1,0,0),
                 X3 = c(1,1,NA,0,0),
                 X4 = c(1,1,1,1,NA),
                 X5 = c(1,1,1,0,NA),
                 X6 = c(1,NA,1,1,NA)) %>%
  as_tibble()
When you print the data, it looks like this:
# A tibble: 5 × 6
X1 X2 X3 X4 X5 X6
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 NA 1 1 1 1 1
2 1 1 1 1 1 NA
3 1 1 NA 1 1 1
4 1 0 0 1 0 1
5 0 0 0 NA NA NA
Most of the missingness in this data is sporadic and random (rows 1-4). In row 5, however, the NA values are not truly missing: a stopping rule ended the test after three "wrong" answers (zeroes) in a row, and the remaining items were converted to NA. Theoretically I could just blindly replace every NA with 0 using the following code:
df %>%
mutate(across(everything(),
~ replace(.,
is.na(.),
0)))
And all of the NAs would be replaced with 0:
# A tibble: 5 × 6
X1 X2 X3 X4 X5 X6
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0 1 1 1 1 1
2 1 1 1 1 1 0
3 1 1 0 1 1 1
4 1 0 0 1 0 1
5 0 0 0 0 0 0
However, this does not faithfully address the problem. The random NAs are genuinely missing, whereas the NAs introduced by the stopping rule are not. So I need a way to conditionally replace NA with 0 only in cases where three 0s are recorded in a row, but I'm struggling to figure out how to do this.
Using is.na we could paste0 each row's NA indicators into a string and check whether the number of matches with 111 (three consecutive NAs) is greater than zero, using stringi::stri_count, to create a flag. After that, replace the NAs with zeros in every row where the flag is set.
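num_NA <- 3  # run length of NAs that marks the stopping rule
# Flag rows whose 0/1 NA-indicator string contains num_NA consecutive 1s ("111")
flag <- apply(+(is.na(df)), 1, paste0, collapse='') |>
  stringi::stri_count(regex=paste(rep(1, num_NA), collapse='')) |> base::`>`(0)
# In flagged rows only, replace NA with 0
df[flag, ] <- lapply(df[flag, ], \(x) replace(x, is.na(x), 0))
df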
# X1 X2 X3 X4 X5 X6
# 1 NA 1 1 1 1 1
# 2 1 1 1 1 1 NA
# 3 1 1 NA 1 1 1
# 4 1 0 0 1 0 1
# 5 0 0 0 0 0 0
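If you would rather avoid the stringi dependency, the same flag can probably be built in base R with rle(), which reports the lengths of consecutive runs. This is only a sketch under the same assumption as above (a run of at least three NAs marks the stopping rule); has_na_run is a hypothetical helper name, not part of any package.

```r
# Same example data as in the "Data" section
df <- data.frame(X1 = c(NA, 1, 1, 1, 0),
                 X2 = c(1, 1, 1, 0, 0),
                 X3 = c(1, 1, NA, 0, 0),
                 X4 = c(1, 1, 1, 1, NA),
                 X5 = c(1, 1, 1, 0, NA),
                 X6 = c(1, NA, 1, 1, NA))

# Hypothetical helper: TRUE if x contains a run of at least n consecutive NAs
has_na_run <- function(x, n = 3) {
  runs <- rle(unname(is.na(x)))
  any(runs$values & runs$lengths >= n)
}

# One logical per row
flag <- apply(df, 1, has_na_run, n = 3)

# is.na(df) is a 5 x 6 logical matrix; flag is recycled down each column,
# so the mask is TRUE only for NA cells that sit in flagged rows
na_mask <- is.na(df) & flag
df[na_mask] <- 0
df
```

With the example data this should flag only row 5, so the sporadic NAs in rows 1-3 stay in place, matching the result shown above.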
Data:
df <- structure(list(X1 = c(NA, 1, 1, 1, 0), X2 = c(1, 1, 1, 0, 0),
X3 = c(1, 1, NA, 0, 0), X4 = c(1, 1, 1, 1, NA), X5 = c(1,
1, 1, 0, NA), X6 = c(1, NA, 1, 1, NA)), class = "data.frame", row.names = c(NA,
-5L))