r dplyr filter missing-data data-wrangling

How to filter out data with conditional statement for series of numbers in R?

Data

Here is the data for my example:

#### Create Data ####
library(dplyr)  # provides %>% and as_tibble()

df <- data.frame(X1 = c(NA,1,1,1,0), 
                 X2 = c(1,1,1,0,0),
                 X3 = c(1,1,NA,0,0),
                 X4 = c(1,1,1,1,NA),
                 X5 = c(1,1,1,0,NA),
                 X6 = c(1,NA,1,1,NA)) %>% 
  as_tibble()

Problem

When you print the data, it looks like this:

# A tibble: 5 × 6
     X1    X2    X3    X4    X5    X6
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1    NA     1     1     1     1     1
2     1     1     1     1     1    NA
3     1     1    NA     1     1     1
4     1     0     0     1     0     1
5     0     0     0    NA    NA    NA

Rows 1-4 contain sporadic, random missingness. Row 5 is different: after three zeroes in a row, a stopping rule for multiple "wrong" answers converted the remaining values to NA. In principle I could blindly replace every NA with 0 using the following code:

df %>% 
  mutate(across(everything(),
                ~ replace(.,
                          is.na(.),
                          0)))

And every NA would be replaced with 0:

# A tibble: 5 × 6
     X1    X2    X3    X4    X5    X6
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     0     1     1     1     1     1
2     1     1     1     1     1     0
3     1     1     0     1     1     1
4     1     0     0     1     0     1
5     0     0     0     0     0     0

However, this does not faithfully address the problem. The random NAs are genuinely missing, whereas the values that were set to NA by the stopping rule are not. So I need a way to recode the NAs conditionally, only in cases where three 0s are recorded in a row, but I'm struggling to figure out how to do this.

Solution

Using is.na, we could turn each row into a string of 0s and 1s (1 marks an NA) with paste0, then use stringi::stri_count to check whether the pattern 111, i.e. three consecutive NAs, occurs at least once, and store the result as a flag. After that, replace the NAs with zeros in the flagged rows only.

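num_NA <- 3  # length of the NA run that marks a stopping-rule row
# Encode each row as a 0/1 string (1 = NA), then flag rows that
# contain num_NA consecutive 1s
flag <- apply(+(is.na(df)), 1, paste0, collapse='') |>
  stringi::stri_count(regex=paste(rep(1, num_NA), collapse='')) |>
  base::`>`(0)

# Replace NAs with 0 in the flagged rows only
df[flag, ] <- lapply(df[flag, ], \(x) replace(x, is.na(x), 0))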
df
#   X1 X2 X3 X4 X5 X6
# 1 NA  1  1  1  1  1
# 2  1  1  1  1  1 NA
# 3  1  1 NA  1  1  1
# 4  1  0  0  1  0  1
# 5  0  0  0  0  0  0

Data:

df <- structure(list(X1 = c(NA, 1, 1, 1, 0), X2 = c(1, 1, 1, 0, 0), 
    X3 = c(1, 1, NA, 0, 0), X4 = c(1, 1, 1, 1, NA), X5 = c(1, 
    1, 1, 0, NA), X6 = c(1, NA, 1, 1, NA)), class = "data.frame", row.names = c(NA, 
-5L))
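
If you would rather avoid the stringi dependency, the same flag can probably be built with base R's rle(). The following is only a sketch under the same assumption as above, namely that a run of at least three consecutive NAs in a row identifies a stopping-rule case. The helper name has_na_run and its argument n are made up for illustration.

```r
# Example data as a plain data.frame (same values as above)
df <- data.frame(X1 = c(NA, 1, 1, 1, 0),
                 X2 = c(1, 1, 1, 0, 0),
                 X3 = c(1, 1, NA, 0, 0),
                 X4 = c(1, 1, 1, 1, NA),
                 X5 = c(1, 1, 1, 0, NA),
                 X6 = c(1, NA, 1, 1, NA))

# Hypothetical helper: TRUE if the logical vector `is_na` contains
# a run of at least `n` consecutive TRUE values
has_na_run <- function(is_na, n = 3) {
  runs <- rle(is_na)
  any(runs$values & runs$lengths >= n)
}

# Logical matrix marking every NA cell
na_mat <- is.na(df)

# One flag per row: does the row contain a long enough NA run?
flag <- apply(na_mat, 1, has_na_run, n = 3)

# Keep only the NA cells that sit in flagged rows, then recode them to 0
na_mat[!flag, ] <- FALSE
df[na_mat] <- 0
df
```

Only row 5 has three NAs in a row, so only its NAs are recoded; the isolated NAs in rows 1-3 stay as they are. If the stopping rule should also require that the NA run is preceded by three zeroes, the helper would need an extra check for that.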