I have a long dataset where each participant (id) has 21 rows of data to reflect 21 different time points (time). Time 1, 2, and 3 represent data collected on "day" 1 and time 4, 5, 6 represent data collected from "day" 2 and so forth. I am trying to make it so that if "dep" or "anx" is NA during "time" 2, 5, 8, 11, 14, 17, 20, then all data on the "day" that is associated with that "time" is NA.
This is an example of how the data is structured in a more simplified way (2 participants, 6 rows of data each)
data <- structure(list(id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2), time = c(1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6), day = c(1, 1, 1, 2, 2, 2, 1,
1, 1, 2, 2, 2), dep = c(18, NA, 14, 13, 15, 14, 8, 8, 9, 7, NA,
8), anx = c(28, NA, 22, 23, 25, 26, 7, 6, 7, 7, NA, 5)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -12L))
And this is what I want it to look like:
In my actual dataset, I have over 3000 rows of data and so am trying to figure out if there's a good way to do this. Thank you so much for any insight you have.
library(dplyr)
data |>
mutate(
has_na_in_pos_2 = is.na(dep[2]) | is.na(anx[2]),
across(c(dep, anx), \(col) ifelse(has_na_in_pos_2, NA, col)),
.by = c(id, day)
) |>
select(-has_na_in_pos_2)
# # A tibble: 12 × 5
# id time day dep anx
# <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 1 1 NA NA
# 2 1 2 1 NA NA
# 3 1 3 1 NA NA
# 4 1 4 2 13 23
# 5 1 5 2 15 25
# 6 1 6 2 14 26
# 7 2 1 1 8 7
# 8 2 2 1 8 6
# 9 2 3 1 9 7
# 10 2 4 2 NA NA
# 11 2 5 2 NA NA
# 12 2 6 2 NA NA