Search code examples
rconditional-statementsmultiple-columns

Removing data from certain columns based on several conditions in long dataset in R


I have a long dataset where each participant (id) has 21 rows of data to reflect 21 different time points (time). Time 1, 2, and 3 represent data collected on "day" 1 and time 4, 5, 6 represent data collected from "day" 2 and so forth. I am trying to make it so that if "dep" or "anx" is NA during "time" 2, 5, 8, 11, 14, 17, 20, then all data on the "day" that is associated with that "time" is NA.

This is an example of how the data is structured in a more simplified way (2 participants, 6 rows of data each)

data <- structure(list(id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2), time = c(1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6), day = c(1, 1, 1, 2, 2, 2, 1, 
1, 1, 2, 2, 2), dep = c(18, NA, 14, 13, 15, 14, 8, 8, 9, 7, NA, 
8), anx = c(28, NA, 22, 23, 25, 26, 7, 6, 7, 7, NA, 5)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -12L))

enter image description here

And this is what I want it to look like:

enter image description here

In my actual dataset, I have over 3000 rows of data and so am trying to figure out if there's a good way to do this. Thank you so much for any insight you have.


Solution

  • library(dplyr)
    data |>
      mutate(
        has_na_in_pos_2 = is.na(dep[2]) | is.na(anx[2]),
        across(c(dep, anx), \(col) ifelse(has_na_in_pos_2, NA, col)),
        .by = c(id, day)
      ) |>
      select(-has_na_in_pos_2)
    # # A tibble: 12 × 5
    #       id  time   day   dep   anx
    #    <dbl> <dbl> <dbl> <dbl> <dbl>
    #  1     1     1     1    NA    NA
    #  2     1     2     1    NA    NA
    #  3     1     3     1    NA    NA
    #  4     1     4     2    13    23
    #  5     1     5     2    15    25
    #  6     1     6     2    14    26
    #  7     2     1     1     8     7
    #  8     2     2     1     8     6
    #  9     2     3     1     9     7
    # 10     2     4     2    NA    NA
    # 11     2     5     2    NA    NA
    # 12     2     6     2    NA    NA