Return list of which column ID and year has missing (NA) data only for all the other columns in R

ID	Year	Maths_sub	Eng_sub	Chem_sub	Comp_sub
1	2010	20	15	12	NA
2	2015	17	20	14	16
3	2012	NA	17	12	NA
4	2015	14	13	NA	NA
5	2016	NA	NA	NA	NA
6	2015	NA	NA	NA	NA

Hello, I want to know for the above table which ID and for which Year only people with all subject columns are NA. Results should be like below:

ID	Year
5	2016
6	2015

I have tried something along the lines of the code below but it is not giving me the ideal answer.

df %>%
  mutate(across(ends_with("sub"), is.na)) %>%  
  pivot_longer(-ID, names_to = "sub") %>%  
  filter(value) %>% 
  group_by(ID) %>%
  summarise(`Missing Variables` = toString(sub))

Also as an extra I would like to know if there is a way to mutate across only certain columns, for example only Maths and Comp and seeing which ID and Year has both these subjects NA?

Thank you very much!

Solution

base R approach

Instinctively I would use rowSums() applied to is.na() of the columns in question.

sub_cols <- grep("sub$", names(dat), value = TRUE)

dat[
    rowSums(is.na(dat[sub_cols])) == length(sub_cols),
    c("ID", "Year")
]
#   ID Year
# 5  5 2016
# 6  6 2015

`dplyr` approach

I actually find it quite awkward to do matrix operations using dplyr, but this is how I would do it:

library(dplyr)
dat %>%
    filter(
        rowSums(!is.na(select(., ends_with("sub")))) == 0
    ) %>%
    select(ID, Year)
#   ID Year
# 5  5 2016
# 6  6 2015

`data.table` approach

For completeness, here's a data.table approach. You can use patterns("sub$") to select columns ending in in "sub" and you can use rowSums() with .SD, so overall I think it's the nicest approach:

library(data.table)
setDT(dat)

dat[
    dat[, rowSums(!is.na(.SD)) == 0, .SDcols = patterns("sub$")],
    .(ID, Year)
]
#       ID  Year
#    <int> <int>
# 1:     5  2016
# 2:     6  2015

Data

dat <- structure(list(ID = 1:6, Year = c(2010L, 2015L, 2012L, 2015L, 
2016L, 2015L), Maths_sub = c(20L, 17L, NA, 14L, NA, NA), Eng_sub = c(15L, 
20L, 17L, 13L, NA, NA), Chem_sub = c(12L, 14L, 12L, NA, NA, NA
), Comp_sub = c(NA, 16L, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, -6L))

Return list of which column ID and year has missing (NA) data only for all the other columns in R

base R approach

dplyr approach

data.table approach

Data

`dplyr` approach

`data.table` approach