r dataframe grouping subset multiple-columns

R subset dataframe where no observations of certain variables

I have a dataframe that looks like

country	sector	data1	data2
France	1	7	.
France	2	10	.
belgium	1	12	7
belgium	2	14	8

I want to subset columns that are missing for a country in all sectors. In this example I would like to drop/exclude column two because it is missing for sector 1 and 2 for france. To be clear I would also be throwing out the values of data2 for belgium in this example.

My expected output would look like

country	sector	data1
France	1	7
France	2	10
belgium	1	12
belgium	2	14

data 2 is now excluded because it had a complete set of missing values for all sectors in France

Solution

We may group by country, create logical columns where the count of NA elements are equal to group size, ungroup, replace the corresponding columns to NA based on the logical column and remove those columns in select

library(dplyr)
library(stringr)
df1 %>% 
  group_by(country) %>% 
  mutate(across(everything(), ~ sum(is.na(.x)) == n(),
       .names = "{.col}_lgl")) %>% 
  ungroup %>% 
  mutate(across(names(df1)[-1], ~ if(any(get(str_c(cur_column(), 
     "_lgl")) )) NA else .x)) %>% 
   select(c(where(~ !is.logical(.x) && any(complete.cases(.x)))))

-output

# A tibble: 4 × 3
  country sector data1
  <chr>    <int> <int>
1 France       1     7
2 France       2    10
3 belgium      1    12
4 belgium      2    14

If we don't use group_by, the steps can be simplified as showed in Maël's post i.e. do the grouping with a base R function within select i.e. either tapply or ave can work

df1 %>%
   select(where(~ !any(tapply(is.na(.x), df1[["country"]], 
     FUN = all))))

data

df1 <- structure(list(country = c("France", "France", "belgium", "belgium"
), sector = c(1L, 2L, 1L, 2L), data1 = c(7L, 10L, NA, 14L), data2 = c(NA, 
NA, 7L, 8L)), row.names = c(NA, -4L), class = "data.frame")