Search code examples
rdplyrnasummarizeacross

R Dplyr summarize rows except when only NAs


This is what my dataframe looks like

I have a dataframe of several columns and several rows per Participant_ID. I want to sum data for all lines of Participant_ID, to obtain one value per Participant_ID. The problem is that some columns are empty (all NAs), and for these columns I want to keep NA as a result. But when I sum with na.rm = T, it transforms the sum of NAs to 0.

I am using :

 df = df %>%
  group_by(Participant_ID) %>%
  summarise(across(where(is.numeric), ~ sum(.x, na.rm = T)))

How can I exclude columns (after group_by) with only NAs ? Or filter columns (after group_by) that contain at least one numeric value ?

Thanks a lot for your help !!


Solution

  • You just need the function inside across to handle the NA column differently.

    Suppose we have this data frame:

    df <- data.frame(Participant_ID = c('A', 'A', 'A'),
      a = 1:3, b = c(4, NA, 6), c = as.numeric(c(NA, NA, NA)))
    

    Then we can do

    library(dplyr)
    
    df %>%
      group_by(Participant_ID) %>%
      summarise(across(where(is.numeric), 
                       ~ if(all(is.na(.x))) NA_real_ else sum(.x, na.rm = T)))
    #> # A tibble: 1 x 4
    #>   Participant_ID     a     b     c
    #>   <chr>          <int> <dbl> <dbl>
    #> 1 A                  6    10    NA
    

    Created on 2022-11-03 with reprex v2.0.2