
Summarise logical statement in if condition


I have a data frame with a column called 'col0' and several (more than 100) other columns (col1, col2, col3...). I'm trying to summarise them conditionally with dplyr: the sum of col0, and the weighted mean (weighted by col0) of every other column. My attempt doesn't work: it returns the plain sum of every column.

I assume the problem is in the condition of the if statement.

Code:

dt <- data.frame(col0 = c(1,2,3), 
                 col1 = c(0.1,0.2,0.3), 
                 col2 = c(0.2,0.3,0.4), 
                 col3 = c(0.1,0.2,0.3), 
                 col4 = c(0.2,0.3,0.4))

dt %>%
  summarise(across(everything(), ~ if(any(names(.) !=  "col0"))
  weighted.mean(., col0, na.rm = TRUE) 
  else sum(., na.rm = TRUE)))

Result:

  col0 col1 col2 col3 col4
1    6  0.6  0.9  0.6  0.9

Note: The solution suggested by Ronak Shah is correct but (for whatever reason) I had to define wt explicitly in the weighted.mean function


Solution

  • You may use cur_column() to get the column name.

    library(dplyr)
    
    dt %>%
      summarise(across(everything(), ~ if(cur_column() != 'col0') 
                                        weighted.mean(., col0, na.rm = TRUE) 
                                        else sum(., na.rm = TRUE)))
    
    #  col0      col1      col2      col3      col4
    #1    6 0.2333333 0.3333333 0.2333333 0.3333333
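
As for why the condition in the question never selects the weighted mean, here is a minimal base-R sketch. It assumes, as is the case for dt, that the columns are plain unnamed vectors; x, nms and cond are illustrative names standing in for what across() hands to the lambda as `.`.

```r
# Stand-in for one column as the lambda receives it: a bare vector,
# which carries no information about the column it came from.
x <- c(0.1, 0.2, 0.3)

nms <- names(x)              # NULL, because the vector has no names attribute
cond <- any(nms != "col0")   # NULL != "col0" is logical(0), and any(logical(0)) is FALSE

print(is.null(nms))
print(cond)
```

Because the condition is FALSE for every column, each one falls through to the sum() branch, which matches the result shown in the question. cur_column() avoids this by asking dplyr for the name of the column currently being processed.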
    

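    Another way would be to handle col0 separately. The weights are passed inside a lambda here, since supplying extra arguments through the ... of across() is deprecated as of dplyr 1.1.0.

    dt %>%
      summarise(across(-col0, ~ weighted.mean(.x, col0, na.rm = TRUE)),
                # Sum col0 last, so that across() above still sees the
                # original weights; col0 ends up as the last column.
                col0 = sum(col0, na.rm = TRUE))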
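
To sanity-check the weighted means, they can be recomputed by hand as sum(x * w) / sum(w). The following is a rough base-R sketch using the dt from the question; manual and builtin are illustrative names, and it assumes there are no missing values, so na.rm is left out.

```r
dt <- data.frame(col0 = c(1, 2, 3),
                 col1 = c(0.1, 0.2, 0.3),
                 col2 = c(0.2, 0.3, 0.4),
                 col3 = c(0.1, 0.2, 0.3),
                 col4 = c(0.2, 0.3, 0.4))

# Weighted mean by hand for every column except col0 (the first column).
manual <- sapply(dt[-1], function(x) sum(x * dt$col0) / sum(dt$col0))

# The same computation through weighted.mean(), with the weights passed by name.
builtin <- sapply(dt[-1], weighted.mean, w = dt$col0)

print(manual)
print(all.equal(manual, builtin))
```

For col1 this works out to (0.1 * 1 + 0.2 * 2 + 0.3 * 3) / 6 = 1.4 / 6, roughly 0.2333, in line with the output of the cur_column() version.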