Search code examples
rsumconditional-statements

A function that sums cell values by conditions in R


How to create a function that would check the condition from x3, x4, ... and sum each values cells (in x1, x2, ...) from df2 to df1 according to the conditions it checks from columns in df2 (x3, x4, ...)? The combinations of conditions are unique, name and order of column df1=df2. Thank you very much.

Example

df1 
x1  x2  x3      x4    x5
1   5   1100    2016  string1
2   6   2100    2016  string2
3   7   1100    2017  string3
NA  8   2100    2017  string4
7   4   3100    2017  string5
                
df2 
x1  x2  x3      x4    x5
3   NA  1100    2017  string3 
4   8   2100    2017  string4
2   5   1100    2016  string1
1   7   2100    2016  string2
                
new_df1 
x1  x2  x3      x4    x5
3   10  1100    2016  string1
3   13  2100    2016  string2
6   7   1100    2017  string3
4   8   2100    2017  string4
7   4   3100    2017  string5

Solution

  • One option is combining the dataframes using rbind and then using dplyr to group_by your condition columns and summing the rest of the columns using summarise(across(everything(), sum, na.rm = T))

    df1 <- readr::read_table("
    x1  x2  x3  x4
    1   5   1100    2016
    2   6   2100    2016
    3   7   1100    2017
    NA  8   2100    2017
    7   4   3100    2017
    ")
    df2  <- readr::read_table("
    x1  x2  x3  x4
    3   NA  1100    2017
    4   8   2100    2017
    2   5   1100    2016
    1   7   2100    2016
    ")
    
    library(dplyr)
    
    rbind(df1, df2) %>%
      group_by(x3, x4) %>%
      summarise(across(everything(), sum, na.rm = T)) %>%
      ungroup %>%
      select(order(colnames(.)))
    
    #> # A tibble: 5 × 4
    #>      x1    x2    x3    x4
    #>   <dbl> <dbl> <dbl> <dbl>
    #> 1     3    10  1100  2016
    #> 2     6     7  1100  2017
    #> 3     3    13  2100  2016
    #> 4     4    16  2100  2017
    #> 5     7     4  3100  2017