Search code examples
rdataframeaggregatetibble

Combine sets of 2 rows based on different values of variables


I have the following table (a tibble)

Year Name n
1969 Alice 24
1969 Bob 25
1969 Charlie 33
1970 Alice 13
1970 Bob 20
1970 Charlie 10
1971 Alice 23
1971 Bob 10
1971 Charlie 16

etc.

I want to combine "Bob" and "Charlie" into a single row for each year by summing their value for n and give them a new name "Derek" (it makes more sense in my actual data). So I want

Year Name n
1969 Alice 24
1969 Derek 58
1970 Alice 13
1970 Derek 30
1971 Alice 23
1971 Derek 26

I have tried figuring out how to do it with aggregate() and case_when(), but I can't seem to get it to work.

Any help welcome


Solution

  • We can first replace the names with Derek. Then summarise to sum the ns from the doubles, grouped .by Name and Year.

    library(dplyr)
    
    df |>
        mutate(Name = case_match(Name, c("Bob", "Charlie") ~ "Derek",
                                .default = Name)) |>
        summarise(n = sum(n), .by = c(Year, Name))
    
      Year  Name  n
    1 1969 Alice 24
    2 1969 Derek 58
    3 1970 Alice 13
    4 1970 Derek 30
    5 1971 Alice 23
    6 1971 Derek 26