r tidyverse summarize

grouped summarize still gives result for each individual row


I have the following data:

library(tidyverse)

df <- data.frame(id = c(1,1,1,2,2,2),
                 x = rep(letters[1:2], each = 3),
                 y = c(3,4,3,5,6,5),
                 z = c(7,8,9,10,11,12))

I now want to summarize the data by id in a way where I get the sum of z depending on y values. The y condition itself depends on the value of x.

I thought I could use the code below, but it returns one row per input row instead of one row per id. The values are correct, but I still want to have one row per id.

df %>%
  group_by(id) %>%
  summarize(test = case_when(x == 'a' ~ sum(z[y == 3]),
                             x == 'b' ~ sum(z[y == 5])))

# A tibble: 6 x 2
# Groups:   id [2]
     id  test
  <dbl> <dbl>
1     1    16
2     1    16
3     1    16
4     2    22
5     2    22
6     2    22

The following works, but I don't understand why it does and the above code does not.

df %>%
  group_by(id) %>%
  summarize(test = case_when(all(x == 'a') ~ sum(z[y == 3]),
                             all(x == 'b') ~ sum(z[y == 5])))

# A tibble: 2 x 2
     id  test
  <dbl> <dbl>
1     1    16
2     2    22

Also, is there a more straightforward way to do my summarization?


Solution

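  • This happens because case_when(), like ifelse(test, yes, no), returns a vector of the same length as its conditions. Within each group, x == 'a' has length 3, so summarize() gets three values per id. all(x == 'a') has length 1, so the returned value has length 1 and you get one row per id.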
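
To make the length argument concrete, and as one possible answer to the "more straightforward way" part of the question, here is a minimal sketch. It assumes the intended rule is row-wise (count z where x is 'a' and y is 3, or where x is 'b' and y is 5), which gives the same result on the example data but is not necessarily what the asker needs if x can vary within an id. The names x_demo and result are only for illustration.

```r
library(dplyr)

df <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                 x = rep(letters[1:2], each = 3),
                 y = c(3, 4, 3, 5, 6, 5),
                 z = c(7, 8, 9, 10, 11, 12))

# case_when() returns one value per element of its condition.
x_demo <- c("a", "a", "a")
len_vector <- length(case_when(x_demo == "a" ~ 1))       # condition has length 3
len_scalar <- length(case_when(all(x_demo == "a") ~ 1))  # condition has length 1

# Assumed row-wise rule: build one logical mask and sum z over it.
# sum() always returns a single value, so each id yields exactly one row.
result <- df %>%
  group_by(id) %>%
  summarize(test = sum(z[(x == "a" & y == 3) | (x == "b" & y == 5)]))

print(result)
```

Because sum() collapses the masked values to a single number, no case_when() is needed inside summarize() at all.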