Error when using case_when in a grouped data frame (because it evaluates all conditions)


I have a data frame like:

df <- data.frame(group   = c(1, 1, 1, 2, 2, 2),
                 var     = c(1, 2, 3, 2, 3, 4),
                 value   = c(1, 2, 3, 4, 5, 6),
                 ranking = c(1, 2, 3, 1, 2, 3))

  group var value ranking
1     1   1     1       1
2     1   2     2       2
3     1   3     3       3
4     2   2     4       1
5     2   3     5       2
6     2   4     6       3

What I want to do:

Group by var. If the var group contains a row with group == 1, use that row's ranking for every row in the var group. Otherwise keep the existing ranking or, if group is 2, add a fixed offset (3 here) to it. In effect this "joins" the rankings: ranks that exist in only one group (in particular, group 2) are appended after the shared ones.

Here's my code:

library(dplyr)

df |>
  group_by(var) |> 
  mutate(ranking = case_when(n() == 2   ~ ranking[group == 1],
                             group == 1 ~ ranking,
                             group == 2 ~ ranking + 3))

which gives an error:

Error in `mutate()`:
ℹ In argument: `ranking = case_when(...)`.
ℹ In group 4: `var = 4`.
Caused by error:
! `ranking` must be size 1, not 0.

The problem is that case_when evaluates ranking[group == 1] for every var group, regardless of whether that group contains a row with group == 1. I've come across this problem before but can't remember or find how we solved it back then.

Expected output would be:

  group var value ranking
1     1   1     1       1
2     1   2     2       2
3     1   3     3       3
4     2   2     4       2
5     2   3     5       3
6     2   4     6       6

Solution

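  • Using match instead of == should fix it, because:

    1. match returns exactly one position (that of the first match), whereas == can select more than one element if several match.
    2. match returns NA instead of a zero-length result when there is no match, so ranking[match(1, group)] is NA (length 1) rather than numeric(0).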
    library(dplyr)
    
    df |> 
      group_by(var) |> 
      mutate(ranking = case_when(n() == 2 ~ ranking[match(1, group)],
                                 group == 1 ~ ranking,
                                 group == 2 ~ ranking + 3))
    
    #   group   var value ranking
    #  <dbl> <dbl> <dbl>   <dbl>
    #1     1     1     1       1
    #2     1     2     2       2
    #3     1     3     3       3
    #4     2     2     4       2
    #5     2     3     5       3
    #6     2     4     6       6
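
The difference between the two indexing styles can be seen in base R, outside of dplyr. The sketch below uses small hypothetical vectors (group_only2, ranking_only2, and so on are made-up names, not part of the question's data) that mimic one var group with no group == 1 row and one with several. It is only meant to illustrate the two points above, not to replace the pipeline.

```r
# Hypothetical stand-in for the var == 4 group: one row, and it is in group 2
ranking_only2 <- c(3)
group_only2   <- c(2)

# Logical subsetting with no TRUE values yields a zero-length vector,
# which is what makes case_when() complain about size 0
by_equals <- ranking_only2[group_only2 == 1]

# match() finds no position and returns NA; indexing by NA gives one NA
by_match <- ranking_only2[match(1, group_only2)]

length(by_equals)  # 0
length(by_match)   # 1
is.na(by_match)    # TRUE

# Hypothetical group with two group == 1 rows
ranking_dup <- c(1, 2, 3)
group_dup   <- c(1, 1, 2)

multi_equals <- ranking_dup[group_dup == 1]       # both matching elements
first_match  <- ranking_dup[match(1, group_dup)]  # only the first match

length(multi_equals)  # 2
first_match           # 1
```

Because the match version always has length 1, it can be recycled to the group size inside case_when. The NA it produces for groups without a group == 1 row is never used there, since the n() == 2 condition is FALSE for those groups.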