r factors forcats

Bug when collapsing a factor into groups, with forcats


I have the following data frame:

library(dplyr)
df <- data.frame(a = 1:5) %>% as_tibble()

I want to collapse the values 1 and 3 into 'group1', 2 and 4 into 'group2' and the other values (e.g. 5) into 'Other'. I thought fct_collapse() would be the perfect function, but it does strange things...

library(forcats)

df %>%
  mutate(
    a = as.character(a),
    a_collapse = fct_collapse(a,
      group1 = c('1', '3'),
      group2 = c('2', '4'),
      group_other = TRUE  # intended to lump every remaining level together
    )
  )

Yet the value 3 ends up in 'group2' instead of 'group1'. Do you know why this is happening? I guess it has to do with the fact that the values of my factor are numeric, but I did not find a way to deal with that. Any ideas?

Some posts deal with similar issues but did not help me in this case:

Replace factors with a numeric value

Joining factor levels of two columns


Solution

  • A simple case_when() works here:

    library(dplyr)
    df %>%
      mutate(a_collapse = factor(case_when(a %in% c(1, 3) ~ "group1",
                                           a %in% c(2, 4) ~ "group2",
                                           TRUE ~ "Other")))
    
    # A tibble: 5 x 2
    #     a a_collapse
    #  <int> <fct>     
    #1     1 group1    
    #2     2 group2    
    #3     3 group1    
    #4     4 group2    
    #5     5 Other     
    

    As far as fct_collapse() is concerned, the problem seems to come from including group_other, as reported in this issue on GitHub. If we remove that argument, the named groups are collapsed as expected, but the remaining levels (here 5) are left unchanged instead of being lumped into an 'Other' group.

    df %>%
      mutate(
        a = as.character(a),
        a_collapse = forcats::fct_collapse(a,
                                           group1 = c('1', '3'),
                                           group2 = c('2', '4')))
    
    # A tibble: 5 x 2
    #   a     a_collapse
    #  <chr> <fct>     
    #1 1     group1    
    #2 2     group2    
    #3 3     group1    
    #4 4     group2    
    #5 5     5        
    

    This bug has been fixed in the development version of forcats, and the fix should be available in the next release.
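
To get the catch-all group in one go, here is a minimal sketch of two options. The first assumes a forcats version that already includes the fix and in which fct_collapse() takes an other_level argument in place of group_other (0.5.0 or later, if I remember correctly); check ?fct_collapse on your installation. The second does not depend on the fix: it collapses the named groups first and then lumps whatever is left with fct_other(). The column names a_collapse and a_collapse2 are only illustrative.

```r
library(dplyr)
library(forcats)

df <- tibble(a = 1:5)

res <- df %>%
  mutate(
    a = as.character(a),
    # Option 1: assumes fct_collapse() supports `other_level` (fixed forcats)
    a_collapse = fct_collapse(a,
      group1 = c("1", "3"),
      group2 = c("2", "4"),
      other_level = "Other"
    ),
    # Option 2: collapse the named groups, then lump every remaining level
    a_collapse2 = fct_other(
      fct_collapse(a, group1 = c("1", "3"), group2 = c("2", "4")),
      keep = c("group1", "group2"),
      other_level = "Other"
    )
  )

res
```

Both columns should come out as group1, group2, group1, group2, Other, with 'Other' as the last factor level. If other_level is not recognised, the installed forcats is probably too old for option 1, and option 2 or the case_when() approach is the safer choice.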