r, dplyr, binning

Unequal factor levels: coercing to character; binding character and factor vector, coercing into character vector


I am not sure what the problem is here.

library(dplyr)  # attaches %>% and mutate()

dplyr::tibble(x = rnorm(100), group = rep(c('a','b'), 50)) %>%
  dplyr::group_by(group) %>%
  mutate(bin = OneR::bin(x, nbins = 10))

Unequal factor levels: coercing to character
binding character and factor vector, coercing into character vector
binding character and factor vector, coercing into character vector

But this works when labels = 1:10 is added.

dplyr::tibble(x = rnorm(100), group = rep(c('a','b'), 50)) %>% 
  dplyr::group_by(group) %>% 
  mutate(bin = OneR::bin(x, nbins = 10, labels = 1:10))

I would like to know the reason for the error in the first case.


Solution

  • Well, as Matt pointed out, it is a warning rather than an error. The warning arises because the bins depend on the data in each group. Since the bin names (i.e. the labels) are generated automatically from each group's break points, you get different factor levels for each group. Internally, dplyr binds the per-group results back together, which is essentially a join. When binding, factors with unequal levels shouldn't be matched up by their codes: you only see the strings, but a factor is stored as integer codes underneath, and the same code would stand for a different interval in each group. Since you would rather keep the label of each value than its integer code, dplyr converts the column to character. With labels = 1:10, every group gets the same levels, so nothing has to be coerced. See this example where I do the grouping by hand:

    library(dplyr)  # attaches %>%, filter(), mutate() and bind_rows()

    # grouped version: the bins are computed separately for each group
    set.seed(0)
    dplyr::tibble(x = rnorm(100), group = rep(c('a','b'), 50)) %>%
      dplyr::group_by(group) %>%
      mutate(bin = OneR::bin(x, nbins = 10))

    # the same thing by hand: bin group "a" on its own
    set.seed(0)
    data1 <- dplyr::tibble(x = rnorm(100), group = rep(c('a','b'), 50)) %>%
      filter(group == "a") %>%
      mutate(bin = OneR::bin(x, nbins = 10))

    # then bin group "b" on its own
    set.seed(0)
    data2 <- dplyr::tibble(x = rnorm(100), group = rep(c('a','b'), 50)) %>%
      filter(group == "b") %>%
      mutate(bin = OneR::bin(x, nbins = 10))

    # same warning pops out
    bind_rows(data1, data2)
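
To look at the level mismatch without dplyr or OneR, here is a minimal base R sketch. It assumes that OneR::bin with its default settings behaves like cut() with equal-width breaks; that is my assumption for the illustration, and all variable names are made up. It compares the levels of the per-group bins, then shows two ways to end up with shared levels: fixed labels, as in the question, or binning the whole column once before splitting.

```r
set.seed(0)
x <- rnorm(100)
group <- rep(c("a", "b"), 50)

# Bin each group on its own, as the grouped mutate() does.
# cut() stands in for OneR::bin() here (assumption: equal-width bins).
bins_a <- cut(x[group == "a"], breaks = 10)
bins_b <- cut(x[group == "b"], breaks = 10)

# The interval labels come from each group's own range, so compare them.
levels(bins_a)
levels(bins_b)
identical(levels(bins_a), levels(bins_b))

# With fixed labels both groups get the levels "1" to "10".
fixed_a <- cut(x[group == "a"], breaks = 10, labels = 1:10)
fixed_b <- cut(x[group == "b"], breaks = 10, labels = 1:10)
identical(levels(fixed_a), levels(fixed_b))

# Alternative: bin the whole column once, then split by group.
# Every piece keeps the full set of levels.
bins_all <- cut(x, breaks = 10)
by_group <- split(bins_all, group)
identical(levels(by_group$a), levels(by_group$b))
```

Note that the last variant changes what the bins mean: they are equal-width over the whole column rather than within each group, which may or may not be what you want.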