How to get the integer value of a factor with mutate()


I am trying to use dplyr::mutate() to return the integer value of the level of a factor column for each row. Here's what I have so far:

library(dplyr)  # provides tibble(), arrange(), group_by(), mutate() and %>%

a <- tibble(group = factor(c(rep(c('group1', 'groupA', 'groupB'), 4), rep('groupC', 3))),
            name1 = factor(c(rep(c('gene1', 'gene2', 'geneA', 'geneB'), 3),
                             c('gene1', 'gene2', 'geneA'))),
            name2 = factor(c(rep(c('geneB', 'geneA', 'gene2', 'gene1'), 3),
                             c('geneB', 'geneA', 'gene2')))) %>%
  arrange(group)

a <- group_by(a, group) %>%
  mutate(n = row_number(),
         n_max = max(n),
         lev1 = which(levels(a$name1) == name1))

which causes the error message:

Error in mutate_impl(.data, dots) : 
  Column `lev1` must be length 4 (the group size) or one, not 2

but if I just run which(levels(a$name1) == 'gene2') I get the desired value 2.

What is causing this error and how do I get around it?


Solution

  • Are you after this?

    group_by(a, group) %>%
        mutate(
            n = row_number(),
            n_max = max(n),
            lev1 = as.numeric(name1))
    ## A tibble: 15 x 6
    ## Groups:   group [4]
    #   group  name1 name2     n n_max  lev1
    #   <fct>  <fct> <fct> <int> <dbl> <dbl>
    # 1 group1 gene1 geneB     1    4.    1.
    # 2 group1 geneB gene1     2    4.    4.
    # 3 group1 geneA gene2     3    4.    3.
    # 4 group1 gene2 geneA     4    4.    2.
    # 5 groupA gene2 geneA     1    4.    2.
    # 6 groupA gene1 geneB     2    4.    1.
    # 7 groupA geneB gene1     3    4.    4.
    # 8 groupA geneA gene2     4    4.    3.
    # 9 groupB geneA gene2     1    4.    3.
    #10 groupB gene2 geneA     2    4.    2.
    #11 groupB gene1 geneB     3    4.    1.
    #12 groupB geneB gene1     4    4.    4.
    #13 groupC gene1 geneB     1    3.    1.
    #14 groupC gene2 geneA     2    3.    2.
    #15 groupC geneA gene2     3    3.    3.
    

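    name1 is already a factor, and a factor stores each value as an integer code that indexes into levels(name1), so as.numeric(name1) returns that level index for every row. Use as.integer(name1) instead if you want an integer column rather than a double.

    The original attempt fails because levels(a$name1) == name1 does not look up one row at a time: it compares the four levels element-wise against the group's name1 values, and which() then returns only the positions that happen to match. In group1 those are positions 1 and 3, so the result has length 2 instead of the group size 4, which is what the error message reports.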
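
    The difference between the two approaches can be reproduced outside mutate(). The following is a minimal base R sketch, not part of the original answer: `f` is a hypothetical stand-in for the name1 values of group1, and the levels are set explicitly so that the ordering does not depend on the locale.

```r
# Hypothetical stand-in for name1 within group1 (same row order as the output above).
# Levels are given explicitly so the ordering is not locale-dependent.
f <- factor(c("gene1", "geneB", "geneA", "gene2"),
            levels = c("gene1", "gene2", "geneA", "geneB"))

# One integer code per element: the position of each value in levels(f).
codes <- as.integer(f)

# Equivalent explicit lookup: match each value against the levels.
via_match <- match(as.character(f), levels(f))

# The original attempt: element-wise comparison of levels(f) with f,
# then which() keeps only the positions that are TRUE.
bad <- which(levels(f) == f)

print(codes)      # 1 4 3 2
print(via_match)  # 1 4 3 2
print(bad)        # 1 3  (length 2, not one value per row)
```

    Inside the grouped mutate() the same idea would be written as lev1 = as.integer(name1), or lev1 = match(name1, levels(name1)) if an explicit lookup reads more clearly.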