Convert existing dataframe variable to factor in Tidyverse


I know there are many versions of this question, but I am looking for a specific solution. When a data frame already contains a character variable, is there an easy way to convert that variable to a factor in tidyverse style? For example, the second line of code below (the piped as_factor call) does not reorder the factor levels, but the last line does. How do I make the second line work? This would be useful when importing and modifying existing datasets. Many thanks!

df <- data.frame(x = c(1,2), y = c('post','pre')) %>%
      as_factor(y, levels = c('pre','post'))

df$y <- factor(df$y, levels = c('pre', 'post'))

Solution

  • We can use fct_relevel from forcats

    library(dplyr)
    library(forcats)
    df1 <- data.frame(x = c(1,2), y = c('post','pre')) %>% 
           mutate(y = fct_relevel(y, 'pre', 'post')) 
    

    Output:

    > df1$y
    [1] post pre 
    Levels: pre post
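
If you would rather spell out the complete level order in one call, base `factor()` can also be used inside `mutate()`, since it does accept a `levels` argument. This is only a minimal sketch on the same toy data; the name `df2` is made up for illustration. Keep in mind that with `factor()` any value missing from `levels` becomes `NA`, whereas `fct_relevel()` only moves the levels you name to the front.

```r
library(dplyr)

# Same toy data as above; `df2` is a hypothetical name used only here
df2 <- data.frame(x = c(1, 2), y = c("post", "pre")) %>%
  # factor() accepts `levels`, so the order is fixed in a single step
  mutate(y = factor(y, levels = c("pre", "post")))

levels(df2$y)
# [1] "pre"  "post"
```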
    

    Regarding the use of as_factor, the documentation says:

    Compared to base R, when x is a character, this function creates levels in the order in which they appear, which will be the same on every platform.

    That is, post comes first, followed by pre:

    > as_factor(c('post','pre'))
    [1] post pre 
    Levels: post pre
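
The difference from base R only becomes visible when the data are not already in alphabetical order. A small sketch with a made-up vector `v` (not part of the question's data):

```r
library(forcats)

# Hypothetical input where the first value is not first alphabetically
v <- c("pre", "post", "pre")

# as_factor(): levels follow the order of first appearance
levels(as_factor(v))
# [1] "pre"  "post"

# factor(): levels are the sorted unique values
levels(factor(v))
# [1] "post" "pre"
```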
    

    Neither of the following calls works, because as_factor has no levels argument:

    > as_factor(c('post','pre'), "pre", "post")
    Error: 2 components of `...` were not used.
    
    We detected these problematic arguments:
    * `..1`
    * `..2`
    
    Did you misspecify an argument?
    Run `rlang::last_error()` to see where the error occurred.
    > as_factor(c('post','pre'), levels = c("pre", "post"))
    Error: 1 components of `...` were not used.
    
    We detected these problematic arguments:
    * `levels`
    
    Did you misspecify an argument?
    Run `rlang::last_error()` to see where the error occurred.
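
Because `as_factor()` cannot take the level order itself, one way around the errors above is to set the order in a second step with `fct_relevel()`. A minimal sketch; the name `f` is just for illustration:

```r
library(forcats)

# Step 1: as_factor() creates levels in order of appearance (post, pre)
# Step 2: fct_relevel() moves "pre" to the front
f <- fct_relevel(as_factor(c("post", "pre")), "pre")

levels(f)
# [1] "pre"  "post"
```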
    

    Also, forcats functions such as fct_relevel operate on a vector, not on a data frame. In a pipeline we therefore either extract the column first (with pull or .$) or modify the column within mutate.
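
The two routes can be sketched side by side. This assumes the question's toy data; `y_fct` and `df3` are hypothetical names introduced only for this example.

```r
library(dplyr)
library(forcats)

df <- data.frame(x = c(1, 2), y = c("post", "pre"))

# Route 1: pull() extracts the column as a vector, so the result is
# a standalone factor rather than a data frame
y_fct <- df %>%
  pull(y) %>%
  fct_relevel("pre", "post")

# Route 2: mutate() rewrites the column and returns the whole data frame
df3 <- df %>%
  mutate(y = fct_relevel(y, "pre", "post"))
```

Route 2 is usually the one to reach for when the goal is to keep working with the data frame, as in the question.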