How to use the factor(f) syntax in dplyr/forcats package in R?


I am trying to do something very simple, which is to use the forcats package in R to work with factors. I have a dataframe with some factor variables, one of which is gender, and I'm simply trying to count the occurrences of each level using fct_count. The syntax is shown in the documentation as fct_count(f) (what could be easier!).

I'm trying to do this the dplyr way, using the pipe operator instead of the $ syntax to access the variables, but it doesn't seem to work. Am I just fundamentally misunderstanding the syntax?

library(dplyr)
library(forcats)

pid <- c('id1','id2','id3','id4','id5','id6')
gender <- c('Male','Female','Other','Male','Female','Female')
df <- data.frame(pid, gender)
df <- as_tibble(df)  # as.tibble() is deprecated in favour of as_tibble()
df
# A tibble: 6 x 2
  pid   gender
  <chr> <fct> 
1 id1   Male  
2 id2   Female
3 id3   Other 
4 id4   Male  
5 id5   Female
6 id6   Female
# This throws an error
df %>%
  mutate(gender = as.factor(gender)) %>%
  fct_count(gender) # Error: `f` must be a factor (or character vector).
# This works but doesn't use the nice dplyr select syntax
fct_count(df$gender)
# A tibble: 3 x 2
  f          n
  <fct>  <int>
1 Female     3
2 Male       2
3 Other      1

Where am I going wrong? New to dplyr and sorry for such a daft question but I can't seem to find a basic example anywhere!


Solution

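  • fct_count() takes a vector (a factor or a character vector) as its first argument f; it is not aware of tibbles or data frames. In your pipe, the whole tibble is passed as f, which is why you get the error. So the simplest pipe is to extract the column with pull() first: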

    library(dplyr)
    library(forcats)
    
    df %>%
      pull(gender) %>%
      fct_count()
    #> # A tibble: 3 x 2
    #>   f          n
    #>   <fct>  <int>
    #> 1 Female     3
    #> 2 Male       2
    #> 3 Other      1
    

    Your data

    pid <- c('id1','id2','id3','id4','id5','id6')
    gender <- c('Male','Female','Other','Male','Female','Female')
    df <- data.frame(pid, gender)
    df <- tibble::as_tibble(df)
    df
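
If you would rather stay in the pipe without pull(), here is a minimal sketch of a few alternatives. It rebuilds the same toy data with tibble() and makes gender a factor explicitly; the result names by_count, by_with and by_sorted are just illustrative. Note that count() is a dplyr verb rather than a forcats function, so its output column is named gender instead of f.

```r
library(dplyr)
library(forcats)

# Same toy data as in the question, with gender stored as a factor
df <- tibble(
  pid    = c("id1", "id2", "id3", "id4", "id5", "id6"),
  gender = factor(c("Male", "Female", "Other", "Male", "Female", "Female"))
)

# Option 1: dplyr's count() takes the data frame and a column name
by_count <- df %>% count(gender)

# Option 2: with() evaluates fct_count(gender) using the columns of df
by_with <- df %>% with(fct_count(gender))

# Option 3: pull() the column, then let fct_count() sort by frequency
by_sorted <- df %>% pull(gender) %>% fct_count(sort = TRUE)

print(by_count)
print(by_with)
print(by_sorted)
```

count() is probably the most idiomatic choice when the data already lives in a tibble; fct_count() is handier when you only have the factor itself.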