
Cannot pipe variable to levels


I am working with a large data frame, and rather than write intermediate results to memory, I've been trying to do as much as I can with pipes. While checking my factor levels in intermediate steps, I ran into a problem using the levels function and wondered if anyone might know what the cause is.

An example:

library(dplyr)
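# x is made a factor explicitly: since R 4.0, data.frame() keeps strings as character
# the sample size is given as 15 because x is not visible while y is being evaluated
Data <- data.frame(x = factor(rep(LETTERS[1:5], 3)),
                   y = sample(1:10, 15, replace = TRUE))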

The usual way works:

levels(Data$x)
[1] "A" "B" "C" "D" "E"

It mostly works if I use sapply:

 Data %>% select(x) %>% sapply(levels)
     x  
[1,] "A"
[2,] "B"
[3,] "C"
[4,] "D"
[5,] "E"

But piping the result of select straight into levels does not work and returns NULL:

Data %>% select(x) %>% levels()
NULL

Why does Data %>% select(x) %>% levels() return NULL?

Is there a way to use levels with piped data?


Solution

  • select returns a data frame, but levels expects a factor (a vector). A data frame has no levels attribute of its own, so levels returns NULL. To use levels in a pipe:

    You can either use .$x to extract the column inside the levels call (the braces stop the pipe from also passing the data frame as the first argument):

    Data %>% select(x) %>% {levels(.$x)}
    # [1] "A" "B" "C" "D" "E"
    

    Or, better, use pull instead of select; pull returns the column itself as a vector (here, a factor):

    Data %>% pull(x) %>% levels()
    # [1] "A" "B" "C" "D" "E"
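
To see where the NULL comes from, here is a minimal sketch. It assumes x is stored as a factor and uses a made-up y column; the variable names (df_attrs, df_levels, and so on) are only for illustration. It shows that the data frame returned by select carries no "levels" attribute, while the column itself does, and that base R's `[[` is another way to get at the column inside a pipe.

```r
library(dplyr)

# Illustrative data; x is made a factor explicitly (an assumption about the
# original data), because R >= 4.0 keeps strings as character by default
Data <- data.frame(x = factor(rep(LETTERS[1:5], 3)),
                   y = rep(1:5, 3))

# Attribute names of the one-column data frame: no "levels" among them
df_attrs <- names(attributes(select(Data, x)))

# levels() on the data frame therefore has nothing to return
df_levels <- Data %>% select(x) %>% levels()
print(df_levels)
# NULL

# The factor column itself carries the levels
col_levels <- Data %>% pull(x) %>% levels()
print(col_levels)
# [1] "A" "B" "C" "D" "E"

# Base R alternative: extract the column with `[[` as a pipe step
col_levels2 <- Data %>% select(x) %>% `[[`("x") %>% levels()
```

If x were a character column rather than a factor, all of these calls would return NULL, so it is worth checking class(Data$x) first.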