R: Encoding categorical data using across()


I have a dataset with features of type character (not all are binary and one of them represents a region).

To avoid calling the function on each column separately, I tried to use a pipeline with across() to select all of the character columns and encode them with the following function:

encode_ordinal <- function(x, order = unique(x)) {
  x <- as.numeric(factor(x, levels = order, exclude = NULL))
  x
}
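To make the encoding concrete, here is a minimal sketch of what encode_ordinal() returns on a single vector. The regions vector and the low/medium/high example are hypothetical data, not from the question. With the default order = unique(x), codes follow the order of first appearance (not alphabetical order), and exclude = NULL makes NA a level of its own instead of dropping it. Note that these codes are just labels; for a nominal column such as a region, the numeric order carries no meaning.

```r
# encode_ordinal() as defined in the question
encode_ordinal <- function(x, order = unique(x)) {
  x <- as.numeric(factor(x, levels = order, exclude = NULL))
  x
}

# Hypothetical region column, including a missing value
regions <- c("north", "south", NA, "north", "east")

# Default order = unique(x): north = 1, south = 2, NA = 3, east = 4
default_codes <- encode_ordinal(regions)
print(default_codes)  # [1] 1 2 3 1 4

# An explicit order fixes the mapping regardless of how the data are arranged
size_codes <- encode_ordinal(c("low", "high", "medium"),
                             order = c("low", "medium", "high"))
print(size_codes)  # [1] 1 3 2
```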

dataset <- dataset %>% 
  encode_ordinal(across(where(is.character)))

However, it seems that I am not using across() correctly, because I get the error:

Error: across() must only be used inside dplyr verbs.

I wonder if I am overcomplicating this and there is an easier way to achieve it, i.e., to identify all of the character features and encode them.


Solution

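  • across() only works inside dplyr verbs. With the pipe, your call becomes encode_ordinal(dataset, across(where(is.character))), so across() is passed as the order argument and evaluated outside any verb, which triggers the error. Instead, call across() inside mutate() and pass encode_ordinal as the function to apply to each selected column, as illustrated in the following example: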

    library(dplyr)  # provides tibble(), mutate(), across() and where()
    
    dataset <- tibble(x = 1:3, y = c('a', 'b', 'b'), z = c('A', 'A', 'B'))
    # # A tibble: 3 x 3
    #       x y     z    
    #   <int> <chr> <chr>
    # 1     1 a     A    
    # 2     2 b     A    
    # 3     3 b     B    
    
    dataset %>%
        mutate(across(where(is.character), encode_ordinal))
    # # A tibble: 3 x 3
    #       x     y     z
    #   <int> <dbl> <dbl>
    # 1     1     1     1
    # 2     2     2     1
    # 3     3     2     2
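If you need to pass extra arguments to encode_ordinal, or want to keep the original character columns, across() also accepts a purrr-style lambda and a .names specification. The sketch below assumes dplyr 1.0.0 or later. The region column, the "_code" suffix and the choice of alphabetical level order via sort() are illustrative assumptions, not part of the original answer.

```r
library(dplyr)

# encode_ordinal() as defined in the question
encode_ordinal <- function(x, order = unique(x)) {
  x <- as.numeric(factor(x, levels = order, exclude = NULL))
  x
}

# Illustrative data: region appears as south, north, south
dataset <- tibble(x = 1:3, region = c("south", "north", "south"))

encoded <- dataset %>%
  mutate(across(
    where(is.character),
    # Lambda so extra arguments can be passed; sort() drops NA,
    # so missing values would stay NA rather than get their own code
    ~ encode_ordinal(.x, order = sort(unique(.x))),
    # Write results to new columns instead of overwriting the originals
    .names = "{.col}_code"
  ))

encoded
```

Here region_code is 2, 1, 2 (north = 1, south = 2), and the original region column is kept alongside it. Because ordinal codes impose an order, dummy variables may be a better fit for a nominal column such as a region.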