Search code examples
rtidyversetranslation

Translating the following function using tidyverse verbs into base R as a function


I'm trying to translate the following syntax found in tidyverse into base R as a function, though I'm having difficulties following the same output.

Here's the syntax:

x <- function(x) {x %>% 
    select(where(negate(is.numeric))) %>% 
    map_dfc(~ model.matrix(~ .x -1) %>% 
              as_tibble) %>% 
    rename_all(~ str_remove(., "\\.x")) 
}

I understand that select can be represented as indexing within a dataframe such as x[,]. As for the pipe function %>%, I can just index a function within a variable i.e. x <- ...

I can manage to transfer select(where(negate(is.numeric)))

into:

x <- function(x){
  x[, !sapply(x, is.numeric)]
  
}

Though, this makes it difficult, as I'm thinking it can be replaced with a conditional argument:

 map_dfc(~ model.matrix(~ .x -1)

Here's the expected output with some example data:

# A tibble: 12 x 5
   black brown white female  male
   <dbl> <dbl> <dbl>  <dbl> <dbl>
 1     1     0     0      1     0
 2     1     0     0      1     0
 3     1     0     0      1     0
 4     1     0     0      1     0
 5     0     0     1      1     0
 6     0     0     1      1     0
 7     0     0     1      0     1
 8     0     0     1      0     1
 9     0     1     0      0     1
10     0     1     0      0     1
11     0     1     0      0     1
12     0     1     0      0     1

reproducible code:

structure(list(wgt = c(64L, 71L, 53L, 67L, 55L, 58L, 77L, 57L, 
56L, 51L, 76L, 68L), hgt = c(57L, 59L, 49L, 62L, 51L, 50L, 55L, 
48L, 42L, 42L, 61L, 57L), age = c(8L, 10L, 6L, 11L, 8L, 7L, 10L, 
9L, 10L, 6L, 12L, 9L), id = structure(c(1L, 1L, 1L, 1L, 3L, 3L, 
3L, 3L, 2L, 2L, 2L, 2L), .Label = c("black", "brown", "white"
), class = "factor"), sex = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L), .Label = c("female", "male"), class = "factor")), class = "data.frame", row.names = c(NA, 
-12L))

Solution

  • Calling your input data xx,

    onehot = function(data) {
      x = Filter(Negate(is.numeric), data)
      x = as.data.frame(Reduce(cbind, lapply(x, function(col) model.matrix(~ . - 1, data = data.frame(col)))))
      setNames(x, sub(pattern = "^col", replacement = "", names(x)))
    }
    
    onehot(xx)
    #    black brown white female male
    # 1      1     0     0      1    0
    # 2      1     0     0      1    0
    # 3      1     0     0      1    0
    # 4      1     0     0      1    0
    # 5      0     0     1      1    0
    # 6      0     0     1      1    0
    # 7      0     0     1      0    1
    # 8      0     0     1      0    1
    # 9      0     1     0      0    1
    # 10     0     1     0      0    1
    # 11     0     1     0      0    1
    # 12     0     1     0      0    1
    

    There are other packages that do one-hot encoding like this, see here for some examples, but the above is all base.