Search code examples
rpurrrreadr

Mapping specific datatypes to specific columns via lookup table


I'm working with large datasets where often it can be difficult to always know whether or not the numerical values should be handled as a continuous feature or if they represent a categorical value. Other times, R gets it wrong and assigns something to be a character when it is infact numerical.

I'm hoping to build a lookup table to map specific datatypes to specific columns - by name. Is there a way to do this with the purrr package or something similar?

For example:

mylookup_table =data.frame(column_names = c('mpg','vs','hp'), column_types = c('numeric','factor','character') )
#
 #apply to mtcars for just these columns..

Solution

  • I presume a purrr guru could show how to do this more elegantly, but seeing as there aren't too many data types in R, this isn't too bad:

    library(tidyverse)
    mylookup_table <- data_frame(
      column_names = c('mpg','vs','hp'), 
      column_types = c('numeric','factor','character'))
    
    mylookup_chars <- mylookup_table[mylookup_table$column_types == "character", 1]
    mylookup_nums <- mylookup_table[mylookup_table$column_types == "numeric", 1]
    mylookup_factors <- mylookup_table[mylookup_table$column_types == "factor", 1]
    
    
    mtcars %>%
      purrr::map_at(mylookup_chars$column_names, as.character) %>% 
      purrr::map_at(mylookup_nums$column_names, as.numeric) %>% 
      purrr::map_at(mylookup_factors$column_names, as.factor) %>% 
      str()
    
    List of 11
     $ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
     $ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
     $ disp: num [1:32] 160 160 108 258 360 ...
     $ hp  : chr [1:32] "110" "110" "93" "110" ...
     $ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
     $ wt  : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
     $ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
     $ vs  : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
     $ am  : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
     $ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
     $ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...