
Use a string to assign multiple column types to tibble


The following data has six columns. I want to change their column types to factor, factor, factor, int, int, and factor, respectively.

d <- structure(list(a = c(9, 9, 9, 9, 9, 9, 9), b = structure(c(2018, 2018, 2018, 2018, 2018, 2018, 2018), class = "yearmon"), c = c("605417CA", "605417CB", "606822AS", "606822AT", "606822AU", "606822AV", "60683MAB"), d = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), e = c(0, 0, 0, 0, 0, 0, 0), f = c(2772, 2772, 46367, 46367, 46367, 46367, 47601)), row.names = c(NA, -7L), class = c("tbl_df",  "tbl", "data.frame"))

If I were reading this data from an external file, I would use vroom(path, col_types = "fffiif"), which converts each column to the type given by the corresponding character of the string. But here the data is the result of a previous computation, so I need to do the conversion myself. Is there a way to change all column types with a simple string, like vroom does?

Things I tried:

  • Calling mutate on each of the six variables is quite verbose.
  • Conversions are not one-to-one. For example, "a" and "e" are double, but I want to convert them to factor and int respectively. So mutate_if would not work.
  • The magrittr package has set_colnames, which changes column names by passing a vector of strings. There may be something similar for changing column types, but I haven't found anything.
  • readr::type_convert seems to only apply to columns of type character.

I saved the data locally and imported it with vroom(path, col_types = "fffiif"), which works perfectly. So the question is: which function can I pass the string "fffiif" to in order to do the conversion once the data is already in memory?


Solution

  • Either use the for loop from the linked post, or use mutate with across to avoid repetition:

    library(dplyr)
    
    d %>% 
      mutate(across(c(a:c, f), ~ as.factor(.x)),
             across(d:e, ~ as.integer(.x)))
    
    # # A tibble: 7 × 6
    #   a     b     c            d     e f    
    #   <fct> <fct> <fct>    <int> <int> <fct>
    # 1 9     2018  605417CA    NA     0 2772 
    # 2 9     2018  605417CB    NA     0 2772 
    # 3 9     2018  606822AS    NA     0 46367
    # 4 9     2018  606822AT    NA     0 46367
    # 5 9     2018  606822AU    NA     0 46367
    # 6 9     2018  606822AV    NA     0 46367
    # 7 9     2018  60683MAB    NA     0 47601
    

    Similar to the linked post, using lapply:

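    # Map each one-letter type code to its conversion function
    ff <- list(f = as.factor, i = as.integer)
    # Split the type string into one code per column
    cc <- unlist(strsplit("fffiif", ""))
    
    # Convert column i with the function named by its code; d[] keeps the tibble structure
    d[] <- lapply(seq_along(d), \(i) ff[[ cc[ i ] ]](d[[ i ]]))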
    
    sapply(d, class)
    #       a         b         c         d         e         f 
    # "factor"  "factor"  "factor" "integer" "integer"  "factor"
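
If you need this more than once, the lapply approach above could be wrapped in a small helper that takes the type string directly. The following is only a sketch: the name set_col_types and the set of supported codes (f, i, d, c, l) are my own choices for illustration, not part of vroom, readr, or dplyr, and it covers only a few of the codes vroom accepts.

```r
# Hypothetical helper (not from any package): apply a vroom-style compact
# type string to a data frame that is already in memory.
set_col_types <- function(df, spec) {
  # Lookup table from one-letter code to base R conversion function.
  # Only these five codes are handled in this sketch.
  converters <- list(
    f = as.factor,
    i = as.integer,
    d = as.double,
    c = as.character,
    l = as.logical
  )

  # One code per column, e.g. "fffiif" -> "f" "f" "f" "i" "i" "f"
  codes <- strsplit(spec, "")[[1]]

  if (length(codes) != ncol(df)) {
    stop("`spec` must contain exactly one type code per column")
  }
  unknown <- setdiff(codes, names(converters))
  if (length(unknown) > 0) {
    stop("Unsupported type code(s): ", paste(unknown, collapse = ", "))
  }

  # Convert column k with the function chosen by its code;
  # assigning into df[] keeps the class and column names of df.
  df[] <- lapply(seq_along(df), function(k) converters[[codes[k]]](df[[k]]))
  df
}

# Small made-up data frame, used only to show the call
example <- data.frame(a = c(9, 9), c = c("605417CA", "605417CB"), e = c(0, 0))
converted <- set_col_types(example, "fci")
sapply(converted, class)
```

With the data from the question, the call would be set_col_types(d, "fffiif"). The length and code checks are there so that a mistyped string fails loudly instead of silently recycling or skipping columns.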