Search code examples
rtidyrpurrrreadxl

How to read multiple files to save and count number of variables in each using a map_* function from purrr?


I read multiple files using map_dfr() and nest the data using the following codes:

library("tidyverse")
library("readxl")

files <- c("file1.xlsx", "file2.xlsx", "file3.xlsx", "file4.xlsx")

df <- files %>% 
  set_names() %>% 
  map_dfr(read_excel, .id = "File") %>% 
  group_by(File) %>% 
  nest()

And the dataframe df looks like this:

enter image description here

How can I use mutate() to add one column to save variable names in each data and another column to count how many variables? I tried this but it is not working:

df2 <- df %>%
  mutate(var.list = names(data)) %>% 
  mutate(var.n = unique(var.list))

My intention is to have a list column var.list to save variables (column names) from each data set and a column var.n to count the number of variables. Thank you for your help.


Solution

  • Fake setup:

    library(dplyr)
    library(purrr)
    
    set.seed(42)
    df <- tibble(file = sprintf("file%i.xlsx", 1:3)) %>%
      mutate(data = map(file, ~ mtcars[,sample(11,size=7)]))
    df
    # # A tibble: 3 x 2
    #   file       data             
    #   <chr>      <list>           
    # 1 file1.xlsx <df[,7] [32 x 7]>
    # 2 file2.xlsx <df[,7] [32 x 7]>
    # 3 file3.xlsx <df[,7] [32 x 7]>
    

    The work:

    df %>%
      mutate(
        var.list = map(data, colnames),
        var.n = map_int(var.list, ~ length(unique(.)))
      ) %>%
      # and just to show the differencs
      mutate(
        var.names = map_chr(var.list, toString)
      )
    # # A tibble: 3 x 5
    #   file       data              var.list  var.n var.names                          
    #   <chr>      <list>            <list>    <int> <chr>                              
    # 1 file1.xlsx <df[,7] [32 x 7]> <chr [7]>     7 mpg, drat, carb, am, cyl, hp, qsec 
    # 2 file2.xlsx <df[,7] [32 x 7]> <chr [7]>     7 gear, mpg, vs, qsec, hp, carb, drat
    # 3 file3.xlsx <df[,7] [32 x 7]> <chr [7]>     7 hp, gear, cyl, qsec, disp, mpg, wt