Search code examples
rlisttibblereadxl

How to create a list of tibbles (from xlsx files) in R without losing data?


I need to read some capricious xlsx files in R. Converting them to csv messes up the column names, so I'm forced to use an alternative. I can read my files with the library readxl and the function read_xlsx().

My idea was to loop on each file to read to validate them and put them in a list. Then looping on each element of the list and do the work for each file.

However, read_xlsx() ouput is a tibble. I don't know anything about them, but they don't seem to appreciate being put in and extracted from a list.

For example, I use an excel file like this

this is a test
1 2 3 4
5 6 7 8

With a code like this:

filenames <- list.files(path = "../Data")
dataList <- list()

for (filename in filenames) {
  filepath <- paste0("../Data/", filename)
  data <- read_xlsx(filepath)
  print(data)
  dataList[filename] <- data
  print(dataList)
}

The output will be

# A tibble: 2 x 4
   this    is     a  test
  <dbl> <dbl> <dbl> <dbl>
1     1     2     3     4
2     5     6     7     8

[[1]]
[1] 1 5

Why did I lose so much data? Is there any way to create a list of tibbles in order to manipulate each element of the list later?


Solution

  • For access a list item you need [[ not [.

    filenames <- list.files(path = "../Data")
    dataList <- list()
    
    for (filename in filenames) {
      filepath <- paste0("../Data/", filename)
      data <- read_xlsx(filepath)
      dataList[[filename]] <- data
    }
    

    Another way to achieve this is foreach package which would return a list.

    library(foreach)
    
    dataList <- foreach(filename = filenames, 
                        .final = function(x) { setNames(x, filenames) }) %do% {
      filepath <- paste0("../Data/", filename)
      data <- read_xlsx(filepath)
      data
    }