I need to read some capricious xlsx files in R. Converting them to csv messes up the column names, so I'm forced to use an alternative. I can read my files with the library readxl and the function read_xlsx().
My idea was to loop on each file to read to validate them and put them in a list. Then looping on each element of the list and do the work for each file.
However, read_xlsx() ouput is a tibble. I don't know anything about them, but they don't seem to appreciate being put in and extracted from a list.
For example, I use an excel file like this
this | is | a | test |
---|---|---|---|
1 | 2 | 3 | 4 |
5 | 6 | 7 | 8 |
With a code like this:
filenames <- list.files(path = "../Data")
dataList <- list()
for (filename in filenames) {
filepath <- paste0("../Data/", filename)
data <- read_xlsx(filepath)
print(data)
dataList[filename] <- data
print(dataList)
}
The output will be
# A tibble: 2 x 4
this is a test
<dbl> <dbl> <dbl> <dbl>
1 1 2 3 4
2 5 6 7 8
[[1]]
[1] 1 5
Why did I lose so much data? Is there any way to create a list of tibbles in order to manipulate each element of the list later?
For access a list item you need [[
not [
.
filenames <- list.files(path = "../Data")
dataList <- list()
for (filename in filenames) {
filepath <- paste0("../Data/", filename)
data <- read_xlsx(filepath)
dataList[[filename]] <- data
}
Another way to achieve this is foreach
package which would return a list.
library(foreach)
dataList <- foreach(filename = filenames,
.final = function(x) { setNames(x, filenames) }) %do% {
filepath <- paste0("../Data/", filename)
data <- read_xlsx(filepath)
data
}