This is a "let's try another way" post that is related to this one:
Is it possible to define/modify a reading function that
can handle the fact that data stored in an xlsx
file is row-based (i.e. each row represents a variable)
and transforms it accordingly so it can be stored in a column-based data.frame
(i.e. what used to be a row in xlsx
becomes a column)
while capturing the underlying class/data type of the row-based variables?
Regarding csv
files I would probably start with turning to readLines
, but unfortunately xlsx
is still a black box to me.
Here's a little xlsx
file that features examples for both data orientations: https://github.com/rappster/stackoverflow/blob/master/excel/row-and-column-based-data.xlsx
What about slightly modifying the read.xlsx
function from the xlsx
package:
library(xlsx)
read.transposed.xlsx <- function(file,sheetIndex) {
df <- read.xlsx(file, sheetIndex = sheetIndex , header = FALSE)
dft <- as.data.frame(t(df[-1]), stringsAsFactors = FALSE)
names(dft) <- df[,1]
dft <- as.data.frame(lapply(dft,type.convert))
return(dft)
}
# Let's test it
read.transposed.xlsx("row-and-column-based-data.xlsx", sheetIndex = 2)
# variable var_1 var_2 var_3
#1 2016-01-01 1 a TRUE
#2 2016-01-02 2 b FALSE
#3 2016-01-03 3 c TRUE