Search code examples
rexceldataframexlsxtranspose

Reading horizontal (row-based) data from xlsx files into R data frames


This is a "let's try another way" post that is related to this one:

Is it possible to define/modify a reading function that

  • can handle the fact that data stored in an xlsx file is row-based (i.e. each row represents a variable)

  • and transforms it accordingly so it can be stored in a column-based data.frame (i.e. what used to be a row in xlsx becomes a column)

  • while capturing the underlying class/data type of the row-based variables?

Regarding csv files I would probably start with turning to readLines, but unfortunately xlsx is still a black box to me.

Here's a little xlsx file that features examples for both data orientations: https://github.com/rappster/stackoverflow/blob/master/excel/row-and-column-based-data.xlsx


Solution

  • What about slightly modifying the read.xlsx function from the xlsx package:

    library(xlsx)
    read.transposed.xlsx <- function(file,sheetIndex) {
            df <- read.xlsx(file, sheetIndex = sheetIndex , header = FALSE)
            dft <- as.data.frame(t(df[-1]), stringsAsFactors = FALSE) 
            names(dft) <- df[,1] 
            dft <- as.data.frame(lapply(dft,type.convert))
            return(dft)            
    }
    
    # Let's test it
    read.transposed.xlsx("row-and-column-based-data.xlsx", sheetIndex = 2)
    #    variable var_1 var_2 var_3
    #1 2016-01-01     1     a  TRUE
    #2 2016-01-02     2     b FALSE
    #3 2016-01-03     3     c  TRUE