Search code examples
rlapplyxlsxtailopenxlsx

I am trying to read only the tail of multiple .xlsx files merged into a data.frame of lists


I am trying to merge multiple .xlsx sheets together into one data file within r, but extracting only the last row of each sheet.

I am a clinical academic, and we current have a prediction algorithm implemented via a macro-enabled excel spreadsheet. This macro-enabled spreadsheet outputs a .xlsx sheet into a pre-specified folder.

It unfortunately has a series of test rows that it inserted into the output .xlsx . Furthermore the users occasionally input the same data multiple times until it is correct. For this reason in the cleaned data we would only like the final row of each .xlsx file to be included.

I have managed to merge all the files, using the below code, mainly due to the help/code I have managed to find from this community.

I am unfortunately stuck at the following error message. See below

library(plyr)
library(dplyr)
library(readxl)

#file directory where the .xlsx files are to be listed below path <- "//c:/documents" 
filenames_list <- list.files(path= path, full.names=TRUE)

All_list <- lapply (filenames_list, 
               function(filename){
  print(paste("Merging",filename,sep = " "))
  read.xlsx(filename)
})

#this below code doesnt work 
#it returns the following error
# Error in x[seq.int(to = xlen, length.out = n)] : 
#  object of type 'S4' is not subsettable 
tail_only_list_df <- lapply (All_list,  
    function(newtail){
       tail(newtail, 1)
    })

final_df <- rbind.fill(tail_only_list_df)

Solution

  • Try doing the following :

    df <- do.call(rbind, lapply(filenames_list, function(filename) 
                    tail(openxlsx::read.xlsx(filename), 1)))
    

    Or if you already have list of excel files do

    df <- do.call(rbind, lapply(All_list, tail, 1))