Search code examples
rexcelxlsx

Read in Certain Sheets From Excel Workbook


I have a work book that has 10 sheets: Half has the word "Survey" in its sheet name and the other half has the word "Census" in its sheet name.

I want to read in all the sheets that has the word "Survey" in its sheet name and append them into a separate workbook.

Sheet names are: WA Survey, Survey MD, DC Survey, VA Survey, Survey GA.

What I have:

 library(XLConnect)
 wb <- loadWorkbook("Data.xls")
 dataMD <- readWorksheet(wb, sheet = "Survey MD", startCol = 1, endCol = 10)
 dataWA <- readWorksheet(wb, sheet = "Survey WA", startCol = 1, endCol = 10)
 dataDC <- readWorksheet(wb, sheet = "Survey DC", startCol = 1, endCol = 10)
 dataVA <- readWorksheet(wb, sheet = "Survey VA", startCol = 1, endCol = 10)
 dataGA <- readWorksheet(wb, sheet = "Survey GA", startCol = 1, endCol = 10)
 total <- merge(dataMD,dataWA,dataDC,dataVA,dataGA,by="ID")

How can I make this simpler? I was hoping I can complete this task in 3-4 lines?


Solution

  • You could use the readxl package like this and filter the Survey data sheets before reading everything in one go.

    library(readxl)
    
    # file including path if needed
    file <- "Data.xls" 
    # read the sheets and only keep the Survey sheets
    sheets <- excel_sheets(file)
    sheets <- sheets[grep("Survey", sheets)]
    
    # read the data, only first 10 columns (A:J)
    excel_data <- lapply(sheets, read_excel, path = file, range = cell_cols("A:J") )
    

    You end up with a list of data. If all columns are identical you can use do.call("rbind", excel_data) or dplyr::bind_rows