Search code examples
rdplyrtibblereadxlr-xlsx

Reading xlsx with multiple sheets in R for duplication removal


I have a excel file which has multiple sheets embedded in it. My main goal is to basically remove all rows which are appearing multiple times in a single sheet and have to do this for every sheet. I have written the code below but the code is only reading the first sheet and also giving ' ...' in first row and column. Can someone help me out where I might be going wrong. Thank you in advanced

**config_file_name <- '/RBIAPI3tables.xlsx'
config_xl <- paste(currentPath,config_file_name,sep="")
config_xl_sheets_name <- excel_sheets(path = config_xl) # An array of sheets is created. To access the array use config_xl_sheets[1] 
count_of_xl_sheets <- length(config_xl_sheets_name) 
# Read all sheets in the file as separate lists
list_all_sheets <- lapply(config_xl_sheets_name, function(x) read_excel(path = config_xl, sheet = x))
names (list_all_sheets) <- config_xl_sheets_name # Change the name of all the lists to excel file sheets name
count_of_list_all_sheets <- length(list_all_sheets) # to get the data frame of each list use list_all_sheets[[Config]]
# Create data frame for each sheet Assign the sheet name to the data frame
for (i in 1:count_of_list_all_sheets)
{
  assign(x= trimws(config_xl_sheets_name[i]), value = data.frame(list_all_sheets[[i]]))
  updateddata = unique(list_all_sheets[[i]])
}
write.xlsx(updateddata,"Unique3tables.xlsx",showNA = FALSE)**

Solution

  • this is my approach

    library(readxl)
    library(data.table)
    library(openxlsx)
    file.to.read   <- "./testdata.xlsx"
    sheets.to.read <- readxl::excel_sheets(file.to.read)
    
    # read sheets from the file to a list and remove duplicate rows
    L <- lapply(sheets.to.read, function(x) {
      data <- setDT(readxl::read_excel(file.to.read, sheet = x))
      #remove puplicates
      data[!duplicated(data), ]
      })
    
    # create a new workbook
    wb <- createWorkbook()
    # create new worksheets an write to them
    for (i in seq.int(L)) {
      addWorksheet(wb, sheets.to.read[i])
      writeData(wb, i, L[[i]] )
    }
    # write the workbook to disk
    saveWorkbook(wb, "testdata_new.xlsx")