Search code examples
rreadxl

How to read in excel sheets into one data frame in R and skip certain lines


I'm trying to read in an excel file with multiple sheets using R and merge them all into one data frame , tag the sheet name to one column of the dataframe.

Then this time I got a problem that the excel sheets contain 1 extra row for unnecessary titles so I want to skip row 1.

Using read_excel in lapply, I naturally think of just adding skip=1 such as

mylist <-lapply(excel_sheets(path), read_excel(skip=1)

Then R complained about path, and if I keep adding path, it complains that read_excel is not a function. So I thought okay write that using function(x){}

And this totally screwed up. The list generated had a subtle error that I only figured out when I'm plotting the data : it copy and pasted the same sheet 1 multiple times and added the correct sheet names on duplicated data.

Sure I can manually deleting the 1st row, but I want to know where I made the mistake and how to fix it.

library(readxl)

#read in excel sheets
#but now I need to skip one line
path <- "/Users/xxx/file.xlsx"
sheetnames <- excel_sheets(path)
mylist <- lapply(excel_sheets(path), function(x){read_excel(path= path,col_names = TRUE,skip = 1)})

# name the dataframes
names(mylist) <- sheetnames

#use Map to bind all the elements of the list into a dataframe
my_list <- Map(cbind, mylist, Cluster = names(mylist))
df <- do.call("rbind", my_list)

Solution

  • In read_excel function you are not passing the sheet that you want to read which is present in sheetnames variable. Try the following :

    library(readxl)
    path <- "/Users/xxx/file.xlsx"
    sheetnames <- excel_sheets(path)
    mylist <- lapply(sheetnames, function(x) 
                     read_excel(path,x, col_names = TRUE,skip = 1))
    #col_names is TRUE by default so you can use this without anonymous function like
    #mylist <- lapply(sheetnames, read_excel, path = path, skip = 1)
    
    # name the dataframes
    names(mylist) <- sheetnames
    
    #use Map to bind all the elements of the list into a dataframe
    my_list <- Map(cbind, mylist, Cluster = names(mylist))
    df <- do.call("rbind", my_list)