Search code examples
rxlsx

Renaming multiple columns in different data frames combined into one object


I've 65 excel files stored in a folder (i.e. JanuaryFiles). I need to:

  1. import every excel file, 2) rename the columns with specific names (all the 65 files should have the same column names), 3) combine the 65 files using any function (e.g., rbind()) into one file, and 4) names it with today's date.

I used:

mydir = setwd("~/Dropbox/JanuaryFiles")
myfiles = list.files(path=mydir, pattern="*.xlsx", full.names=TRUE)
myfiles

# Here: I need to rename the columns of each excel file:
# Then I used the following to combine the files:

combined.df <- do.call(cbind , dat_All) 

# I'm also still missing the function that names the final file with specific name and today's date.

Solution

  • The lapply function will read all of your dataframes into a list object called dfs. The col_names argument is where you provide the names of your columns. skip is to ignore the first row, which has the wrong header names (delete this if you want to include the first row of your excel files).

    dplyr::bind_rows will stack the list of dataframes into one tibble object.

    sprintf("%s.xlsx", Sys.Date()) creates the file name using today's date. You can modify the output format using the format function (e.g. format(Sys.Date(), "%m-%d-%Y"). Then xlsx::write.xlsx outputs the dataframe. Note: it must be a dataframe not a tibble object which is why I used as.data.frame.

    library(dplyr)
    library(xlsx)
    library(readxl)
    
    # Provide a character vector of column names you want to col_names
    dfs <- lapply(myfiles, readxl::read_excel, col_names = cols, skip = 1)
    df <- dplyr::bind_rows(dfs)
    
    xlsx::write.xlsx(as.data.frame(df), sprintf("%s.xlsx", Sys.Date()))