Search code examples
rexceltidyversepurrrreadxl

Importing multiple sheets from excel, transposing and creating new columns before joining with map_df and read_excel


I have previously used a function similar to this question/answer to read in multiple sheets and create a new id variable based on the sheet name:

Importing and Renaming multiple sheets in a excel worksbook with map_df and read_excel

file_name <- "example.xlsx"
sheets <- readxl::excel_sheets(file_name)
data <- map_df(sheets, ~mutate(read_excel(file_name, sheet = .x), 
                       id = .x))

However, I would now like to introduce additional steps to each sheet before joining sheets into a data frame. I would like to remove a row, transpose the sheet, and as the original column names in the excel files are times, create a new time variable. I can manage this with one sheet e.g.

time <-seq(ymd_hm('2023-04-20 12:00'),ymd_hm('2023-04- 11:50'), 
                 by = '10 mins')

file_name <- "example.xlsx"

data <- readxl::read_excel(file_name)%>%
  dplyr::slice(-3)%>%
  sjmisc::rotate_df(cn = T) %>%
  tibble()%>%
  mutate(time = time)) 

I have not been able to successfully combine the steps, i.e. import, transpose/format sheets, create id variable based on sheet name, and combine all sheets in a data frame. I would appreciate a tidyverse based solution/advise, and apologise for the lack of a reproducible example, as not sure how to do this when the question regards reading in data.


Solution

  • Not sure what's exactly the issue. But you can simply move your data wrangling code inside a function which can be applied to the list of sheets similar to your first code:

    Using some fake example data:

    library(tidyverse)
    
    file_name <- tempfile(".xlsx")
    
    dat <- data.frame(
      time = 2010:2014,
      x = 1:5,
      y = rev(1:5)
    )
    writexl::write_xlsx(list(a = dat, b = dat), file_name)
    
    sheets <- readxl::excel_sheets(file_name)
    
    data <- map_df(sheets, function(sheet) {
      readxl::read_excel(file_name, sheet = sheet) %>%
        dplyr::slice(-3) %>%
        #sjmisc::rotate_df(cn = T) %>%
        tibble() %>%
        mutate(time = time, id = sheet)
    })
    
    data
    #> # A tibble: 8 × 4
    #>    time     x     y id   
    #>   <dbl> <dbl> <dbl> <chr>
    #> 1  2010     1     5 a    
    #> 2  2011     2     4 a    
    #> 3  2013     4     2 a    
    #> 4  2014     5     1 a    
    #> 5  2010     1     5 b    
    #> 6  2011     2     4 b    
    #> 7  2013     4     2 b    
    #> 8  2014     5     1 b