Search code examples
rlistdataframexlsx

Use of XLSX and addDataFrame() to get proper header names


List of data frames exported to Excel has DF name prepended

I am splitting a dataframe by city and the resulting list of dataframes export to their own Excel sheets, but the header of each dataframe is undesirably coming out as Atlanta.type and Atlanta.kpi instead of type and kpi. I'd like to understand why this is happening and how I can fix it to meet my need. My desired output should look like this (but it doesn't):

Good header AND Good data!!!

Detailed Example

Setup

Here's a basic dataframe similar to the one in real life, and I'll split it by location into a list:

library(tidyverse)

df_Main <- expand_grid(location = c("New York", "Boston", "Atlanta"),
                  type = c("A"), 
                  kpi = c(1.5,3))

> df_Main
# A tibble: 6 x 3
  location type    kpi
  <chr>    <chr> <dbl>
1 New York A       1.5
2 New York A       3  
3 Boston   A       1.5
4 Boston   A       3  
5 Atlanta  A       1.5
6 Atlanta  A       3  

Now split it into a list of dataframes, one for each sheet I'll create:

lst_Main <- split(df_Main, f = df_Main$location)

> lst_Main
$Atlanta
# A tibble: 2 x 3
  location type    kpi
  <chr>    <chr> <dbl>
1 Atlanta  A       1.5
2 Atlanta  A       3  

$Boston
# A tibble: 2 x 3
  location type    kpi
  <chr>    <chr> <dbl>
1 Boston   A       1.5
2 Boston   A       3  

$`New York`
# A tibble: 2 x 3
  location type    kpi
  <chr>    <chr> <dbl>
1 New York A       1.5
2 New York A       3  

Trial 1

I put the list into a workbook using library(xlsx), and while this does work, the column names in the resulting Excel file show the Atlanta.type and Atlanta.kpi column names:

library(xlsx)

wb = createWorkbook()

for (df in names(lst_Main)) {
  sheet = createSheet(wb, df)
  addDataFrame(lst_Main[df], sheet = sheet, col.names = TRUE, row.names = FALSE)
}

saveWorkbook(wb, file = "./Excel Output/test.xlsx")

Bad header name -- do not want the prepended city!

Trial 2

I thought that perhaps it was because I didn't double-bracket the list, so I alternatively tried:

for (df in names(lst_Main)) {
  sheet = createSheet(wb, df)
  addDataFrame(lst_Main[[df]], sheet = sheet, col.names = TRUE, row.names = FALSE)
}

This fixed the headers in the way I wanted them, the data isn't useful:

Good header, but bad data format!

Trial 3

As a last resort, I tried just adding a single straight dataframe to a workbook, and this turned out to export the good headers with the bad data, so perhaps my use of lists isn't the problem (?):

wb = createWorkbook()
sheet = createSheet(wb, "df_Main")
addDataFrame(df_Main, sheet = sheet, col.names = TRUE, row.names = FALSE)

Just add a simple dataframe -- but still wrong!

What I'm looking for

Based on Trial 3, I don't believe my use of lists is the problem (but maybe?). Primarily, I just want to know what is causing me to get the wrong output, is it my syntax or is my use of library(xlsx) just problematic (I know it uses java somehow). If the library(xlsx) package is the problem, what is a more reliable way to achieve these ends?


Solution

  • Consider wrapping with unname

    for (df in names(lst_Main)) {
       sheet = createSheet(wb, df)
       addDataFrame(unname(lst_Main[df]), sheet = sheet, col.names = TRUE, row.names = FALSE)
     }
    

    -output

    enter image description here


    This also works with single file, if it is wrapped in a list

    wb = createWorkbook()
    sheet = createSheet(wb, "df_Main")
    addDataFrame(list(df_Main), sheet = sheet, col.names = TRUE, row.names = FALSE)
    

    -output

    enter image description here


    The main reason OP's single data/multiple datasets are not working is because they are not data.frame, but tibble. Consider changing it to data.frame and it will work

    df_Main <- as.data.frame(df_Main)
    wb = createWorkbook()
    sheet = createSheet(wb, "df_Main")
    addDataFrame(df_Main, sheet = sheet, col.names = TRUE, row.names = FALSE)
    

    -output

    enter image description here


    As part of negative check, reconvert back to tibble and do this again

    df_Main <- as_tibble(df_Main)
    wb = createWorkbook()
    sheet = createSheet(wb, "df_Main")
    addDataFrame(df_Main, sheet = sheet, col.names = TRUE, row.names = FALSE)
    

    -output enter image description here


    The function itself says addDataFrame and the expected input 'x' according to ?addDataFrame is

    x - a data.frame.