Search code examples
rdataframer-xlsx

How to use xlsx package to insert blank column in Excel


How can I insert a blank column into an existing Excel file while retaining the data in that column?

I have time series data across multiple columns (one year per column) with summary data at the end (Min, Max, Percentiles).

I want to use R to insert a column that contains a new year's data, but the code I have currently pastes on top of the summary data.

Code has been generalized below:

# Add necessary packages
library(rJava)
library(xlsx)
library(xlsxjars)

# Import data. Assume 1 column of data
df <- read.csv("file.csv", header = TRUE)

# Create a workbook using library(xlsx) function
workbook <- loadWorkbook("existing_workbook.xlsx")

# Import sheet names from above workbook
sheets <- getSheets(workbook)

# Add the created dataframe into the workbook.
# Assume the workbook has 100 existing columns and data is to be added to column 99
addDataFrame(df, sheets$correct_sheet, startColumn = 99, row.names = FALSE)

#Save the created workbook
saveWorkbook(workbook, "R_Output.xlsx")

What this ultimately does is overwrite the data in column 99, which I would prefer be shifted one column to the right.

Is this possible?


Solution

  • You can do this fairly simply. The idea is to make a copy of column 99 in column 100, then write df into column 99. Since you do not provide data, I make a small reproducible example. My existing_workbook.xlsx just has two columns. I will move column 2 to column 3, then add the data from file.csv into column 2. The updated database will have the new column "inserted" before column 2.

    First some data

    library(xlsx)
    
    ## Create some reproducible data for testing
    write.csv(1:150, "file.csv", row.names=FALSE) 
    df = data.frame(X = seq(2,300,2), Y=rep(LETTERS, length.out=150))
    wb <- xlsx::createWorkbook()
    sheet1 <- xlsx::createSheet(wb, sheetName='test')
    addDataFrame(df, sheet1, col.names=TRUE, row.names=FALSE)
    saveWorkbook(wb, 'existing_workbook.xlsx')
    

    Now we have the files file.csv and existing_workbook.xlsx like in your code. My code starts out like yours to load the data.

    ## Your input statements
    df <- read.csv("file.csv", header = TRUE)
    workbook <- loadWorkbook("existing_workbook.xlsx")
    sheets <- getSheets(workbook)
    
    # Import sheet names from above workbook
    sheets <- getSheets(workbook)
    

    Now I copy column 2 into column 3 to preserve it.

    ## Move a copy of column 2 to column 3
    columnToPreserve = readColumns(sheets$test, 2, 2, startRow=1)
    addDataFrame(columnToPreserve, sheets$test, 
        startColumn = 3, row.names = FALSE)
    

    Now we can write df into column 2 without losing the old column 2.

    ## Now we can write df into column 2
    addDataFrame(df, sheets$test, startColumn = 2, row.names = FALSE)
    
    ## Save elsewhere to make sure it was right
    saveWorkbook(workbook, 'updated_workbook.xlsx')
    

    If you open up updated_workbook.xlsx in excel, you should see the inserted column.