Search code examples
rxlsxr-xlsx

formatting gene names as text in xlsx r package


I am trying to format columns within an excel file in the correct manner. First of all, I am afraid that Excel pics some strange gene names as dates which happens often in science. So normally, when importing data into excel from a txt file, I select the gene names column and change the cell type from general to text to be on the save side. When I now create my excel sheet using the xlsx, I am afraid exactly this will happen. At the moment I tried to reproduce this behaviour but all columns get formatted as general but I would love to force specific columns to text.

Is this somehow possible?

df <- data.frame(a=c(1,2),
                 b=c('SEPT2', 'MARCH1'),
                 c=c('1,2', '1,4'),
                 d=c('1.2', '1.4'),
                 e=c('2-SEP', '1-MARCH'),
                 f=c('APR-1', 'DEC-1'))
wb <- xlsx::createWorkbook()
sheet1 <- xlsx::createSheet(wb, sheetName='test')
xlsx::addDataFrame(df, sheet1, 
                   col.names=TRUE, row.names=FALSE)
xlsx::saveWorkbook(wb, 'test.xlsx')

I would love to format column b, e and f as text.

EDIT

since I was asking in a comment, how I can find more formatting values for example for scientific notation, I found this here.

text_format = CellStyle(wb, dataFormat=DataFormat("@"))
scientific_format <- CellStyle(wb, dataFormat=DataFormat('0.00E+00'))

Solution

  • You just need to create the format specification and add it to addDataFrame with the colStyle argument.

    wb <- xlsx::createWorkbook()
    sheet1 <- xlsx::createSheet(wb, sheetName='test')
    
    ## Create the format specification
    TextFormat = CellStyle(wb, dataFormat=DataFormat("@"))
    FormatList = list('2'=TextFormat, '5'=TextFormat,'6' = TextFormat)
    
    xlsx::addDataFrame(df, sheet1, col.names=TRUE, colStyle=FormatList, 
        row.names=FALSE)
    xlsx::saveWorkbook(wb, 'test.xlsx')
    

    Note: "@" is the code for text format.