Search code examples
pythonrexcelexport-to-excel

Efficient way of exporting large R dataset to excel


As title, I have a dataset with about 13000 rows and 255 columns (actually I have more than 255 columns but RODBC package seems to limit the number of columns exported to 255, so I trimmed it a bit) that need to be exported to xls/xlsx file.

I tried RODBC and xlsx package, both takes more than 5 minutes for export. I wonder if there is any other more efficient way of doing this?

I knew a little bit of python (using python to connect to outlook for listing emails in mailbox), if there is way for export using python instead, it is welcomed also.

update 01

Quite a few suggested using csv, it may not very possible in my case because there is a field containing free text that I cannot control what kind of character is entered in that field, making selection of separator difficult.

update 02

thank you for the suggestions, but I found that the R packages are fine only if the dataframe is relatively small and it is even slow for dataframe with all columns being character. Any suggestions?


Solution

  • There are lots of options:

    1. Use xlsx with mutliple sheets (you've tried this and it's too slow, I know)
    2. Use write.csv should be faster and it's readable by Excel
    3. Use odbcConnectExcel2007 within RODBC
    4. Use the package bigmemory to help you manage the large dataframe, especially if you can make it into a sparse matrix
    5. XLConnect which worked for this guy with the same problem
    6. Write it to a SQL datatabase with RODBC or RPostgreSQL, etc and then make a connection to the DB within Excel. I do this a lot. Here's a related resource.
    7. Use Pandas
    8. Create a tab-delimited text file and then import it to Excel: write.table (table,sep="\t",quote=FALSE,row.names=FALSE,file=file.name)
    9. Use fread
    10. Try a cloud-based solution (I'm not sure if this will actually be faster, but it would at least be a trendy solution with extra benefits such as providing a nice way to store your data safely and let you query whatever you need from it using Excel on any computer)
    11. RExcel
    12. XLLoop

    Finally, here's a nice little article on "A Million Ways to Connect R and Excel" which you may find useful, though I think I've actually given you more options than the article does.

    I would start with the most simple solutions, like fread, then work your way to the relatively more complicated solutions if you're still not getting the results you want.

    Depending on the exact nature of your project, you might even benefit from parallelism or multicore processing. Those don't boost your I/O speed in most cases, but it could speed up any processing/transformation of your data which takes place in your process, thus making your overall data pipeline faster.

    Python is also very well-equipped to handle this problem, but there are so many solutions within R, hopefully you won't need to resort to switching languages just to write out data. Still, you could try

    1. XlsxWriter in Constant Memory mode, or

    2. Optimized Reader and Writer of the openpyxl package

      if you want to try a Python-based solution.