As title, I have a dataset with about 13000 rows and 255 columns (actually I have more than 255 columns but RODBC
package seems to limit the number of columns exported to 255, so I trimmed it a bit) that need to be exported to xls/xlsx file.
I tried RODBC
and xlsx
package, both takes more than 5 minutes for export. I wonder if there is any other more efficient way of doing this?
I knew a little bit of python (using python to connect to outlook for listing emails in mailbox), if there is way for export using python instead, it is welcomed also.
Quite a few suggested using csv, it may not very possible in my case because there is a field containing free text that I cannot control what kind of character is entered in that field, making selection of separator difficult.
thank you for the suggestions, but I found that the R packages are fine only if the dataframe is relatively small and it is even slow for dataframe with all columns being character. Any suggestions?
There are lots of options:
xlsx
with mutliple sheets (you've tried this and it's too slow, I know)write.csv
should be faster and it's readable by ExcelodbcConnectExcel2007
within RODBC
bigmemory
to help you manage the large dataframe, especially if you can make it into a sparse matrixXLConnect
which worked for this guy with the same problemRODBC
or RPostgreSQL
, etc and then make a connection to the DB within Excel. I do this a lot. Here's a related resource.Pandas
tab-delimited
text file and then import it to Excel: write.table (table,sep="\t",quote=FALSE,row.names=FALSE,file=file.name)
fread
RExcel
XLLoop
Finally, here's a nice little article on "A Million Ways to Connect R and Excel" which you may find useful, though I think I've actually given you more options than the article does.
I would start with the most simple solutions, like fread
, then work your way to the relatively more complicated solutions if you're still not getting the results you want.
Depending on the exact nature of your project, you might even benefit from parallelism or multicore processing. Those don't boost your I/O speed in most cases, but it could speed up any processing/transformation of your data which takes place in your process, thus making your overall data pipeline faster.
Python
is also very well-equipped to handle this problem, but there are so many solutions within R
, hopefully you won't need to resort to switching languages just to write out data. Still, you could try
XlsxWriter
in Constant Memory mode, or
Optimized Reader and Writer of the openpyxl
package
if you want to try a Python-based solution.