I am trying to update an existing Excel file (xlsx).
Constructors like XSSFWorkbook(java.io.File file)
and XSSFWorkbook(OPCPackage pkg)
open the file in a read only mode and don't allow modifications. So I have to use XSSFWorkbook(java.io.InputStream is)
and the memory footprint (JVM memory and Java Heap) is too high. I can't use VM args to set max memory size, as I have to respect the memory requirements of other programs running in parallel.
Using SXSSFWorkbook
was another solution but it is essentially a wrapper around XSSFWorkbook
. Same problems persist that were with XSSFWorkbook
.
I went to other posts at SO and couldn't find any relevant answer. Can anybody help me with this please? I am fine with storing temporary files.
Your question is unclear. In reality it is not "How to open XSSFWorkbook
without using InputStream
?" but "How to reduce the memory footprint of XSSFWorkbook
? ". And the answer to this question is: It can't be reduced, such as apache poi
is programmed now. Else the apache poi
developers would must be as stupid that they had programmed apache poi
especially to waste memory. They have not.
But there are too much abstraction levels used.
All is based on XML. But users of programming libraries do not want bothering with XML and at least not with that kind of XML which is split into multiple files in a ZIP archive which are linked together using special relation XML files. So on top of the XML there is ooxml-schemas
, a library which gets the XML of the single files into usable java
objects. And there is org.apache.poi.openxml4j.opc.*
to manage the relations.
To get all benefit of this, all the usable java
objects (workbook, sheets, rows, cells, drawings, tables, pivot-tables, charts, ...) and the relations of those must be in memory to be ready to use. Or temporary files would must be used to store them temporary after got them out of the *.xlsx
ZIP archive. Working directly in the ZIP file system is not an option in my opinion because of the behavior of changes in such type of filesystem.
But using temporary files is not provided by apache poi
. Only SXSSF
uses temporary files for worksheets, but only for writing new *.xlsx
files, not for reading and updating such files.
Additional, to be compatible with the binary BIFF
*.xls
file format as most as possible, there is another level added. This is the SS
and XSSF
level which provides the high level classes of apache poi
. And because of that there are now additional java
objects for workbook, sheets, rows, cells, drawings, tables, pivot-tables, charts, .. additional to the low level ooxml-schemas
classes in memory.
So ... memory full when it comes to big *.xlsx
files. ;-)
Solutions?
To be as memory friendly as possible, unzip the *.xlsx
ZIP archive and do working directly with the XML in it. Of course this is very laborious, especially for creating new contents and when it comes to taking the relations into account. I have shown simple examples for this in my answers here. For example: How to modify a large Excel file when memory is an issue and How to set cell background color in excel using java + poi.
Or do programming an extension for apache poi
which uses temporary files instead of holding all in memory. Of course this also is laborious and has disadvantages for systems which do not provide using temporary files.