Search code examples
javaexcelapache-poixssf

Updating existing Excel file in Java Apache POI without using InputStream


I am trying to update an existing Excel file (xlsx).

Constructors like XSSFWorkbook(java.io.File file) and XSSFWorkbook(OPCPackage pkg) open the file in a read only mode and don't allow modifications. So I have to use XSSFWorkbook(java.io.InputStream is) and the memory footprint (JVM memory and Java Heap) is too high. I can't use VM args to set max memory size, as I have to respect the memory requirements of other programs running in parallel.

Using SXSSFWorkbook was another solution but it is essentially a wrapper around XSSFWorkbook. Same problems persist that were with XSSFWorkbook.

I went to other posts at SO and couldn't find any relevant answer. Can anybody help me with this please? I am fine with storing temporary files.


Solution

  • Your question is unclear. In reality it is not "How to open XSSFWorkbook without using InputStream?" but "How to reduce the memory footprint of XSSFWorkbook? ". And the answer to this question is: It can't be reduced, such as apache poi is programmed now. Else the apache poi developers would must be as stupid that they had programmed apache poi especially to waste memory. They have not.

    But there are too much abstraction levels used.

    All is based on XML. But users of programming libraries do not want bothering with XML and at least not with that kind of XML which is split into multiple files in a ZIP archive which are linked together using special relation XML files. So on top of the XML there is ooxml-schemas, a library which gets the XML of the single files into usable java objects. And there is org.apache.poi.openxml4j.opc.* to manage the relations.

    To get all benefit of this, all the usable java objects (workbook, sheets, rows, cells, drawings, tables, pivot-tables, charts, ...) and the relations of those must be in memory to be ready to use. Or temporary files would must be used to store them temporary after got them out of the *.xlsx ZIP archive. Working directly in the ZIP file system is not an option in my opinion because of the behavior of changes in such type of filesystem.

    But using temporary files is not provided by apache poi. Only SXSSF uses temporary files for worksheets, but only for writing new *.xlsx files, not for reading and updating such files.

    Additional, to be compatible with the binary BIFF *.xls file format as most as possible, there is another level added. This is the SS and XSSF level which provides the high level classes of apache poi. And because of that there are now additional java objects for workbook, sheets, rows, cells, drawings, tables, pivot-tables, charts, .. additional to the low level ooxml-schemas classes in memory.

    So ... memory full when it comes to big *.xlsx files. ;-)

    Solutions?

    To be as memory friendly as possible, unzip the *.xlsx ZIP archive and do working directly with the XML in it. Of course this is very laborious, especially for creating new contents and when it comes to taking the relations into account. I have shown simple examples for this in my answers here. For example: How to modify a large Excel file when memory is an issue and How to set cell background color in excel using java + poi.

    Or do programming an extension for apache poi which uses temporary files instead of holding all in memory. Of course this also is laborious and has disadvantages for systems which do not provide using temporary files.