
Parsing xlsx files in chunks via a streaming/pagination strategy using Apache POI


There is a case where xlsx and xlsm files containing a huge amount of data (on the order of 80-100 MB) cause heap out-of-memory errors on our servers when they are loaded with the load() method of the Workbook object, which takes a FileInputStream as its parameter.

The intent is to load the data, validate the cell contents, and report an error if any record entry is invalid. Only if all the data is correct should it be written to the table. Hence, the following did not suffice for my purpose:

Error While Reading Large Excel Files (xlsx) Via Apache POI

The problem involves paginated parsing, data validation, and then writing to a database.
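Because every row must be valid before anything is written, one way to keep memory bounded is a two-pass flow: stream once to validate, then stream again and write in fixed-size pages. The following is a minimal sketch under stated assumptions, not the poster's implementation: the class PagedValidatingLoader, its methods, and the row model (a List<String> per row) are hypothetical; the Supplier is assumed to reopen the underlying row stream (e.g. re-reading the sheet), and pageWriter stands in for a batched database insert such as JDBC PreparedStatement.addBatch()/executeBatch().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helper: validate every row first, write in pages only if all rows pass.
final class PagedValidatingLoader {

    // Pass 1: stream all rows through the validator; returns error messages (empty = all valid).
    static List<String> validateAll(Iterator<List<String>> rows,
                                    Function<List<String>, String> validator) {
        List<String> errors = new ArrayList<>();
        int rowNum = 0;
        while (rows.hasNext()) {
            rowNum++;
            String error = validator.apply(rows.next()); // null means the row is valid
            if (error != null) {
                errors.add("Row " + rowNum + ": " + error);
            }
        }
        return errors;
    }

    // Pass 2: hand rows to the writer in pages of at most pageSize rows; returns the page count.
    static int writeInPages(Iterator<List<String>> rows, int pageSize,
                            Consumer<List<List<String>>> pageWriter) {
        List<List<String>> page = new ArrayList<>(pageSize);
        int pages = 0;
        while (rows.hasNext()) {
            page.add(rows.next());
            if (page.size() == pageSize) {
                pageWriter.accept(page);
                page = new ArrayList<>(pageSize); // start a fresh page; the old one can be GC'd
                pages++;                          // unless the writer keeps a reference to it
            }
        }
        if (!page.isEmpty()) { // flush the final, partially filled page
            pageWriter.accept(page);
            pages++;
        }
        return pages;
    }

    // Validate first; write only when no row failed. Returns the errors found.
    static List<String> load(Supplier<Iterator<List<String>>> rowSource,
                             Function<List<String>, String> validator,
                             int pageSize,
                             Consumer<List<List<String>>> pageWriter) {
        List<String> errors = validateAll(rowSource.get(), validator);
        if (errors.isEmpty()) {
            writeInPages(rowSource.get(), pageSize, pageWriter);
        }
        return errors;
    }
}
```

Reading the file twice trades I/O for memory: at most one page of rows is held at a time. If the pages must be all-or-nothing in the database as well, the page writes would still need to run inside a single transaction.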


Solution

  • A StAX parser is a good approach for this situation: https://docs.oracle.com/javase/tutorial/jaxp/stax/index.html

    We can iterate over the sheets to obtain the index stored in each string cell, and use the SharedStringsTable object to look up the value that index refers to at that cell location.
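To illustrate the index lookup, here is a minimal StAX sketch that uses only the JDK (java.util.zip and javax.xml.stream) so it stays self-contained; POI's event API (XSSFReader in org.apache.poi.xssf.eventusermodel, with getSharedStringsTable() and getSheetsData()) exposes the same package parts. The class and method names are hypothetical, and the part name "xl/worksheets/sheet1.xml" is an assumption (the real sheet-to-part mapping lives in xl/workbook.xml and its relationships). Known limitations: cells without a <v> are skipped rather than padded (the r cell reference is ignored), inline strings (t="inlineStr") are not handled, and the shared strings list is still held in memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical StAX reader for one worksheet of an .xlsx/.xlsm (both are zip packages).
final class StaxSheetReader {

    private static XMLStreamReader newReader(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // xlsx parts never need a DTD
        return factory.createXMLStreamReader(in);
    }

    // Reads xl/sharedStrings.xml: list position i holds shared string i.
    static List<String> readSharedStrings(InputStream in) throws XMLStreamException {
        List<String> strings = new ArrayList<>();
        XMLStreamReader r = newReader(in);
        StringBuilder current = null;
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = r.getLocalName();
                if ("si".equals(name)) {
                    current = new StringBuilder();
                } else if ("t".equals(name) && current != null) {
                    // Rich text splits one <si> into several <r><t> runs; concatenate them.
                    // Phonetic (rPh) runs are not filtered out in this sketch.
                    current.append(r.getElementText());
                }
            } else if (event == XMLStreamConstants.END_ELEMENT && "si".equals(r.getLocalName())) {
                strings.add(current.toString());
                current = null;
            }
        }
        r.close();
        return strings;
    }

    // Streams <row> elements, passing each row's cell values to rowHandler (one row in memory).
    static void streamRows(InputStream sheetXml, List<String> sharedStrings,
                           Consumer<List<String>> rowHandler) throws XMLStreamException {
        XMLStreamReader r = newReader(sheetXml);
        List<String> row = null;
        String cellType = null;
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                switch (r.getLocalName()) {
                    case "row": row = new ArrayList<>(); break;
                    case "c": cellType = r.getAttributeValue(null, "t"); break;
                    case "v": {
                        String raw = r.getElementText();
                        // t="s": <v> holds an index into the shared strings table, not the text
                        row.add("s".equals(cellType) ? sharedStrings.get(Integer.parseInt(raw)) : raw);
                        break;
                    }
                    default: break;
                }
            } else if (event == XMLStreamConstants.END_ELEMENT && "row".equals(r.getLocalName())) {
                rowHandler.accept(row);
                row = null;
            }
        }
        r.close();
    }

    // Opens the package and streams one worksheet part, e.g. "xl/worksheets/sheet1.xml".
    static void streamSheet(String path, String sheetPart, Consumer<List<String>> rowHandler)
            throws IOException, XMLStreamException {
        try (ZipFile zip = new ZipFile(path)) {
            List<String> sst = new ArrayList<>();
            ZipEntry sstEntry = zip.getEntry("xl/sharedStrings.xml"); // absent if no string cells
            if (sstEntry != null) {
                try (InputStream in = zip.getInputStream(sstEntry)) {
                    sst = readSharedStrings(in);
                }
            }
            ZipEntry sheet = zip.getEntry(sheetPart);
            if (sheet == null) {
                throw new IOException("No such part: " + sheetPart);
            }
            try (InputStream in = zip.getInputStream(sheet)) {
                streamRows(in, sst, rowHandler);
            }
        }
    }
}
```

Each row handed to rowHandler can be validated (and, if needed, buffered into a page) and then discarded, so heap usage no longer grows with the number of rows in the sheet.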