Search code examples
javaparsingapache-poisaxparserimport-from-excel

parse part of xlsx file using poi xssf and sax parser


I am trying to read 10 rows out of 1 million records in an excel sheet (.xlsx) :

            FileInputStream myFile = new FileInputStream(file);
            OPCPackage pkg = OPCPackage.open(myFile);
            XSSFReader reader = new XSSFReader(pkg);

            MySheetContentHandler sheetHandler = new MySheetContentHandler();

            StylesTable styles = reader.getStylesTable();
            ReadOnlySharedStringsTable sharedStrings = new ReadOnlySharedStringsTable(pkg);
            ContentHandler handler = new XSSFSheetXMLHandler(styles, sharedStrings, sheetHandler, true);

            XMLReader parser = XMLReaderFactory.createXMLReader();
            parser.setContentHandler(handler);

            parser.parse(new InputSource(reader.getSheetsData().next()));

            pkg.close();

And MySheetContentHandler:

@Override
public void cell(String cellReference, String formattedValue) {

    getRowList.add(formattedValue);
    if (count == 0) {
        columnsId.add(String.valueOf(columnCount++));
    }

}

@Override
public void endRow() {
    if (myNewList.size() < maxRecord) {
        myNewList.add(getRowList);
    }
}

@Override
public void headerFooter(String arg0, boolean arg1, String arg2) {

}

@Override
public void startRow(int rowNum) {
    getRowList = new ArrayList<String>();
    if (rowNum == 0) {
        count = 0;
    } else {
        count++;
    }
}

When calling method parser.parse(new InputSource(reader.getSheetsData().next())); the entire file is parsed and it takes time, all I want is to parse the first 10 rows only and then stop parsing.


Solution

  • I gone through the internet searching for this issue, the most common way I found is to throw a Runtime Exception within the content handler when your specific condition applies.

    For example:

    @Override
    public void startRow(int rowNum) {
        getRowList = new ArrayList<String>();
        if (rowNum == 0) {
            count = 0;
        } else {
            count++;
        }
    
        if(count > SOME_LIMIT)
        throw new MyParsingTerminationException();
    }
    

    Then, you should catch this exception around the parse method.

    try {
         parser.parse(new InputSource(reader.getSheetsData().next()));
    } catch (MyParsingTerminationException e) {
         // handle the termination
    }