Search code examples
javaexcelapacheapache-poixssf

Getting specific data from an excel sheet JAVA


I am trying to get specific data from an excel sheet, The data is dynamic. It can be anything really. The column headers are the only things i can use for placeholders, but the column header positions can vary in the sheet.

For example i have a sheet like this :

|Name| Surname| Value|

|bar | poo | 5|

|baz | foo | 7|

But for example i need to traverse the sheet to get the surname column and then if i find surname = 'poo' i must then pull its corresponding value which in the sheet is in the next colum but this is dynamic. The surname and value column arent always next to each other, they can be in any position at the top. But if i find a specific 'thing' in the surname column i need to pull its value.

I have managed to traverse through the sheet and store all the data in a 2d array And display that data. from the research ive done , this isnt an efficient approach as traversing and storing large data from sheets can use alot of memory. Ive read that you can read through an excel sheet and instead of storing those values in an array you can write them immediately to another sheet, if they match a certain condition. EG: (pseudo) If(columnheader == surname && surname == foo )then get corresponding value, then write that value to a new sheet.

Okay so my questions are :

1.How do i achieve iterating through the sheet not storing it in an array and writing it straight to another sheet if it matches a condition?

2.From the code i have below, how do i achieve sorting through the data in the array and finding if surname = foo get its corresponding value?

Like i said the data in the sheet is dynamic except for the column headers, but there positions as headers are dynamic.

Sorry for the long post , any help will be greatly appreciated.

package demo.poi;


import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.math.BigDecimal;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFCell;
import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;


public class test  {

    public static void main(String[] args) throws Exception {

        File excel = new File("test.xlsx");
        FileInputStream fis = new FileInputStream(excel);

        XSSFWorkbook wb = new XSSFWorkbook(fis);
        XSSFSheet ws = wb.getSheetAt(0);
        ws.setForceFormulaRecalculation(true);

        int rowNum = ws.getLastRowNum() + 1;
        int colNum = ws.getRow(0).getLastCellNum();
        int surnameHeaderIndex = -1, valueHeaderIndex = -1;

        //Read the headers first. Locate the ones you need
        XSSFRow rowHeader = ws.getRow(0);
        for (int j = 0; j < colNum; j++) {
            XSSFCell cell = rowHeader.getCell(j);
            String cellValue = cellToString(cell);
            if("SURNAME".equalsIgnoreCase(cellValue)) {
                surnameHeaderIndex = j;
            } else if("VALUE".equalsIgnoreCase(cellValue)) {
                valueHeaderIndex = j;
            }
        }

        if(surnameHeaderIndex == -1 || valueHeaderIndex == -1) {
            throw new Exception("Could not find header indexes\nSurname : " + surnameHeaderIndex + " | Value : " + valueHeaderIndex);
        }
        //createnew workbook
        XSSFWorkbook workbook = new XSSFWorkbook(); 		
		//Create a blank sheet
		XSSFSheet sheet = workbook.createSheet("data");
		
        for (int i = 1; i < rowNum; i++) {
            XSSFRow row = ws.getRow(i);
           row = sheet.createRow(rowNum++);
            String surname = cellToString(row.getCell(surnameHeaderIndex));
            String value = cellToString(row.getCell(valueHeaderIndex));
            int cellIndex = 0;
            row.createCell(cellIndex++).setCellValue(surname);
            row.createCell(cellIndex++).setCellValue(value);
            

            
        }
        FileOutputStream fos = new FileOutputStream(new File("test1.xlsx"));
        workbook.write(fos);
        fos.close();
    }

    public static String cellToString(XSSFCell cell) {

        int type;
        Object result = null;
        type = cell.getCellType();

        switch (type) {

        case XSSFCell.CELL_TYPE_NUMERIC:
            result = BigDecimal.valueOf(cell.getNumericCellValue())
                    .toPlainString();

            break;
        case XSSFCell.CELL_TYPE_STRING:
            result = cell.getStringCellValue();
            break;
        case XSSFCell.CELL_TYPE_BLANK:
            result = "";
            break;
        case XSSFCell.CELL_TYPE_FORMULA:
            result = cell.getCellFormula();
        }

        return result.toString();
    }
}


Solution

  • Something like this should be a good starting point. Basically you parse the first row, where the headers are located. You find the position of the headers you want and keep them. In this example there are only two headers (surname, value) that are needed so I just keep two variables. If there are more, then the solution would be to keep the position of those headers in a HashMap, where the key is the name of the header. After that an iteration of the rows begins. The program parses the values of the columns that are needed (row.getCell(index)). Now you have the values that you need, and only them. You can do whatever you want, you can print them or write a file or whatnot.

    Here is an example. The error handling is up to you. This is only an example.

    package POIParser;
    
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.math.BigDecimal;
    
    import org.apache.poi.xssf.usermodel.XSSFCell;
    import org.apache.poi.xssf.usermodel.XSSFRow;
    import org.apache.poi.xssf.usermodel.XSSFSheet;
    import org.apache.poi.xssf.usermodel.XSSFWorkbook;
    
    public class MainPoi {
    
        public static void main(String[] args) throws Exception {
    
            File excel = new File("test.xlsx");
            FileInputStream fis = new FileInputStream(excel);
    
            XSSFWorkbook wb = new XSSFWorkbook(fis);
            XSSFSheet ws = wb.getSheetAt(0);
            ws.setForceFormulaRecalculation(true);
    
            int rowNum = ws.getLastRowNum() + 1;
            int colNum = ws.getRow(0).getLastCellNum();
            int surnameHeaderIndex = -1, valueHeaderIndex = -1;
    
            // Read the headers first. Locate the ones you need
            XSSFRow rowHeader = ws.getRow(0);
            for (int j = 0; j < colNum; j++) {
                XSSFCell cell = rowHeader.getCell(j);
                String cellValue = cellToString(cell);
                if ("SURNAME".equalsIgnoreCase(cellValue)) {
                    surnameHeaderIndex = j;
                } else if ("VALUE".equalsIgnoreCase(cellValue)) {
                    valueHeaderIndex = j;
                }
            }
    
            if (surnameHeaderIndex == -1 || valueHeaderIndex == -1) {
                throw new Exception("Could not find header indexes\nSurname : "
                        + surnameHeaderIndex + " | Value : " + valueHeaderIndex);
            }
            // createnew workbook
            XSSFWorkbook workbook = new XSSFWorkbook();
            // Create a blank sheet
            XSSFSheet sheet = workbook.createSheet("data");
    
            for (int i = 1; i < rowNum; i++) {
                XSSFRow row = ws.getRow(i);
                String surname = cellToString(row.getCell(surnameHeaderIndex));
                String value = cellToString(row.getCell(valueHeaderIndex));
                int cellIndex = 0;
                //Create a newRow object for the output excel. 
                //We begin for i = 1, because of the headers from the input excel, so we go minus 1 in the new (no headers).
                //If for the output we need headers, add them outside this for loop, and go with i, not i-1
                XSSFRow newRow = sheet.createRow(i-1);  
                newRow.createCell(cellIndex++).setCellValue(surname);
                newRow.createCell(cellIndex++).setCellValue(value);
            }
    
            FileOutputStream fos = new FileOutputStream(new File("test1.xlsx"));
            workbook.write(fos);
            fos.close();
        }
    
        public static String cellToString(XSSFCell cell) {
    
            int type;
            Object result = null;
            type = cell.getCellType();
    
            switch (type) {
    
            case XSSFCell.CELL_TYPE_NUMERIC:
                result = BigDecimal.valueOf(cell.getNumericCellValue())
                        .toPlainString();
    
                break;
            case XSSFCell.CELL_TYPE_STRING:
                result = cell.getStringCellValue();
                break;
            case XSSFCell.CELL_TYPE_BLANK:
                result = "";
                break;
            case XSSFCell.CELL_TYPE_FORMULA:
                result = cell.getCellFormula();
            }
    
            return result.toString();
        }
    }