Search code examples
java-8docx4j

How to Determine Merged Cells range for a given excel sheet using docx4j?


I wanted to read an excel sheet that has one merged column and I need to treat that column to determine the rows that are grouped with. So that I can consider that group as a set. Likewise each merged rows are considered as set1, set2, set3 and so on. I will then process each set w its rows separately.

enter image description here

UPDATE: As Requested by @fireandfuel, I am including the approche that I have arrived.

    CTMergeCells mergeCells = workSheet.getMergeCells();
    SheetData sheetData = workSheet.getSheetData();
    List<ArrayList<Row>> rowGroups = new ArrayList<ArrayList<Row>>();
    List<CTMergeCell> cTMergeCells = mergeCells.getMergeCell();
    for(CTMergeCell mcells : cTMergeCells){
        String range = mcells.getRef();
        Integer rowStart = Integer.parseInt(range.substring(1, 2));
        Integer rowEnd = Integer.parseInt(range.substring(4, 5));
        ArrayList<Row> rowss = (ArrayList<Row>) sheetData.getRow().stream().filter(row -> {
            return row.getR() >= rowStart.longValue() && row.getR() <= rowEnd.longValue();
        }).collect(Collectors.toList());
        rowGroups.add((ArrayList<Row>) rowss);
    }

Solution

  • It's very easy to access the definition of merged cells in SpreadsheetML using docx4j.

    Office Open XML files (docx, xlsx and pptx) are simply ZIP archives which are containing XML files. You can open it with the most archive programs and have a look inside its file structure and files.

    Here is some source code for to access the definition of merged cells using docx4j:

    // load yourFile.xlsx file
    SpreadsheetMLPackage spreadsheetMLPackage = SpreadsheetMLPackage.load(new File("yourFile.xlsx"));
    
    // get worksheet from /xl/worksheets/yourSheet.xml file from yourFile.xlsx
    WorksheetPart worksheetPart = (WorksheetPart) spreadsheetMLPackage.getParts().get(new PartName("/xl/worksheets/yourSheet.xml")); 
    Worksheet worksheet = worksheetPart.getJaxbElement();
    
    // get the merged cells
    CTMergeCells ctMergedCells = worksheet.getMergeCells();
    if(ctMergedCells != null){
        List<CTMergeCell> mergedCellList = ctMergedCells.getMergeCell();
        // do something with the merged cells
    }
    

    The class org.xlsx4j.sml.CTMergeCell has a function getRef which gives you the reference of cells which are merged as String, encoded as begin:end, like A1:B2 (merged cells from A1 to B2: A1, A2, B1 and B2).