Search code examples
javaeclipsedocx4jimport-from-excel

Printing contents of xlsx sheet


I'm using docx4j in eclipse to get the contents of an excel sheet but all what I'm getting is numbers. For simplicity, assume this is my sheet:

| asd | sd |

| hgn |

The code I'm using to load the contents is:

public static void load(String outputfilepath) throws FileNotFoundException{
    try {
        SpreadsheetMLPackage exc = SpreadsheetMLPackage
                .load(new java.io.File(outputfilepath));
        WorksheetPart sheet = exc.getWorkbookPart().getWorksheet(0);
        System.out.println(sheet.getPartName().getName());
        Worksheet ws = sheet.getJaxbElement();
        SheetData data = ws.getSheetData();
        int ic = 0;
        for (Row r : data.getRow()) {
            System.out.println("row " + ic);
            int ir = 0;
            for (Cell c : r.getC()) {
                System.out.println("cell " + ir + " contains "
                        + c.getV().toString());
                ir++;
            }
            ic++;
        }
        System.out.println("\ndone");
    } catch (Docx4JException e) {
        e.printStackTrace();
    } catch (Xlsx4jException e) {
        e.printStackTrace();
    }
}

And this is my output:

/xl/worksheets/sheet1.xml
row 0
cell 0 contains 0
cell 1 contains 1
row 1
cell 0 contains 2

done

What should I do to get the actual content?

Note: The problem occurs only with strings. ie if there are numbers in a cell I get them without problems.


Solution

  • Within XLSX the text contents of the cells is not stored directly in the sheet XML. There is a sharedStrings.xml in the XLSX archive.

    Thats why docx4j does not read the text content with getV() but the index of this content in the sharedStrings.xml. So you have the index and can get the content from the sharedStrings.

    See example: https://github.com/plutext/docx4j/blob/master/src/samples/xlsx4j/org/xlsx4j/samples/PartsList.java

    ...
    for (Cell c : r.getC() ) {
     if (c.getT().equals(STCellType.S)) {
      System.out.println( "  " + c.getR() + " contains " + sharedStrings.getJaxbElement().getSi().get(Integer.parseInt(c.getV())).getT() );
     } else {
      // TODO: handle other cell types
      System.out.println( "  " + c.getR() + " contains " + c.getV() );
     }
    }
    ...
    

    Note you have to read first the RelationshipsPart to get the sharedStrings. This is done in that example within the public static void printInfo(Part p, StringBuilder sb, String indent) while traversing the relationships in public static void traverseRelationships.

    Numbers and formulas are directly stored in the sheet XML. So for those contents the getV() will get the content directly.