I'm using docx4j in eclipse to get the contents of an excel sheet but all what I'm getting is numbers. For simplicity, assume this is my sheet:
The code I'm using to load the contents is:
public static void load(String outputfilepath) throws FileNotFoundException{
try {
SpreadsheetMLPackage exc = SpreadsheetMLPackage
.load(new java.io.File(outputfilepath));
WorksheetPart sheet = exc.getWorkbookPart().getWorksheet(0);
System.out.println(sheet.getPartName().getName());
Worksheet ws = sheet.getJaxbElement();
SheetData data = ws.getSheetData();
int ic = 0;
for (Row r : data.getRow()) {
System.out.println("row " + ic);
int ir = 0;
for (Cell c : r.getC()) {
System.out.println("cell " + ir + " contains "
+ c.getV().toString());
ir++;
}
ic++;
}
System.out.println("\ndone");
} catch (Docx4JException e) {
e.printStackTrace();
} catch (Xlsx4jException e) {
e.printStackTrace();
}
}
And this is my output:
/xl/worksheets/sheet1.xml
row 0
cell 0 contains 0
cell 1 contains 1
row 1
cell 0 contains 2
done
What should I do to get the actual content?
Note: The problem occurs only with strings. ie if there are numbers in a cell I get them without problems.
Within XLSX the text contents of the cells is not stored directly in the sheet XML. There is a sharedStrings.xml
in the XLSX archive.
Thats why docx4j
does not read the text content with getV()
but the index of this content in the sharedStrings.xml
. So you have the index and can get the content from the sharedStrings.
See example: https://github.com/plutext/docx4j/blob/master/src/samples/xlsx4j/org/xlsx4j/samples/PartsList.java
...
for (Cell c : r.getC() ) {
if (c.getT().equals(STCellType.S)) {
System.out.println( " " + c.getR() + " contains " + sharedStrings.getJaxbElement().getSi().get(Integer.parseInt(c.getV())).getT() );
} else {
// TODO: handle other cell types
System.out.println( " " + c.getR() + " contains " + c.getV() );
}
}
...
Note you have to read first the RelationshipsPart
to get the sharedStrings
. This is done in that example within the public static void printInfo(Part p, StringBuilder sb, String indent)
while traversing the relationships in public static void traverseRelationships
.
Numbers and formulas are directly stored in the sheet XML. So for those contents the getV()
will get the content directly.