Search code examples
apache-poisax

POI sax get Date from Cell


I´m reading excel file with sax and everything is working expect dates. Dates are read like "41751" which is number of days from 1900. There should be some formatString, which I can use for formating, but it is null.

StartElement()

this.nextDataType = xssfDataType.NUMBER;
        this.formatIndex = -1;
        this.formatString = null;
        String cellType = attributes.getValue("t");
        String cellStyleStr = attributes.getValue("s");
        if ("b".equals(cellType))
            nextDataType = xssfDataType.BOOL;
        else if ("e".equals(cellType))
            nextDataType = xssfDataType.ERROR;
        else if ("inlineStr".equals(cellType))
            nextDataType = xssfDataType.INLINESTR;
        else if ("s".equals(cellType))
            nextDataType = xssfDataType.SSTINDEX;
        else if ("str".equals(cellType))
            nextDataType = xssfDataType.FORMULA;
        else if (cellType == null) {
            maybeNull=true;
        } else if (cellStyleStr != null) {
            // It's a number, but almost certainly one
            //  with a special style or format
            int styleIndex = Integer.parseInt(cellStyleStr);
            XSSFCellStyle style = stylesTable.getStyleAt(styleIndex);
            this.formatIndex = style.getDataFormat();
            this.formatString = style.getDataFormatString();
            if (this.formatString == null)
                this.formatString = BuiltinFormats.getBuiltinFormat(this.formatIndex);
        }

EndElement()

case NUMBER:
                    String n = value.toString();
                    if (this.formatString != null) {
                        thisStr = formatter.formatRawCellContents(Double.parseDouble(n), this.formatIndex, this.formatString);
                        try {
                            int intV = Integer.valueOf(thisStr);
                            cv.set(CellTypes.INT, intV);
                        } catch (NumberFormatException ignored) {}
                        if (cv.getType() == null) {
                            try {
                                Double doubleV = Double.valueOf(thisStr);
                                cv.set(CellTypes.DOUBLE, doubleV);
                            } catch (NumberFormatException e) {
                                cv.set(CellTypes.STRING, thisStr);
                            }
                        }
                    }

How can I get that date or how can I find, that cell contains date and not some integer. Full handler code is here http://pastebin.com/GveeMf4n Thanks


Solution

  • If you look at the Apache POI Example for converting a .xlsx file to .csv with SAX event parsing, you'll see what you need to do

    Basically, when you read the cell, not only do you need to capture the type and contents, but you also need to capture the s style index attribute, then look that up from the StylesTable which you load in a DOM-like way (it's small), eg

    public void startElement(String uri, String localName, String name,
                                 Attributes attributes) throws SAXException {
       this.formatIndex = -1;
       this.formatString = null;
       String cellStyleStr = attributes.getValue("s");
       if (cellStyleStr != null) {
           // It's a number, but almost certainly one
           //  with a special style or format 
           XSSFCellStyle style = null;
           if (cellStyleStr != null) {
               int styleIndex = Integer.parseInt(cellStyleStr);
               style = stylesTable.getStyleAt(styleIndex);
           } else if (stylesTable.getNumCellStyles() > 0) {
               style = stylesTable.getStyleAt(0);
           }
           if (style != null) {
               this.formatIndex = style.getDataFormat();
               this.formatString = style.getDataFormatString();
               if (this.formatString == null)
                   this.formatString = BuiltinFormats.getBuiltinFormat(this.formatIndex);
           }
        }
    
        // Rest of start element logic here
    

    Then, when you get to the end tag for the cell, you can use DataFormatter to format your cell's numeric contents (be that a Date, a %, or similar) based on the style you found earlier, eg

    public void endElement(String uri, String localName, String name)
                throws SAXException {
            String thisStr = null;
    
        // v => contents of a cell
        if ("v".equals(name)) {
            // Process the value contents as required.
            // Do now, as characters() may be called more than once
            switch (nextDataType) {
                    case NUMBER:
                        String n = value.toString();
                        if (this.formatString != null)
                            thisStr = formatter.formatRawCellContents(Double.parseDouble(n), this.formatIndex, this.formatString);
                        else
                            thisStr = n;
                        break;
    

    Basically though, tricks are to read the StylesTable, capture the format string when your read the cell start, then use that to format the numeric string with DataFormatter when you have the cell contents.

    Take a look at the source code to the XLSX2CSV example for a working implementation of that