Search code examples
excelopenxmlopenxml-sdk

What indicates an Office Open XML Cell contains a Date/Time value?


I'm reading an .xlsx file using the Office Open XML SDK and am confused about reading Date/Time values. One of my spreadsheets has this markup (generated by Excel 2010)

<x:row r="2" spans="1:22" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <x:c r="A2" t="s">
    <x:v>56</x:v>
  </x:c>
  <x:c r="B2" t="s">
    <x:v>64</x:v>
  </x:c>
  .
  .
  .
  <x:c r="J2" s="9">
    <x:v>17145</x:v>
  </x:c>

Cell J2 has a date serial value in it and a style attribute s="9". However, the Office Open XML Specification says that 9 corresponds to a followed hyperlink. This is a screen shot from page 4,999 of ECMA-376, Second Edition, Part 1 - Fundamentals And Markup Language Reference.pdf.

alt text

The presetCellStyles.xml file included with the spec also refers to builtinId 9 as a followed hyperlink.

<followedHyperlink builtinId="9">

All of the styles in the spec are simply visual formatting styles, not number styles. Where are the number styles defined and how does one differentiate a style reference s="9" from indicating a cell formatting (visual) style vs a number style?

Obviously I'm looking in the wrong place to match styles on cells with their number formats. Where's the right place to find this information?


Solution

  • The s attribute references a style xf entry in styles.xml. The style xf in turn references a number format mask. To identify a cell that contains a date, you need to perform the style xf -> numberformat lookup, then identify whether that numberformat mask is a date/time numberformat mask (rather than, for example, a percentage or an accounting numberformat mask).

    The style.xml file has elements like:

    <xf numFmtId="14" ... applyNumberFormat="1" />
    <xf numFmtId="1" ... applyNumberFormat="1" />
    

    These are the xf entries, which in turn give you a numFmtId that references the number format mask.

    You should find the numFmts section somewhere near the top of style.xml, as part of the styleSheet element

    <?xml version="1.0" encoding="UTF-8" standalone="yes" ?> 
        <styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
            <numFmts count="3">
                <numFmt numFmtId="164" formatCode="[$-414]mmmm\ yyyy;@" /> 
                <numFmt numFmtId="165" formatCode="0.000" /> 
                <numFmt numFmtId="166" formatCode="#,##0.000" /> 
            </numFmts>
    

    The number format id may be here, or it may be one of the built-in formats. Number format codes (numFmtId) less than 164 are "built-in".

    The list that I have is incomplete:

    0 = 'General';
    1 = '0';
    2 = '0.00';
    3 = '#,##0';
    4 = '#,##0.00';
    
    9 = '0%';
    10 = '0.00%';
    11 = '0.00E+00';
    12 = '# ?/?';
    13 = '# ??/??';
    14 = 'mm-dd-yy';
    15 = 'd-mmm-yy';
    16 = 'd-mmm';
    17 = 'mmm-yy';
    18 = 'h:mm AM/PM';
    19 = 'h:mm:ss AM/PM';
    20 = 'h:mm';
    21 = 'h:mm:ss';
    22 = 'm/d/yy h:mm';
    
    37 = '#,##0 ;(#,##0)';
    38 = '#,##0 ;[Red](#,##0)';
    39 = '#,##0.00;(#,##0.00)';
    40 = '#,##0.00;[Red](#,##0.00)';
    
    44 = '_("$"* #,##0.00_);_("$"* \(#,##0.00\);_("$"* "-"??_);_(@_)';
    45 = 'mm:ss';
    46 = '[h]:mm:ss';
    47 = 'mmss.0';
    48 = '##0.0E+0';
    49 = '@';
    
    27 = '[$-404]e/m/d';
    30 = 'm/d/yy';
    36 = '[$-404]e/m/d';
    50 = '[$-404]e/m/d';
    57 = '[$-404]e/m/d';
    
    59 = 't0';
    60 = 't0.00';
    61 = 't#,##0';
    62 = 't#,##0.00';
    67 = 't0%';
    68 = 't0.00%';
    69 = 't# ?/?';
    70 = 't# ??/??';
    

    The missing values are mainly related to east asian variant formats.