Search code examples
vbams-officevstoopenxmlopenxml-sdk

Is there a way to tell that a Microsoft Office file is in the Open XML format?


Is there a way to programmatically tell that a Microsoft Office file is in the Open XML format and not the old pre-Office-97 binary format?

As an end user, I usually change the file extension to .zip and try extracting the archive. If it does extract, then I know that it is an Open XML format. But I am not even sure if that is a fool-proof method and that the files that it does not work for are not Open Xml files.

I did this in Powershell and from the output, because of the presence of the CustomXmlParts property, I am able to somewhat make a guess that this is an Open XML format workbook I am looking at.

PS C:\Users\Sathyaish> $excel = New-Object -ComObject Excel.Application
PS C:\Users\Sathyaish> $workbook = $excel.Workbooks.Open("C:\Sathyaish\temp\Foo.xlsx")
PS C:\Users\Sathyaish> $workbook | select C*, F*


Creator                  : 1480803660
ChangeHistoryDuration    : 0
Charts                   : System.__ComObject
CodeName                 :
CommandBars              :
Comments                 :
ConflictResolution       : 1
Container                :
CreateBackup             : False
CustomDocumentProperties : System.__ComObject
CustomViews              : System.__ComObject
CalculationVersion       : 171027
ContentTypeProperties    :
Connections              : System.__ComObject
CheckCompatibility       : False
CustomXMLParts           : System.__ComObject
ConnectionsDisabled      : False
CaseSensitive            : False
ChartDataPointTrack      : True
FileFormat               : 51
FullName                 : C:\Sathyaish\temp\Foo.xlsx
FullNameURLEncoded       : C:\Sathyaish\temp\Foo.xlsx
Final                    : False
ForceFullCalculation     : False

Is there a VSTO/VBA way to tell the same thing? Perhaps by looking at the value of the FileFormat property as listed above? What are the valid integer values and their meanings?

Or there must be the option of trying to load the document using the Open XML SDK and if it fails, then you know that it isn't a proper Open XML file format. But that doesn't exclude other possibilities such as the file not being a Microsoft Office file at all.


Solution

  • Excel's FileFormat is of type Microsoft.Office.Interop.Excel.XlFileFormat (each Office application has its own list of formats)

       //
        // Summary:
        //     Specifies a type of text format
        xlCurrentPlatformText = -4158,
        //
        // Summary:
        //     Excel workbook format.
        xlWorkbookNormal = -4143,
        //
        // Summary:
        //     Symbolic link format.
        xlSYLK = 2,
        //
        // Summary:
        //     Lotus 1-2-3 format.
        xlWKS = 4,
        //
        // Summary:
        //     Lotus 1-2-3 format.
        xlWK1 = 5,
        //
        // Summary:
        //     Comma separated value.
        xlCSV = 6,
        //
        // Summary:
        //     Dbase 2 format.
        xlDBF2 = 7,
        //
        // Summary:
        //     Dbase 3 format.
        xlDBF3 = 8,
        //
        // Summary:
        //     Data Interchange format.
        xlDIF = 9,
        //
        // Summary:
        //     Dbase 4 format.
        xlDBF4 = 11,
        //
        // Summary:
        //     Deprecated format.
        xlWJ2WD1 = 14,
        //
        // Summary:
        //     Lotus 1-2-3 format.
        xlWK3 = 15,
        //
        // Summary:
        //     Excel version 2.0.
        xlExcel2 = 16,
        //
        // Summary:
        //     Excel template format.
        xlTemplate = 17,
        //
        // Summary:
        //     Template 8
        xlTemplate8 = 17,
        //
        // Summary:
        //     Microsoft Office Excel Add-In.
        xlAddIn = 18,
        //
        // Summary:
        //     Microsoft Excel 97-2003 Add-In
        xlAddIn8 = 18,
        //
        // Summary:
        //     Specifies a type of text format.
        xlTextMac = 19,
        //
        // Summary:
        //     Specifies a type of text format.
        xlTextWindows = 20,
        //
        // Summary:
        //     Specifies a type of text format.
        xlTextMSDOS = 21,
        //
        // Summary:
        //     Comma separated value.
        xlCSVMac = 22,
        //
        // Summary:
        //     Comma separated value.
        xlCSVWindows = 23,
        //
        // Summary:
        //     Comma separated value.
        xlCSVMSDOS = 24,
        //
        // Summary:
        //     Deprecated format.
        xlIntlMacro = 25,
        //
        // Summary:
        //     Microsoft Office Excel Add-In international format.
        xlIntlAddIn = 26,
        //
        // Summary:
        //     Excel version 2.0 far east.
        xlExcel2FarEast = 27,
        //
        // Summary:
        //     Microsoft Works 2.0 format
        xlWorks2FarEast = 28,
        //
        // Summary:
        //     Excel version 3.0.
        xlExcel3 = 29,
        //
        // Summary:
        //     Lotus 1-2-3 format.
        xlWK1FMT = 30,
        //
        // Summary:
        //     Lotus 1-2-3 format.
        xlWK1ALL = 31,
        //
        // Summary:
        //     Lotus 1-2-3 format.
        xlWK3FM3 = 32,
        //
        // Summary:
        //     Excel version 4.0.
        xlExcel4 = 33,
        //
        // Summary:
        //     Quattro Pro format.
        xlWQ1 = 34,
        //
        // Summary:
        //     Excel version 4.0. Workbook format.
        xlExcel4Workbook = 35,
        //
        // Summary:
        //     Specifies a type of text format.
        xlTextPrinter = 36,
        //
        // Summary:
        //     Lotus 1-2-3 format.
        xlWK4 = 38,
        //
        // Summary:
        //     Excel version 5.0.
        xlExcel5 = 39,
        //
        // Summary:
        //     Excel 95.
        xlExcel7 = 39,
        //
        // Summary:
        //     Deprecated format.
        xlWJ3 = 40,
        //
        // Summary:
        //     Deprecated format.
        xlWJ3FJ3 = 41,
        //
        // Summary:
        //     Specifies a type of text format.
        xlUnicodeText = 42,
        //
        // Summary:
        //     Excel version 95 and 97.
        xlExcel9795 = 43,
        //
        // Summary:
        //     Web page format.
        xlHtml = 44,
        //
        // Summary:
        //     MHT format.
        xlWebArchive = 45,
        //
        // Summary:
        //     Excel Spreadsheet format.
        xlXMLSpreadsheet = 46,
        //
        // Summary:
        //     Excel12
        xlExcel12 = 50,
        //
        // Summary:
        //     Open XML Workbook
        xlOpenXMLWorkbook = 51,
        //
        // Summary:
        //     Workbook default
        xlWorkbookDefault = 51,
        //
        // Summary:
        //     Open XML Workbook Macro Enabled
        xlOpenXMLWorkbookMacroEnabled = 52,
        //
        // Summary:
        //     Open XML Template Macro Enabled
        xlOpenXMLTemplateMacroEnabled = 53,
        //
        // Summary:
        //     Open XML Template
        xlOpenXMLTemplate = 54,
        //
        // Summary:
        //     Open XML Add-In
        xlOpenXMLAddIn = 55,
        //
        // Summary:
        //     Excel8
        xlExcel8 = 56,
        //
        // Summary:
        //     OpenDocument Spreadsheet
        xlOpenDocumentSpreadsheet = 60
    

    You should be able to cast that integer value into an XlFileFormat object in PowerShell but off the top of my head I do not know exactly how. It should be some variation on this (example is C#):

    Enum.Parse(typeof(Microsoft.Office.Interop.Excel.XlFileFormat), fileFormat)