Search code examples
pythonxlsxxlrd

xlrd function to return the file type of a workbook object


The title says it all - is there a function in xlrd that returns the file type (MIME type, xls or xlsx, etc.) of the workbook that has just been opened with xlrd.open_workbook(fileName) ? I can't find one in the documentation.


Solution

  • Ok, after reading through the code of the __init__, book, and xlsx classes in xlrd on GitHub (https://github.com/python-excel/xlrd/), I see that there is no attribute of the Book object that returns the file type. The closest I can get is to use the log file and set verbosity to True:

    import xlrd
    
    def ReadSpreadsheet(filePath):
        myLog = open(''.join([filePath,'.log.txt']), 'w')
        myLog.write(''.join(['Opening ',filePath,'\n']))
        wBook = xlrd.open_workbook(filePath, logfile=myLog, verbosity=True)
        myLog.close()
    

    This function will write a log file that shows the components for each file. Testing with four files, it's very obvious from the log which of the files are recognized as xlsx files, which are recognized as xls files, and which are not recognized:

    Office 2010 xlsx file:

    >>> testing_xls.ReadSpreadsheet('MS.xlsx')
    
    Opening MS.xlsx
    ZIP component_names:
    ['[Content_Types].xml',
     '_rels/.rels',
     'xl/_rels/workbook.xml.rels',
     'xl/workbook.xml',
     'xl/sharedStrings.xml',
     'xl/worksheets/_rels/sheet1.xml.rels',
     'xl/theme/theme1.xml',
     'xl/styles.xml',
     'xl/worksheets/sheet1.xml',
     'docProps/core.xml',
     'xl/printerSettings/printerSettings1.bin',
     'docProps/app.xml']
    

    Office 2010 xls file:

    >>> testing_xls.ReadSpreadsheet('MS.xls')
    
    Opening MS.xls
    CODEPAGE: codepage 1200 -> encoding 'utf_16_le'
    DATEMODE: datemode 0
    Countries: (1, 1)
    
    Colour indexes used:
    []
    
    NOTE *** sheet 0 (u'Sheet1'): DIMENSIONS R,C = 26,9 should be 23,9
    

    LibreOffice 4.2 xlsx file

    >>> testing_xls.ReadSpreadsheet('Libre.xlsx')
    
    Opening Libre.xlsx
    ZIP component_names:
    [u'_rels/.rels',
     u'docProps/app.xml',
     u'docProps/core.xml',
     u'xl/_rels/workbook.xml.rels',
     u'xl/sharedStrings.xml',
     u'xl/worksheets/_rels/sheet1.xml.rels',
     u'xl/worksheets/sheet1.xml',
     u'xl/styles.xml',
     u'xl/workbook.xml',
     u'[Content_Types].xml']
    

    LibreOffice 4.2 ODS file

    >>> testing_xls.ReadSpreadsheet('Libre.ods')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "testing_xls.py", line 6, in ReadSpreadsheet
        wBook = xlrd.open_workbook(filePath, logfile=myLog, verbosity=True)
      File "/usr/local/lib/python2.7/dist-packages/xlrd/__init__.py", line 422, in open_workbook
        raise XLRDError('Openoffice.org ODS file; not supported')
    xlrd.biffh.XLRDError: Openoffice.org ODS file; not supported
    

    [Nothing written to log file.]

    I suppose I could catch the XLRDError and return ODS, or read the log file and return XLSX if component_names is found, and return XLS if codepage is found.