The title says it all - is there a function in xlrd that returns the file type (MIME type, xls or xlsx, etc.) of the workbook that has just been opened with xlrd.open_workbook(fileName)
? I can't find one in the documentation.
Ok, after reading through the code of the __init__
, book
, and xlsx
classes in xlrd on GitHub (https://github.com/python-excel/xlrd/), I see that there is no attribute of the Book object that returns the file type. The closest I can get is to use the log file and set verbosity to True:
import xlrd
def ReadSpreadsheet(filePath):
myLog = open(''.join([filePath,'.log.txt']), 'w')
myLog.write(''.join(['Opening ',filePath,'\n']))
wBook = xlrd.open_workbook(filePath, logfile=myLog, verbosity=True)
myLog.close()
This function will write a log file that shows the components for each file. Testing with four files, it's very obvious from the log which of the files are recognized as xlsx files, which are recognized as xls files, and which are not recognized:
Office 2010 xlsx file:
>>> testing_xls.ReadSpreadsheet('MS.xlsx')
Opening MS.xlsx
ZIP component_names:
['[Content_Types].xml',
'_rels/.rels',
'xl/_rels/workbook.xml.rels',
'xl/workbook.xml',
'xl/sharedStrings.xml',
'xl/worksheets/_rels/sheet1.xml.rels',
'xl/theme/theme1.xml',
'xl/styles.xml',
'xl/worksheets/sheet1.xml',
'docProps/core.xml',
'xl/printerSettings/printerSettings1.bin',
'docProps/app.xml']
Office 2010 xls file:
>>> testing_xls.ReadSpreadsheet('MS.xls')
Opening MS.xls
CODEPAGE: codepage 1200 -> encoding 'utf_16_le'
DATEMODE: datemode 0
Countries: (1, 1)
Colour indexes used:
[]
NOTE *** sheet 0 (u'Sheet1'): DIMENSIONS R,C = 26,9 should be 23,9
LibreOffice 4.2 xlsx file
>>> testing_xls.ReadSpreadsheet('Libre.xlsx')
Opening Libre.xlsx
ZIP component_names:
[u'_rels/.rels',
u'docProps/app.xml',
u'docProps/core.xml',
u'xl/_rels/workbook.xml.rels',
u'xl/sharedStrings.xml',
u'xl/worksheets/_rels/sheet1.xml.rels',
u'xl/worksheets/sheet1.xml',
u'xl/styles.xml',
u'xl/workbook.xml',
u'[Content_Types].xml']
LibreOffice 4.2 ODS file
>>> testing_xls.ReadSpreadsheet('Libre.ods')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "testing_xls.py", line 6, in ReadSpreadsheet
wBook = xlrd.open_workbook(filePath, logfile=myLog, verbosity=True)
File "/usr/local/lib/python2.7/dist-packages/xlrd/__init__.py", line 422, in open_workbook
raise XLRDError('Openoffice.org ODS file; not supported')
xlrd.biffh.XLRDError: Openoffice.org ODS file; not supported
[Nothing written to log file.]
I suppose I could catch the XLRDError
and return ODS, or read the log file and return XLSX if component_names
is found, and return XLS if codepage
is found.