Search code examples
excelpandas

Use pandas to read excel files with wrong sheet state


I use pandas to read this excel files, which is downloaded from a website via an automation script. Here's my code:

import pandas as pd
df = pd.read_excel('CallHistory.xlsx')

But it shows errors below:

ValueError                                Traceback (most recent call last)
c:\Users\minhviet\Box\Telio\vietpm\python\crawler\test_crawl_3.ipynb Cell 7' in <module>
      1 import pandas as pd
----> 2 df = pd.read_excel('CallHistory.xlsx')
      3 df

File c:\Users\minhviet\Anaconda3\lib\site-packages\pandas\util\_decorators.py:311, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    305 if len(args) > num_allow_args:
    306     warnings.warn(
    307         msg.format(arguments=arguments),
    308         FutureWarning,
    309         stacklevel=stacklevel,
    310     )
--> 311 return func(*args, **kwargs)

File c:\Users\minhviet\Anaconda3\lib\site-packages\pandas\io\excel\_base.py:364, in read_excel(io, sheet_name, header, names, index_col, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, storage_options)
    362 if not isinstance(io, ExcelFile):
    363     should_close = True
--> 364     io = ExcelFile(io, storage_options=storage_options, engine=engine)
    365 elif engine and engine != io.engine:
    366     raise ValueError(
    367         "Engine should not be specified when passing "
    368         "an ExcelFile - ExcelFile already has the engine set"
...
    127     if value not in self.values:
--> 128         raise ValueError(self.__doc__)
    129     super(Set, self).__set__(instance, value)

ValueError: Value must be one of {'visible', 'hidden', 'veryHidden'}

I search about this error and find some information. Seems like the state of the sheet is wrong. https://github.com/exceljs/exceljs/issues/678

I try to open this file in Excel, edit something and save, then I can read it from pandas successfully. However, opening this file is a part of an automation script, so using Excel to open and edit is impossible.

You guys can download file here, hopefully anyone can find a way to handle this file by Python: https://app.box.com/s/8vds9zmhhxhn18p0ngodqeepfcgpzevv


Solution

  • I found the solution. I try to read this file through pandas read_excel with different engine (openpyxl and xlrd). Openpyxl shows me above error (ValueError: Value must be one of {'visible', 'hidden', 'veryHidden'}), while xlrd shows another error (KeyError: 'show'). I searched and found that this error was fixed in a commit of xlrd after 1.2.0 ver but before ver 2.0 (which don't support xlsx file anymore), so this fix was not in any official release of xlrd. Here's the information about this fix: https://github.com/python-excel/xlrd/commit/6ec98fc74796a6439c6dd64ed71597a3c50d4986#diff-74efe2926535c21edf8087564ce132fe

    So I installed this fixed commit of xlrd. Be sure to remove other versions of xlrd before installing this version.

    pip install git+https://github.com/python-excel/xlrd.git@6ec98fc74796a6439c6dd64ed71597a3c50d4986
    

    Then I read this file by engine xlrd. It works.

    import pandas as pd
    df = pd.read_excel("C://Users/minhviet/Documents/CallHistory.xlsx", engine='xlrd')