Search code examples
pythonpandasdataframedata-scienceexport-to-excel

how to give to_excel() in Pandas an argument that changes inside a loop?


Hello my fellow data nerds.

This is my first question ever, so I'll try to be overly thorough.

Today, I am trying to turn a series of large excel documents with many sheets into one giant dataset, along with some changes to the way the data is displayed. to be clear, I am dealing with hundreds of sheets, so I'm looking for ways to optimize the solution.

Specifically, I have a list of the names of the data sheets I need to pull from a large excel spreadsheet. I then am trying to iterate through this list and save the edited list as a new file in a folder in my jupyter notebook.

Here's what I'd like to happen:

list_of_tables = [a,b,c,d,e]
for i in range (0, len(list_of_tables):
      df = pd.read_excel (r'Large_dataset_X.xlsx', sheet_name=list_of_tables[i])

      {Bunch of code formatting and editing the file}

      arg = "r'Edited Tables/" + list_of_tables[i] + "_Table_New.xlsx'"
      df.to_excel(arg, sheet_name= list_of_tables[i], index = False)

The problem is that when I do this loop, the to_excel() argument which contains a path spits out an error: 'engine cannot recognize files of .xlsx'

Error Traceback:

---------------------------------------------------------------------------
OptionError                               Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\io\excel\_base.py in __new__(cls, path, engine, **kwargs)
    632                 try:
--> 633                     engine = config.get_option(f"io.excel.{ext}.writer")
    634                     if engine == "auto":

~\anaconda3\lib\site-packages\pandas\_config\config.py in __call__(self, *args, **kwds)
    232     def __call__(self, *args, **kwds):
--> 233         return self.__func__(*args, **kwds)
    234 

~\anaconda3\lib\site-packages\pandas\_config\config.py in _get_option(pat, silent)
    104 def _get_option(pat: str, silent: bool = False):
--> 105     key = _get_single_key(pat, silent)
    106 

~\anaconda3\lib\site-packages\pandas\_config\config.py in _get_single_key(pat, silent)
     90             _warn_if_deprecated(pat)
---> 91         raise OptionError(f"No such keys(s): {repr(pat)}")
     92     if len(keys) > 1:

OptionError: 'No such keys(s): "io.excel.xlsx\'.writer"'

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-24-ec2db8d02335> in <module>
      1 arg = "r'Edited Tables/" + list_of_tables[key] + "_Table_New.xlsx'"
      2 # print(arg)
----> 3 df.to_excel(arg, sheet_name= list_of_tables[key], index = False)

~\anaconda3\lib\site-packages\pandas\core\generic.py in to_excel(self, excel_writer, sheet_name, na_rep, float_format, columns, header, index, index_label, startrow, startcol, engine, merge_cells, encoding, inf_rep, verbose, freeze_panes)
   2024             inf_rep=inf_rep,
   2025         )
-> 2026         formatter.write(
   2027             excel_writer,
   2028             sheet_name=sheet_name,

~\anaconda3\lib\site-packages\pandas\io\formats\excel.py in write(self, writer, sheet_name, startrow, startcol, freeze_panes, engine)
    728             need_save = False
    729         else:
--> 730             writer = ExcelWriter(stringify_path(writer), engine=engine)
    731             need_save = True
    732 

~\anaconda3\lib\site-packages\pandas\io\excel\_base.py in __new__(cls, path, engine, **kwargs)
    635                         engine = _get_default_writer(ext)
    636                 except KeyError as err:
--> 637                     raise ValueError(f"No engine for filetype: '{ext}'") from err
    638             cls = get_writer(engine)
    639 

ValueError: No engine for filetype: 'xlsx''

I tried switching to a csv format and error persisted. Not sure what's going wrong. Thanks!


Solution

  • In the line arg = "r'Edited Tables/" + list_of_tables[i] + "_Table_New.xlsx'" you have an extra apostrophe at the end.

    Change that to:

    arg = r"Edited Tables/" + list_of_tables[i] + "_Table_New.xlsx"
    

    Note the error message is saying ValueError: No engine for filetype: 'xlsx'' since it doesn't know how to handle a xlsx' file but would be fine with xlsx without the trailing apostrophe.

    The "r'Edited Tables/" part also has a similar issue.