Hello my fellow data nerds.
This is my first question ever, so I'll try to be overly thorough.
Today, I am trying to turn a series of large excel documents with many sheets into one giant dataset, along with some changes to the way the data is displayed. to be clear, I am dealing with hundreds of sheets, so I'm looking for ways to optimize the solution.
Specifically, I have a list of the names of the data sheets I need to pull from a large excel spreadsheet. I then am trying to iterate through this list and save the edited list as a new file in a folder in my jupyter notebook.
Here's what I'd like to happen:
list_of_tables = [a,b,c,d,e]
for i in range (0, len(list_of_tables):
df = pd.read_excel (r'Large_dataset_X.xlsx', sheet_name=list_of_tables[i])
{Bunch of code formatting and editing the file}
arg = "r'Edited Tables/" + list_of_tables[i] + "_Table_New.xlsx'"
df.to_excel(arg, sheet_name= list_of_tables[i], index = False)
The problem is that when I do this loop, the to_excel() argument which contains a path spits out an error: 'engine cannot recognize files of .xlsx'
Error Traceback:
---------------------------------------------------------------------------
OptionError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\io\excel\_base.py in __new__(cls, path, engine, **kwargs)
632 try:
--> 633 engine = config.get_option(f"io.excel.{ext}.writer")
634 if engine == "auto":
~\anaconda3\lib\site-packages\pandas\_config\config.py in __call__(self, *args, **kwds)
232 def __call__(self, *args, **kwds):
--> 233 return self.__func__(*args, **kwds)
234
~\anaconda3\lib\site-packages\pandas\_config\config.py in _get_option(pat, silent)
104 def _get_option(pat: str, silent: bool = False):
--> 105 key = _get_single_key(pat, silent)
106
~\anaconda3\lib\site-packages\pandas\_config\config.py in _get_single_key(pat, silent)
90 _warn_if_deprecated(pat)
---> 91 raise OptionError(f"No such keys(s): {repr(pat)}")
92 if len(keys) > 1:
OptionError: 'No such keys(s): "io.excel.xlsx\'.writer"'
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
<ipython-input-24-ec2db8d02335> in <module>
1 arg = "r'Edited Tables/" + list_of_tables[key] + "_Table_New.xlsx'"
2 # print(arg)
----> 3 df.to_excel(arg, sheet_name= list_of_tables[key], index = False)
~\anaconda3\lib\site-packages\pandas\core\generic.py in to_excel(self, excel_writer, sheet_name, na_rep, float_format, columns, header, index, index_label, startrow, startcol, engine, merge_cells, encoding, inf_rep, verbose, freeze_panes)
2024 inf_rep=inf_rep,
2025 )
-> 2026 formatter.write(
2027 excel_writer,
2028 sheet_name=sheet_name,
~\anaconda3\lib\site-packages\pandas\io\formats\excel.py in write(self, writer, sheet_name, startrow, startcol, freeze_panes, engine)
728 need_save = False
729 else:
--> 730 writer = ExcelWriter(stringify_path(writer), engine=engine)
731 need_save = True
732
~\anaconda3\lib\site-packages\pandas\io\excel\_base.py in __new__(cls, path, engine, **kwargs)
635 engine = _get_default_writer(ext)
636 except KeyError as err:
--> 637 raise ValueError(f"No engine for filetype: '{ext}'") from err
638 cls = get_writer(engine)
639
ValueError: No engine for filetype: 'xlsx''
I tried switching to a csv format and error persisted. Not sure what's going wrong. Thanks!
In the line arg = "r'Edited Tables/" + list_of_tables[i] + "_Table_New.xlsx'"
you have an extra apostrophe at the end.
Change that to:
arg = r"Edited Tables/" + list_of_tables[i] + "_Table_New.xlsx"
Note the error message is saying ValueError: No engine for filetype: 'xlsx''
since it doesn't know how to handle a xlsx'
file but would be fine with xlsx
without the trailing apostrophe.
The "r'Edited Tables/"
part also has a similar issue.