The goal is to combine the important information from sheet1 & sheet2 of 952 excel files. Then save them as csvs in path as a name according to a cell value. Thanks to the stackoverflow community this is mostly working. Now, it gives the error: "List index out of range". This happens around the half way point specifically 475 to 516 files saved correctly.
Would anyone be able to make this work for the entire list?
# https://stackoverflow.com/questions/44776793/copy-all-csv-files-in-a-directory-of-folders-to-one-folder-in-python
# https://stackoverflow.com/questions/59292999/modifying-multiple-csv-files-from-same-directory-in-python
# https://stackoverflow.com/questions/38101009/changing-multiple-column-names-but-not-all-of-them-pandas-python
# https://stackoverflow.com/questions/28465779/how-do-i-limit-the-amount-of-letters-in-a-string
# https://stackoverflow.com/questions/37952797/pandas-dataframe-column-name-remove-special-character
import glob
import pandas as pd
excel_files = glob.glob('data1/*.xlsx')
path = Path('data2')
for excel in excel_files:
df1 = pd.read_excel(excel, sheet_name=0, dtype=str, index_col=None)
df2 = pd.read_excel(excel, sheet_name=1, dtype=str, index_col=None)
i = df1.iat[0,1]
j = df1.iat[0,15]
df2.rename(columns={'Date':'Date','Sales':i+j}, inplace=True)
df2.columns=df2.columns.str.replace('(','')
df2.columns=df2.columns.str.replace('/','')
df2.columns=df2.columns.str.replace(')','')
df2.columns=df2.columns.str.replace(' ','-')
df2.columns=df2.columns.str.replace('<','-')
df2.columns=df2.columns.str.replace('>','-')
k = df2.columns[1]
l = (k)[:19]
m = l + '.csv'
df2.to_csv(path/m, encoding='utf-8', index=False)
->Edit below: Stack-trace as requested. Thanks for taking a look at least.
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-10-3f727a40755a> in <module>
23 for excel in excel_files:
24 df1 = pd.read_excel(excel, sheet_name=0, dtype=str, index_col=None)
---> 25 df2 = pd.read_excel(excel, sheet_name=1, dtype=str, index_col=None)
26 i = df1.iat[0,1]
27 j = df1.iat[0,15]
~/anaconda3/lib/python3.7/site-packages/pandas/io/excel/_base.py in read_excel(io, sheet_name, header, names, index_col, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, **kwds)
332 convert_float=convert_float,
333 mangle_dupe_cols=mangle_dupe_cols,
--> 334 **kwds,
335 )
336
~/anaconda3/lib/python3.7/site-packages/pandas/io/excel/_base.py in parse(self, sheet_name, header, names, index_col, usecols, squeeze, converters, true_values, false_values, skiprows, nrows, na_values, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, **kwds)
886 convert_float=convert_float,
887 mangle_dupe_cols=mangle_dupe_cols,
--> 888 **kwds,
889 )
890
~/anaconda3/lib/python3.7/site-packages/pandas/io/excel/_base.py in parse(self, sheet_name, header, names, index_col, usecols, squeeze, dtype, true_values, false_values, skiprows, nrows, na_values, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, **kwds)
439 sheet = self.get_sheet_by_name(asheetname)
440 else: # assume an integer if not a string
--> 441 sheet = self.get_sheet_by_index(asheetname)
442
443 data = self.get_sheet_data(sheet, convert_float)
~/anaconda3/lib/python3.7/site-packages/pandas/io/excel/_xlrd.py in get_sheet_by_index(self, index)
44
45 def get_sheet_by_index(self, index):
---> 46 return self.book.sheet_by_index(index)
47
48 def get_sheet_data(self, sheet, convert_float):
~/anaconda3/lib/python3.7/site-packages/xlrd/book.py in sheet_by_index(self, sheetx)
464 :returns: A :class:`~xlrd.sheet.Sheet`.
465 """
--> 466 return self._sheet_list[sheetx] or self.get_sheet(sheetx)
467
468 def sheet_by_name(self, sheet_name):
IndexError: list index out of range
Chris's comment solved the problem:
It means your particular excel workbook only has one sheet unlike the others and
read_excel
fails if parametersheet_name
is 1. If you want to handle that case, you need to wrap yourpandas.read_excel
call inside a try / except clause.