I am trying to convert all xlsx files to csv files in a folder. It worked well in the past, but I am getting an error this time that leaves me no clue.
Here's my code:
excel_files = glob.glob('/*xlsx*')
for excel_file in excel_files:
df = pd.read_excel(excel_file)
output = excel_file.split('.')[0]+'.csv'
df.to_csv(output)
I have also tried the following line to make sure it's not the encoding issue:
df.to_csv(output, encoding='utf-8', index=False)
It converted around 1000 files, but the rest of the 7000 files kept getting the error:
KeyError: 'rId6'
How would you solve it? Thank you.
Some of your files are badly formatted in some way. You should add exception handling to your loop, this would allow the conversions to continue and would indicate which of your files are causing the problem:
excel_files = glob.glob('/*xlsx*')
for excel_file in excel_files:
print("Converting '{}'".format(excel_file))
try:
df = pd.read_excel(excel_file)
output = excel_file.split('.')[0]+'.csv'
df.to_csv(output)
except KeyError:
print(" Failed to convert")
You could then try opening the failing files inside Excel to see if they load ok. If they do load, you could upload an example of a failing Excel file to something like pastebin and add a comment here with the link to it so the problem can be recreated.