Junior dev here.
Goal: Using Python, convert a file from .xls to .xlsx and end up with a clean header.
My attempt:
I first tried win32com, but pip install failed with the two errors below. I believe that's because I'm on a Mac.
ERROR: Could not find a version that satisfies the requirement win32com (from versions: none)
ERROR: No matching distribution found for win32com
I then followed this post, which doesn't use win32com, but that produced this error:
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'<?xml ve'
The other issue I'm running into is the file itself. At the top, there are 6 extra lines that need to be removed. In addition, the headers of the actual data table have a mix of merged and unmerged cells. I'm not certain how to go about fixing that.
Any suggestions would be helpful and thank you in advance!
This answers the second part of my question (removing the extra lines at the top). I'm still not certain how to read the .xls file itself.
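One clue on the .xls side: the xlrd error says it found b'<?xml ve' where it expected a BOF record, which suggests the file is really XML (or HTML) saved with an .xls extension rather than a binary Excel workbook. Below is a minimal sketch for checking that from the first few bytes. sniff_spreadsheet_format is a hypothetical helper name, and the labels it returns are my own, not part of any library.

```python
def sniff_spreadsheet_format(path):
    """Guess a spreadsheet file's real format from its leading bytes.

    Hypothetical helper: the returned labels are illustrative only.
    """
    with open(path, "rb") as f:
        head = f.read(8)

    # OLE2 compound file signature, the container used by binary .xls files.
    if head == b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1":
        return "xls"
    # ZIP local file header, the container used by .xlsx files.
    if head.startswith(b"PK\x03\x04"):
        return "xlsx"
    # Text markup such as '<?xml ...' or '<html ...' wearing an .xls extension.
    if head.lstrip().startswith(b"<"):
        return "xml-or-html"
    return "unknown"
```

If this reports "xml-or-html", xlrd is not the right tool for the file as it stands. Depending on what the markup turns out to be, pd.read_html might be able to parse an HTML table, or the file can be opened in a spreadsheet application and re-saved as a real .xlsx or .csv first.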
If I convert the file to CSV, I can use the command below to remove the top few lines. skiprows is the read_csv parameter that skips rows at the top of the file when the DataFrame is created; read_excel accepts it as well for xlsx files.
import pandas as pd
df = pd.read_csv('file_name.csv', skiprows=8)
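For the mixed merged/unmerged headers, here is a minimal sketch that I have not run against the real file. It assumes the header spans two rows after the skipped lines, and that a horizontally merged cell exports to CSV as one label followed by blank cells. read_with_flat_header, flatten_header, and the sample data are all hypothetical and only there for illustration.

```python
import io

import pandas as pd


def flatten_header(rows):
    """Collapse several header rows into one name per column."""
    filled = []
    for i, row in enumerate(rows):
        is_last = i == len(rows) - 1
        out, last = [], None
        for cell in row:
            if pd.isna(cell) or str(cell).strip() == "":
                # Blank under a merged cell: reuse the label to the left,
                # except in the bottom header row, where blank means "no sub-label".
                out.append(None if is_last else last)
            else:
                last = str(cell).strip()
                out.append(last)
        filled.append(out)
    # Join the non-empty parts of each column, top to bottom.
    return ["_".join(p for p in col if p) for col in zip(*filled)]


def read_with_flat_header(source, skiprows, header_rows=2):
    """Read a CSV, skip junk lines, and flatten a multi-row header."""
    # header=None keeps the header rows as ordinary rows so they can be cleaned by hand.
    raw = pd.read_csv(source, skiprows=skiprows, header=None, dtype=str)
    names = flatten_header(raw.iloc[:header_rows].values.tolist())
    data = raw.iloc[header_rows:].reset_index(drop=True)
    data.columns = names
    return data


# Made-up sample: 2 junk lines, then a header where "Sales" spans two columns.
SAMPLE = """Report title
Generated 2020
Name,Sales,,Region
,Q1,Q2,
Alice,10,20,East
Bob,30,40,West
"""

df = read_with_flat_header(io.StringIO(SAMPLE), skiprows=2)
print(list(df.columns))  # ['Name', 'Sales_Q1', 'Sales_Q2', 'Region']
```

One limitation of filling blanks from the left: an unmerged column whose top header cell is empty would inherit the label of its left-hand neighbour, so the result is worth checking by eye. Once the columns look right, df.to_excel('file_name.xlsx', index=False) should write the .xlsx, provided an Excel writer engine such as openpyxl is installed.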