I'd like to read a .xlsx using python pandas. The problem is the at the beginning of the excel file, it has some additional data like title or description of the table and tables contents starts. That introduce the unnamed columns because pandas DataReader takes it as the columns. But tables contents starts after few lines later.
A B C
this is description
last updated: Mar 18th,2014
Table content
Country Year Product_output
Canada 2017 3002
Bulgaria 2016 2201
...
The table content starts in line 4. And columns must be "Country", "year", "proudct_output" instead "this is description", "unnamed", "unnamed". For this specific case, setting skiprows
parameter to 3 solved the issue(from Mikhail Venkov). But I have to deal with many excel files and I don't know how many rows to skip in advance.
I think there might be a solution since each table column header has a filter.
If you know that a specific text (like country) must be in the first column, you can do the following:
import xlrd
xl_work = xlrd.open_workbook("Classeur1.xlsx")
mySheet = xl_work.sheet_by_index(0)
nl = 0
while mySheet.cell_value(nl,0) != "Country" :
nl += 1
line_with_headers = nl
Then use skiprows with nl
instead of 3.