I am using pandas.read_excel to import an excel file into a DataFrame. This is the Code...
#!/usr/bin/python
import pandas as pd
file = 'sample.xls'
df = pd.read_excel(file, sheetname=0, skiprows=7)
This imports the file but with the below warning...
WARNING *** OLE2 stream 'SSCS': expected size 128640, actual size 512
And When when I print the dataframe, I see that the last column has completely wrong values(instead of actual values from that column, it has shows 4 for every row.
If you are using Windows, you could use Excel itself to modify all of the XLS files before loading them with Pandas. The following script will automatically unhide all of the columns in all XLS files found in a given folder:
import win32com.client as win32
import glob
excel = win32.gencache.EnsureDispatch('Excel.Application')
for xls in glob.glob(r"C:\My Path\*.xls"):
print xls
wb = excel.Workbooks.Open(xls)
ws = wb.Worksheets(1)
ws.Columns.EntireColumn.Hidden = False
excel.DisplayAlerts = False # Allow file overwrite
wb.Close(True)
excel.Application.Quit()
You might want to make a copy of your XLS files before doing this as it will be done in place. Alternatively, you could use wb.SaveAs()
to specify a different output location.