pandas.read_excel imports wrong values for a column

I am using pandas.read_excel to import an excel file into a DataFrame. This is the Code...

#!/usr/bin/python
import pandas as pd
file = 'sample.xls'
df = pd.read_excel(file, sheetname=0, skiprows=7)

This imports the file but with the below warning...

WARNING *** OLE2 stream 'SSCS': expected size 128640, actual size 512

And When when I print the dataframe, I see that the last column has completely wrong values(instead of actual values from that column, it has shows 4 for every row.

I am attaching a screen grab of the xls file.

Solution

If you are using Windows, you could use Excel itself to modify all of the XLS files before loading them with Pandas. The following script will automatically unhide all of the columns in all XLS files found in a given folder:

import win32com.client as win32
import glob

excel = win32.gencache.EnsureDispatch('Excel.Application')

for xls in glob.glob(r"C:\My Path\*.xls"):
    print xls
    wb = excel.Workbooks.Open(xls)
    ws = wb.Worksheets(1)
    ws.Columns.EntireColumn.Hidden = False
    excel.DisplayAlerts = False     # Allow file overwrite
    wb.Close(True)

excel.Application.Quit()

You might want to make a copy of your XLS files before doing this as it will be done in place. Alternatively, you could use wb.SaveAs() to specify a different output location.