I have a short script that's a reduction of my application written using "pandas>=0.25.3"
which has been upgraded to "pandas==1.1.5"
the latest version of our code. In this version of Pandas, the default engine does not parse xlsx so we've added the engine="openpyxl"
. However, there's a new issue. The read_excel
no longer seems to respect the names
argument and has strange behavior.
import pandas
filename = "... .xlsx"
names = ["foo", "bar", "baz"]
data_frame = pandas.read_excel(
filename,
header=None,
names=names,
engine="openpyxl",
skiprows=3,
sheet_name=0,
)
print(data_frame.iloc[3])
Running the script with the new Pandas I get this output:
foo NaN
bar NaN
baz NaN
Name: (FxK2,SMin, 2066.125), dtype: float64
But previously in pandas-0.25.3 parsing with the xlrd engine by default I got what I expected which was this:
foo FxK2
bar SMin
baz 2066.125
The names
field gave names to the columns and then I could reference data_frame.iloc[0].baz
and get 2066.125
. Now for some reason, then entire thing ends up in the optional name field for the data frame.
How can I get the behavior I was used to and is this potentially a bug or just new interface I'm not used to? The pandas-1.1.5 certainly seem to reference the names
argument in the same way as I was used to using it.
Updated (after comment threads):
I got the same problem for the second excel file when using engine='openpyxl'
. When reading with engine='xlrd'
I didn't get the problem, though. But it's probably a version-related issue with xlrd
, since I'm using an older version '1.2.0' (currently it's 2.0.1), and there's a big warning in xlrd docummentation saying it no longer read anything other than .xls files.
Therefore, as we discussed in the comments, a possible work-around to use engine='openpyxl'
is to specify the columns with usecols=range(3)
, although it requires a previous knowledge of the columns in the file. The issue might be at some hiding formatting/styling/etc in those "empty" cells.
For example, in the same problematic file, if I select and copy the cells with values to another Sheet (Sheet1) in the same file, and read it with:
pd.read_excel('redux2.xlsx', sheet_name='Sheet1', header=None, names=['foo','bar','baz'], engine='openpyxl', skiprows=1)
then it works as expected.
Original Answer:
It's working as expected for me. Probably it's something with your data. Try the following code. If it works, then you know it's something with your data, not pandas.
Sample data
import string
df = pd.DataFrame({
'col1': list(string.ascii_letters[:10]),
'col2': list(string.ascii_letters[10:20]),
'col3': range(1,11)
})
df.to_excel('file.xlsx', index=False)
print(df)
col1 col2 col3
0 a k 1
1 b l 2
2 c m 3
3 d n 4
4 e o 5
5 f p 6
6 g q 7
7 h r 8
8 i s 9
9 j t 10
Testing
filename = 'file.xlsx'
names = ["foo", "bar", "baz"]
data_frame = pd.read_excel(
filename,
header=None,
names=names,
engine="openpyxl",
skiprows=3,
sheet_name=0,
)
print(data_frame.iloc[3])
foo f
bar p
baz 6
Name: 3, dtype: object
Note: My pandas version is '1.1.5' as well.