I have this .xlsx
file I can read successfully by using:
pandas.read_excel(file_name, sheet_name="customers", index_col=0)
This works great for most columns but a few with a space between characters like in "profile url". This columns are just missing.
EDIT:
This is some code to reproduce the issue:
import pandas as pd
def read_excel(file_name):
df = pd.read_excel(file_name, sheet_name="customers", index_col=0)
for entry in df.iterrows():
print(entry)
return df
read_excel("test_table.xlsx")
And this a example table to use:
ID,First,Last,Profile Url
1,foo,bar,www.google.com
2,fake,name,https://stackoverflow.com/
This is the value of entry in the first iteration. Doing so I can the objects First and Last.
I would expect to see Profile Url as well.
What I learned by preparing this example is that any header written in lowercase will also be ignored.
'_'
..str.strip()
import pandas as pd
df = pd.DataFrame({'col_no_spaces': [1, 2, 3], 'col with spaces': ['a', 'b', 'c'], ' col_with_leading_trailing_ws ': [4, 5, 6]})
# display(df)
col_no_spaces col with spaces col_with_leading_trailing_ws
0 1 a 4
1 2 b 5
2 3 c 6
View as Series
# strip leading and trailing whitespace, and replace spaces in column names with _
df.columns = df.columns.str.strip().str.replace('\s+', '_', regex=True)
# display(df)
col_no_spaces col_with_spaces col_with_leading_trailing_ws
0 1 a 4
1 2 b 5
2 3 c 6
View as Series