I have a working solution, but I am trying to understand exactly why my first solution did not work in an effort to better educate myself on the topic. I am attempting to load multiple excel files from a gov website and load into one dataframe for analysis. It is import to cycle through the list because the site is updated frequently.
My first attempt at this yielded an empty dataframe.
import glob
import pandas as pd
df_list = pd.DataFrame()
for name in glob.glob('C:\Market research\Total Files\*.xls*'):
df = pd.read_excel(name)
df.columns = df.columns.str.lower()
df_list.append(df)
print(df_list)
This resulted in the following output:
Empty DataFrame Columns: [] Index: []
My second attempt was the following code:
import glob
import pandas as pd
df_list = []
for name in glob.glob('C:\Market research\Total Files\*.xls*'):
df = pd.read_excel(name)
df_list.append(df)
df.columns = df.columns.str.lower()
jobs_df = pd.concat(df_list)
print(jobs_df.shape)
print(jobs_df.head())
Which yielded the following, which is what I would have expected.
occ_code occ_title group tot_emp
0 00-0000 All Occupations NaN 130307840
1 11-0000 Management occupations major 5960560
2 11-1011 Chief executives NaN 321300
3 11-1021 General and operations managers NaN 1663810
4 11-1031 Legislators NaN 61060
Why did these not yield the same result?
Regardless of printing the wrong variable: In your first solution with the dataframe df_list
df_list.append(df)
creates a new dataframe. You need to assign it to a variable
df_list = df_list.append(df)
Your second solution is preferable.