Search code examples
pythonpandasappendconcatenation

Python append versus concat with multiple files in a "For Loop"


I have a working solution, but I am trying to understand exactly why my first solution did not work in an effort to better educate myself on the topic. I am attempting to load multiple excel files from a gov website and load into one dataframe for analysis. It is import to cycle through the list because the site is updated frequently.

My first attempt at this yielded an empty dataframe.

import glob
import pandas as pd
df_list = pd.DataFrame()
for name in glob.glob('C:\Market research\Total Files\*.xls*'): 
    df = pd.read_excel(name)
    df.columns = df.columns.str.lower()
    df_list.append(df)
print(df_list)

This resulted in the following output:

Empty DataFrame Columns: [] Index: []

My second attempt was the following code:

import glob
import pandas as pd

df_list = []
for name in glob.glob('C:\Market research\Total Files\*.xls*'): 
    df = pd.read_excel(name)
    df_list.append(df)
    df.columns = df.columns.str.lower()
jobs_df = pd.concat(df_list)

print(jobs_df.shape)
print(jobs_df.head())

Which yielded the following, which is what I would have expected.

  occ_code                        occ_title  group         tot_emp 
0  00-0000                  All Occupations    NaN       130307840   
1  11-0000           Management occupations  major         5960560   
2  11-1011                 Chief executives    NaN          321300   
3  11-1021  General and operations managers    NaN         1663810   
4  11-1031                      Legislators    NaN           61060   

Why did these not yield the same result?


Solution

  • Regardless of printing the wrong variable: In your first solution with the dataframe df_list

    df_list.append(df)
    

    creates a new dataframe. You need to assign it to a variable

    df_list = df_list.append(df)
    

    Your second solution is preferable.