Search code examples
pythonpandasdataframeappendconcatenation

Pandas Concat vs append and join columns --> ("state", "state:", "State")


I join 437 tables and I get 3 columns for state as my coworkers feel like giving it a different name each day, ("state", "state:" and "State"), is there a way that joins those 3 columns to just 1 column called "state"?.

*also my code uses append, I just saw its deprecated, will it work the same using concat? any way to make it give the same results as append?.

I tried:

excl_merged.rename(columns={"state:": "state", "State": "state"})

but it doesn't do anything.

The code I use:

# importing the required modules
import glob
import pandas as pd
 
# specifying the path to csv files
path = "X:/.../Admission_merge"
 
# csv files in the path
file_list = glob.glob(path + "/*.xlsx")
 
# list of excel files we want to merge.
# pd.read_excel(file_path) reads the excel
# data into pandas dataframe.
excl_list = []
 
for file in file_list:
    excl_list.append(pd.read_excel(file)) #use .concat will it give the columns in the same order? 
 
# create a new dataframe to store the
# merged excel file.
excl_merged = pd.DataFrame()
 
for excl_file in excl_list:
     
    # appends the data into the excl_merged
    # dataframe.
    excl_merged = excl_merged.append(
      excl_file, ignore_index=True)


# exports the dataframe into excel file with
# specified name.
excl_merged.to_excel('X:/.../***/total_admission_2021-2023.xlsx', index=False)
print("Merge finished")

Any suggestions how I can improve it? also is there a way to remove unnamed empty columns?.

Thanks a lot.


Solution

  • You can use pd.concat:

    excl_list = ['state1.xlsx', 'state2.xlsx', 'state3.xlsx']
    state_map = {'state:': 'state', 'State': 'state'}
    
    data = []
    for excl_file in excl_list:
        df = pd.read_excel(excl_file)
    
        # Case where first row is empty
        if df.columns[0].startswith('Unnamed'):
            df.columns = df.iloc[0]
            df = df.iloc[1:]
    
        df = df.rename(columns=state_map)
        data.append(df)
    excl_merged = pd.concat(data, ignore_index=True)
    
    
    # Output
      ID state
    0  A     a
    1  B     b
    2  C     c
    3  D     d
    4  E     e
    5  F     f
    6  G     g
    7  H     h
    8  I     i
    

    file1.xlsx:

      ID State
    0  A     a
    1  B     b
    2  C     c
    

    file2.xlsx:

      ID state
    0  D     d
    1  E     e
    2  F     f
    

    file3.xlsx:

      ID state:
    0  G      g
    1  H      h
    2  I      i
    

    If you have empty columns, you can use data.append(df.dropna(how='all', axis=1)) before appending to data list.