python pandas concatenation multiple-columns names

loop or iterate to determine if all columns from multiple datasets have the same name and position in Python


I have 15 datasets (data frames), named data_1 to data_15. All of them are supposed to have the same column names. I would like to check that all columns have the same name and position before concatenating them. When I concatenated them, I ended up with an extra column because one column name in one dataset was misspelled. I used the following code on each dataset, but I would like to improve my skills and save time.

print(list(data_1))

The code I use to concatenate all datasets is the following:

pd.concat([data_1, data_2, ..., data_15])

Solution

  • Put all the dataframes in a list, then use all() to test whether every dataframe has the same column names in the same order as the first one.

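    import pandas as pd

    # Reference column names, in order, taken from the first dataframe
    columns = list(data_1.columns)
    df_list = [data_2, ..., data_15]

    # List equality compares both the names and their positions
    if all(list(df.columns) == columns for df in df_list):
        result = pd.concat([data_1] + df_list)
    else:
        print("Columns don't match")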
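
The check above only says whether something differs, not where. The following is a minimal sketch of one way to report which dataframe and which column position is off. The helper name find_column_mismatches, the dict of labelled dataframes, and the sample data (including the misspelled "prise" column) are all hypothetical and only there for illustration.

```python
import pandas as pd


def find_column_mismatches(frames):
    """Compare each dataframe's columns against the first one in `frames`.

    `frames` is a dict mapping a label to a DataFrame (hypothetical layout).
    Returns a dict mapping each mismatching label to a list of
    (position, expected, found) tuples.
    """
    labels = list(frames)
    reference = list(frames[labels[0]].columns)
    mismatches = {}
    for label in labels[1:]:
        columns = list(frames[label].columns)
        diffs = []
        # Walk both column lists position by position; None marks a missing column.
        for pos in range(max(len(reference), len(columns))):
            expected = reference[pos] if pos < len(reference) else None
            found = columns[pos] if pos < len(columns) else None
            if expected != found:
                diffs.append((pos, expected, found))
        if diffs:
            mismatches[label] = diffs
    return mismatches


# Hypothetical sample data: data_3 has a misspelled second column.
frames = {
    "data_1": pd.DataFrame({"id": [1], "price": [10.0]}),
    "data_2": pd.DataFrame({"id": [2], "price": [20.0]}),
    "data_3": pd.DataFrame({"id": [3], "prise": [30.0]}),
}

mismatches = find_column_mismatches(frames)
for label, diffs in mismatches.items():
    for pos, expected, found in diffs:
        print(f"{label}: position {pos}: expected {expected!r}, found {found!r}")

# Concatenate only when every dataframe matches the reference.
if not mismatches:
    combined = pd.concat(list(frames.values()), ignore_index=True)
```

Keeping the dataframes in a dict rather than a plain list is a design choice made here so that the report can name the offending dataset. With the real data, the dict would hold data_1 to data_15 under their own names.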