Search code examples
pythonpandasdataframekeyconcatenation

Concatenate List of Dataframes and Include Original Dataframe Names as Keys


I have code that creates a list of dataframes with the same structure. My wish is to append all of these dataframes together but add a column to the new dataframe that identifies which dataframe the row originally came from.

I easily appended the list of dataframes with:

import pandas as pd
df_rosters = pd.concat(list_of_rosters)

However, I haven't been able to figure how to add a column with the original dataframe name or index. I've found a bunch of examples suggesting to use the keys argument; but each example has hardcoded keys. The size of my list is constantly changing so I need to figure out how to dynamically add in the keys.

Thanks in advance!


Solution

  • Let's assign an indicator column to each DataFrame in the list. (Names can be zipped together with the list of DataFrames or created by something like enumerate):

    With enumerate

    pd.concat(d.assign(df_name=f'{i:02d}') for i, d in enumerate(list_of_rosters))
    
       0  1 df_name
    0  4  7      00
    1  7  1      00
    2  9  5      00
    0  8  1      01
    1  1  8      01
    2  2  6      01
    

    Or with zip:

    pd.concat(d.assign(df_name=name)
              for name, d in zip(['name1', 'name2'], list_of_rosters))
    
       0  1 df_name
    0  4  7   name1
    1  7  1   name1
    2  9  5   name1
    0  8  1   name2
    1  1  8   name2
    2  2  6   name2
    

    Setup:

    import numpy as np
    import pandas as pd
    
    np.random.seed(5)
    list_of_rosters = [
        pd.DataFrame(np.random.randint(1, 10, (3, 2))),
        pd.DataFrame(np.random.randint(1, 10, (3, 2)))
    ]
    

    list_of_rosters:

    [   0  1
    0  4  7
    1  7  1
    2  9  5,    
        0  1
    0  8  1
    1  1  8
    2  2  6]