I have code that creates a list of dataframes with the same structure. My wish is to append all of these dataframes together but add a column to the new dataframe that identifies which dataframe the row originally came from.
I easily appended the list of dataframes with:
import pandas as pd
df_rosters = pd.concat(list_of_rosters)
However, I haven't been able to figure how to add a column with the original dataframe name or index. I've found a bunch of examples suggesting to use the keys
argument; but each example has hardcoded keys. The size of my list is constantly changing so I need to figure out how to dynamically add in the keys.
Thanks in advance!
Let's assign
an indicator column to each DataFrame in the list. (Names can be zipped together with the list of DataFrames or created by something like enumerate
):
With enumerate
pd.concat(d.assign(df_name=f'{i:02d}') for i, d in enumerate(list_of_rosters))
0 1 df_name
0 4 7 00
1 7 1 00
2 9 5 00
0 8 1 01
1 1 8 01
2 2 6 01
Or with zip
:
pd.concat(d.assign(df_name=name)
for name, d in zip(['name1', 'name2'], list_of_rosters))
0 1 df_name
0 4 7 name1
1 7 1 name1
2 9 5 name1
0 8 1 name2
1 1 8 name2
2 2 6 name2
Setup:
import numpy as np
import pandas as pd
np.random.seed(5)
list_of_rosters = [
pd.DataFrame(np.random.randint(1, 10, (3, 2))),
pd.DataFrame(np.random.randint(1, 10, (3, 2)))
]
list_of_rosters
:
[ 0 1
0 4 7
1 7 1
2 9 5,
0 1
0 8 1
1 1 8
2 2 6]