Search code examples
pythonpandasappendconcatenation

Concatenate dataframes from loop into one big dataframe


I have an original dataframe in which I want to overwrite a few columns within a loop. In the end, I want to stack all dataframes (generated in the loop) into one big dataframe (which contains then x times the original dataframe with changes here and there. The stacking of dfs doesn't really seem to work for me (and the multiplication of column A with i either (adding with i did work))

With the 'ID'-column I keep track of all different dataframes in the big one since they are kind of visibly together when looking at the ID's.

Tried code:

import pandas as pd

df_or = pd.DataFrame({"Case": [1,2,3,4,5,6],  
                         "A": [3,5,2,8,4,1],       
                         "B": [10,12,24,8,57,84]})

print(df_or)

total = []

for i in range(0,2):
    df = df_or
    df.loc[:, 'A'] = df_or.loc[:, 'A'].mul(i)       #.add(i) works fine
    df.loc[:, 'ID'] = df.loc[:,'Case'] + i*100000
    print(df)

    total.append(df)

total = pd.concat(total)
total = total.sort_values('ID')
total.reset_index(inplace=True, drop=True)
print(total)

Desired result:

   Case  A   B      ID
0     1  0  10       1
1     2  0  12       2
2     3  0  24       3
3     4  0   8       4 
4     5  0  57       5
5     6  0  84       6
6     1  3  10  100001
7     2  5  12  100002
8     3  2  24  100003
9     4  8   8  100004
10    5  4  57  100005
11    6  1  84  100006

Instead I get:

    Case  A   B      ID
0      1  0  10  100001
1      1  0  10  100001
2      2  0  12  100002
3      2  0  12  100002
4      3  0  24  100003
5      3  0  24  100003
6      4  0   8  100004
7      4  0   8  100004
8      5  0  57  100005
9      5  0  57  100005
10     6  0  84  100006
11     6  0  84  100006

Solution

  • You can try this:

    df_or['id'] = [i for i in range(1, len(df_or['Case'])+1)]
    df1 = df_or.copy()
    df_or['id'] = ['10000'+str(i) for i in range(1, len(df_or['Case'])+1)]
    df = pd.concat([df1, df_or])
    print(df)
    
       Case  A   B      id
    0     1  3  10       1
    1     2  5  12       2
    2     3  2  24       3
    3     4  8   8       4
    4     5  4  57       5
    5     6  1  84       6
    0     1  3  10  100001
    1     2  5  12  100002
    2     3  2  24  100003
    3     4  8   8  100004
    4     5  4  57  100005
    5     6  1  84  100006
    

    Or to fix your code, you need to use df = df_or.copy():

    total = []
    
    for i in range(0,2):
        df = df_or.copy()
        df.loc[:, 'A'] = df_or.loc[:, 'A'].mul(i)       #.add(i) works fine
        df.loc[:, 'ID'] = df.loc[:,'Case'] + i*100000
        print(df)
    
        total.append(df)
    
    total = pd.concat(total)
    total = total.sort_values('ID')
    total.reset_index(inplace=True, drop=True)
    print(total)
    
        Case  A   B      ID
    0      1  0  10       1
    1      2  0  12       2
    2      3  0  24       3
    3      4  0   8       4
    4      5  0  57       5
    5      6  0  84       6
    6      1  3  10  100001
    7      2  5  12  100002
    8      3  2  24  100003
    9      4  8   8  100004
    10     5  4  57  100005
    11     6  1  84  100006