
How to add multiple different DataFrames using a for loop?


I have multiple DataFrames, and I want to add the values of each DataFrame to the previous one, starting after the previous DataFrame's 3rd value.

I am very new to Python and I am using Google Colab, so I hope you can help me. Thank you very much. Below is an example of what I want to do.

df1:

Index Column_1
0     1
1     1
2     1
3     1
4     1
5     1

df2:

Index Column_2
0     2
1     2
2     2
3     2
4     2
5     2

df3:

Index Column_3
0     3
1     3
2     3
3     3
4     3
5     3
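To reproduce the example, the three sample frames can be built as below. This is a sketch that assumes `Index` is an ordinary column (as the merge-based answer treats it), not the DataFrame's row index.

```python
import pandas as pd

# Six rows each; 'Index' is stored as a regular column, and each frame
# holds one constant value column (1, 2 and 3 respectively).
df1 = pd.DataFrame({'Index': range(6), 'Column_1': [1] * 6})
df2 = pd.DataFrame({'Index': range(6), 'Column_2': [2] * 6})
df3 = pd.DataFrame({'Index': range(6), 'Column_3': [3] * 6})

print(df1)
```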

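I want to add the values of df2 starting from index 3 of df1. I also want to add the values of df3 starting from index 6 of the combined result, i.e. index 3 of df2. So basically each DataFrame starts 3 rows after the previous one, and the combined index grows by 3 for every DataFrame added.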
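The offset arithmetic can be made explicit: with 6 rows per frame and a shift of 3, frame k (counting from 0) occupies rows 3k to 3k + 5 of the combined result. A small sketch, where `row_spans` and its default arguments are illustrative names, not part of pandas:

```python
def row_spans(n_frames, n_rows=6, step=3):
    """Return (first_row, last_row) for each frame in the combined index."""
    # Frame k starts `step` rows after frame k - 1, so its offset is k * step.
    return [(k * step, k * step + n_rows - 1) for k in range(n_frames)]

print(row_spans(3))  # [(0, 5), (3, 8), (6, 11)]
```

Consecutive frames overlap on 3 rows, which is where the sums 3 (rows 3 to 5) and 5 (rows 6 to 8) in df4 come from.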

So df1 + df2 + df3 should look like this:

df4:

Index Column_1 Column_2 Column_3
0     1                 
1     1
2     1
3     1         3
4     1         3
5     1         3       
6               2       5
7               2       5
8               2       5
9                       3
10                      3
11                      3

or

df4:

Index Column_4
0     1
1     1
2     1 
3     3
4     3
5     3
6     5
7     5
8     5
9     3
10    3
11    3

Is there a way to do this in a loop?

I hope you guys can help me. Thank you very much.


Solution

  • Use merge with an outer join on the Index column, shifting df2 down by 3 rows and df3 by 6:

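    import pandas as pd
    # Assumes 'Index' is a regular column in df1, df2 and df3.
    # Shift df2 by 3 rows and df3 by 6 rows, then outer-join all three on 'Index'.
    df = (df1.merge(df2.assign(Index=df2['Index'] + 3), how='outer')
             .merge(df3.assign(Index=df3['Index'] + 6), how='outer'))
    # Add the previous column where frames overlap; fillna(0) leaves other rows unchanged
    df['Column_2'] += df['Column_1'].fillna(0)
    df['Column_3'] += df['Column_2'].fillna(0)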
    
    >>> df
        Index  Column_1  Column_2  Column_3
    0       0       1.0       NaN       NaN
    1       1       1.0       NaN       NaN
    2       2       1.0       NaN       NaN
    3       3       1.0       3.0       NaN
    4       4       1.0       3.0       NaN
    5       5       1.0       3.0       NaN
    6       6       NaN       2.0       5.0
    7       7       NaN       2.0       5.0
    8       8       NaN       2.0       5.0
    9       9       NaN       NaN       3.0
    10     10       NaN       NaN       3.0
    11     11       NaN       NaN       3.0
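Since the question asks for a loop, the same chained merge can be generalised to any number of frames. A sketch, assuming every frame has an `Index` column plus one uniquely named value column, and a fixed shift of 3 rows per frame; `merge_with_offsets` is a hypothetical helper name:

```python
import pandas as pd

def merge_with_offsets(frames, step=3):
    """Outer-merge frames on 'Index', shifting frame i down by i * step rows,
    then add each value column to the one before it where they overlap."""
    df = frames[0]
    for i, frame in enumerate(frames[1:], start=1):
        shifted = frame.assign(Index=frame['Index'] + i * step)
        df = df.merge(shifted, how='outer')  # joins on the shared 'Index' column
    df = df.sort_values('Index', ignore_index=True)

    # Same running addition as above: Column_k += Column_(k-1), NaN treated as 0
    value_cols = [c for c in df.columns if c != 'Index']
    for prev, cur in zip(value_cols, value_cols[1:]):
        df[cur] = df[cur] + df[prev].fillna(0)
    return df

df1 = pd.DataFrame({'Index': range(6), 'Column_1': [1] * 6})
df2 = pd.DataFrame({'Index': range(6), 'Column_2': [2] * 6})
df3 = pd.DataFrame({'Index': range(6), 'Column_3': [3] * 6})
print(merge_with_offsets([df1, df2, df3]))
```

The columns are updated in order, so each one accumulates everything to its left, exactly as the two hand-written `+=` lines do.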
    

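    Update

    "Is there a way I can render the result in one column only?"

    Forward-fill each row from left to right, so Column_3 ends up holding the last non-missing value:

    # ffill(axis=1) copies the last non-NaN value in each row into the empty cells to its right
    df['Column_4'] = df.ffill(axis=1)['Column_3']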
    
    >>> df
        Index  Column_1  Column_2  Column_3  Column_4
    0       0       1.0       NaN       NaN       1.0
    1       1       1.0       NaN       NaN       1.0
    2       2       1.0       NaN       NaN       1.0
    3       3       1.0       3.0       NaN       3.0
    4       4       1.0       3.0       NaN       3.0
    5       5       1.0       3.0       NaN       3.0
    6       6       NaN       2.0       5.0       5.0
    7       7       NaN       2.0       5.0       5.0
    8       8       NaN       2.0       5.0       5.0
    9       9       NaN       NaN       3.0       3.0
    10     10       NaN       NaN       3.0       3.0
    11     11       NaN       NaN       3.0       3.0
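If only the single combined column is needed, it can also be built in a loop without merging: shift each frame's values down by i * 3 rows, stack them, and sum per row label with `groupby(level=0).sum()`. This swaps in a different technique from the `ffill` trick above; it is a sketch assuming each frame has an `Index` column and exactly one value column, and `overlap_sum` is a hypothetical name.

```python
import pandas as pd

def overlap_sum(frames, step=3):
    """Combine the value column of every frame into one Series,
    shifting frame i down by i * step rows and summing overlapping rows."""
    parts = []
    for i, frame in enumerate(frames):
        value_col = [c for c in frame.columns if c != 'Index'][0]
        shifted_index = frame['Index'].to_numpy() + i * step
        parts.append(pd.Series(frame[value_col].to_numpy(), index=shifted_index))
    # Stack all shifted values, then add up the ones that share a row label
    return pd.concat(parts).groupby(level=0).sum().rename('Column_4')

df1 = pd.DataFrame({'Index': range(6), 'Column_1': [1] * 6})
df2 = pd.DataFrame({'Index': range(6), 'Column_2': [2] * 6})
df3 = pd.DataFrame({'Index': range(6), 'Column_3': [3] * 6})
print(overlap_sum([df1, df2, df3]).tolist())  # [1, 1, 1, 3, 3, 3, 5, 5, 5, 3, 3, 3]
```

Unlike the merged table, this keeps integer values because no NaN is introduced; a row covered by no frame would simply be absent rather than NaN.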