Let's say I have multiple data frames df1, df2, df3 (in reality, I have about 11). They were created from Excel worksheets.
For example:
df1:
DateTime | Col1 | Col2 | Col3 |
----------------
jan | 20 | 30 | 40 |
Feb | NaN | NaN | NaN |
Mar | NaN | NaN | NaN |
Apr | NaN | NaN | NaN |
df2:
DateTime | Col1 | Col2 | Col3 |
----------------
jan | NaN | NaN | NaN |
Feb | 20 | 30 | 40 |
Mar | NaN | NaN | NaN |
Apr | NaN | NaN | NaN |
df3:
DateTime | Col1 | Col2 | Col3 |
----------------
jan | NaN | NaN | NaN |
Feb | NaN | NaN | NaN |
Mar | 10 | 20 | 80 |
Apr | NaN | NaN | NaN |
I would like the output to be:
DateTime | Col1 | Col2 | Col3 |
----------------
jan | 20 | 30 | 40 |
Feb | 20 | 30 | 40 |
Mar | 10 | 20 | 80 |
Apr | NaN | NaN | NaN |
All the data frames have the same column names, and the date column has the same rows (same count and same labels) in every spreadsheet. I tried join, merge and concatenate, but they either overwrite the values or add extra rows or columns. The output should have the same number of rows and columns as each input.
I am new to Python, so I am still trying to figure this out!
Use concat with an aggregate sum. This solution works when the dates are in a column named date:
df = pd.concat([df1, df2, df3]).groupby('date', sort=False).sum(min_count=1)
If the dates are in the index instead (for example a DatetimeIndex), group by the index level:
df = pd.concat([df1, df2, df3]).groupby(level=0, sort=False).sum(min_count=1)
print(df)
     Col1  Col2  Col3
jan  20.0  30.0  40.0
Feb  20.0  30.0  40.0
Mar  10.0  20.0  80.0
Apr   NaN   NaN   NaN
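For anyone who wants to try this end to end, here is a minimal, self-contained sketch that rebuilds the three sample frames from the question and applies the index-based version. It assumes the month labels are the index rather than a column. The helper make_frame and the names combined and zero_filled are made up for illustration only.

```python
import numpy as np
import pandas as pd

months = ["jan", "Feb", "Mar", "Apr"]
cols = ["Col1", "Col2", "Col3"]


def make_frame(month, values):
    """Hypothetical helper: an all-NaN frame with one filled-in month."""
    df = pd.DataFrame(np.nan, index=months, columns=cols)
    df.loc[month] = values
    return df


df1 = make_frame("jan", [20, 30, 40])
df2 = make_frame("Feb", [20, 30, 40])
df3 = make_frame("Mar", [10, 20, 80])
frames = [df1, df2, df3]

# Stack the frames vertically, then collapse rows that share an index label.
# sort=False keeps the rows in their original order, and min_count=1 makes a
# group that contains only NaN sum to NaN instead of 0.
combined = pd.concat(frames).groupby(level=0, sort=False).sum(min_count=1)
print(combined)

# For comparison: without min_count, the all-NaN "Apr" group sums to 0.0.
zero_filled = pd.concat(frames).groupby(level=0, sort=False).sum()
print(zero_filled)
```

Keep in mind that sum adds values together when two frames have a non-missing value in the same cell. If that can happen in the real data and the first non-missing value should win instead, replacing .sum(min_count=1) with .first() may be a better fit.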