
Concatenating Pandas dataframes, getting Nan values for first dataframe


I'm trying to join two dataframes. 'df' is my initial dataframe containing all the header information I require. 'row' is my first row of data that I want to append to 'df'.

df =
   FName E1         E2          E3          E4          E5          E6
0  Nan   2          2           2           2           2           2
1  Nan   1          1           1           1           1           1
2  Nan   3          4           5           6           7           8
3  Nan   4          5           6           7           8           10
4  Nan   1002003004 1002004005  1002005006  1002006007  1002007008  1002008010


row =
   0                                        1       2       3       4       5       6
0  501#_ZMB_2019-04-03_070528_reciprocals   30.0193 30.0193 30.0193 34.8858 34.8858 34.8858

I'm trying to create this:

   FName                                    E1          E2          E3          E4          E5          E6
0  Nan                                      2           2           2           2           2           2
1  Nan                                      1           1           1           1           1           1
2  Nan                                      3           4           5           6           7           8
3  Nan                                      4           5           6           7           8           10
4  Nan                                      1002003004  1002004005  1002005006  1002006007  1002007008  1002008010
5  501#_ZMB_2019-04-03_070528_reciprocals   30.0193     30.0193     30.0193     34.8858     34.8858     34.8858

I have tried the following:

df = df.append(row, ignore_index=True)

and

df = pd.concat([df, row], ignore_index=True)

Both of these wipe out all the data from the first df, which holds the header information:

   0                                        1       2       3       4       5       6
0  Nan                                      Nan     Nan     Nan     Nan     Nan     Nan
1  Nan                                      Nan     Nan     Nan     Nan     Nan     Nan
2  Nan                                      Nan     Nan     Nan     Nan     Nan     Nan
3  Nan                                      Nan     Nan     Nan     Nan     Nan     Nan
4  Nan                                      Nan     Nan     Nan     Nan     Nan     Nan
5  501#_ZMB_2019-04-03_070528_reciprocals   30.0193 30.0193 30.0193 34.8858 34.8858 34.8858
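A minimal reproduction with toy frames (hypothetical values and fewer columns than the real data) shows what pd.concat is doing. It aligns the two frames on their column labels. Because FName/E1/E2 and 0/1/2 never match, the result actually keeps both sets of labels and pads every gap with NaN:

```python
import pandas as pd

# Toy stand-ins for the real frames (hypothetical values, fewer columns)
df = pd.DataFrame({"FName": ["Nan", "Nan"], "E1": [2, 1], "E2": [2, 1]})
row = pd.DataFrame([["501#_ZMB_2019-04-03_070528_reciprocals", 30.0193, 34.8858]])

out = pd.concat([df, row], ignore_index=True)

# concat aligns on column labels; "FName"/"E1"/"E2" and 0/1/2 never match,
# so the result holds the union of both label sets, padded with NaN
print(out.columns.tolist())  # ['FName', 'E1', 'E2', 0, 1, 2]
print(out.shape)             # (3, 6)
```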

I've also tried

df = pd.concat([df.reset_index(drop=True, inplace=True), row.reset_index(drop=True, inplace=True)])

which produced the following traceback:

Traceback (most recent call last):

  File "<ipython-input-146-3c1ecbd1987c>", line 1, in <module>
    df = pd.concat([df.reset_index(drop=True, inplace=True), row.reset_index(drop=True, inplace=True)])

  File "C:\Users\russells\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\reshape\concat.py", line 228, in concat
    copy=copy, sort=sort)

  File "C:\Users\russells\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\reshape\concat.py", line 280, in __init__
    raise ValueError('All objects passed were None')

ValueError: All objects passed were None
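This ValueError has a separate cause. With inplace=True, reset_index modifies the frame in place and returns None, so pd.concat receives [None, None]. The sketch below uses toy frames (hypothetical data). Note that dropping inplace=True removes the error but not the NaNs, because the column labels still differ:

```python
import pandas as pd

# Toy stand-ins (hypothetical values)
df = pd.DataFrame({"FName": ["Nan"], "E1": [2]})
row = pd.DataFrame([["501#_ZMB_2019-04-03_070528_reciprocals", 30.0193]])

# inplace=True mutates df and returns None, which is what concat received
result = df.reset_index(drop=True, inplace=True)
print(result)  # None

# Without inplace, reset_index returns a new frame that concat can use,
# but FName/E1 and 0/1 still don't align, so the columns are unioned
combined = pd.concat([df.reset_index(drop=True), row.reset_index(drop=True)],
                     ignore_index=True)
print(combined.shape)  # (2, 4)
```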

Does anyone know what I'm doing wrong?


Solution

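  • When you concatenate extra rows, pandas aligns the frames on their column labels, and here the labels do not overlap (FName and E1 to E6 in df versus 0 to 6 in row). Renaming the columns of row to match those of df gets the job done: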

    pd.concat([df, row.rename(columns=dict(zip(row.columns, df.columns)))],
               ignore_index=True)
    

                                        FName          E1          E2          E3          E4          E5          E6
    0                                     Nan           2           2           2           2           2           2
    1                                     Nan           1           1           1           1           1           1
    2                                     Nan           3           4           5           6           7           8
    3                                     Nan           4           5           6           7           8          10
    4                                     Nan  1002003004  1002004005  1002005006  1002006007  1002007008  1002008010
    5  501#_ZMB_2019-04-03_070528_reciprocals     30.0193     30.0193     30.0193     34.8858     34.8858     34.8858
    

    Or, if you only need to add a single row at the end and df has a RangeIndex (so df.shape[0] is a new, unused label):

    df.loc[df.shape[0], :] = row.to_numpy()
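Both fixes can be checked end to end with toy frames (hypothetical values, fewer columns). The .loc line below is a 1-D variant of the one above. It uses row.iloc[0].to_numpy() so the values carry no column labels; passing row.iloc[0] as a Series would be aligned on its labels 0, 1, 2 and fill the new row with NaN again. The question also tries DataFrame.append, which was deprecated in pandas 1.4 and removed in 2.0, so on current versions pd.concat or .loc are the options that remain:

```python
import pandas as pd

# Toy stand-ins (hypothetical values, fewer columns than the real data)
df = pd.DataFrame({"FName": ["Nan", "Nan"], "E1": [2, 1], "E2": [2, 1]})
row = pd.DataFrame([["501#_ZMB_2019-04-03_070528_reciprocals", 30.0193, 34.8858]])

# Option 1: relabel row's columns 0, 1, 2 as FName, E1, E2, then concat
renamed = row.rename(columns=dict(zip(row.columns, df.columns)))
out = pd.concat([df, renamed], ignore_index=True)

# Option 2: setting with enlargement at the next RangeIndex label;
# a plain NumPy array is placed positionally, with no label alignment
df2 = df.copy()
df2.loc[len(df2)] = row.iloc[0].to_numpy()

print(out)
print(df2)
```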