I have 2 data frames
Name Middle Name Last Name
0 Ambar Klein Liam
1 Maya J Jacobson
2 Steve Knowles Gomez
3 Sam l Smith
4 Gomez John Smith
5 Mical M Mason
Name Last Name
0 Smith Klein
1 Sams Peter
2 Betts Knowles
I am trying to concat these 2 data frames that relatively have the same data but dont necessarily have any joining fields. My goal is to have all the data in one DF rather than having to work from multiple dfs.
I currently have the following code, but when i concat all the columns are not concat into one table, but rather some column are missing and show up in the row.
test_li = []
for test_filename in test:
test_df = pd.read_excel(test_filename, index_col=None, header=None)
test_li.append(test_df)
test_frame = (
pd.concat(test_li , axis=0, ignore_index=True)
.dropna(how="all") #to get rid of the eventual extra rows above each header
.drop_duplicates() #to get rid of the cumulated duplicated headers
.T.set_index(0).T #to make the first row as header of the dataframe
# .drop(list(df.filter(regex=string)),axis=1,inplace=True)
)
test_frame = test_frame.loc[:,test_frame.columns.notnull()]
test_frame
the above code produces the current output: However in row 13 you can see that one of my tables had Only the first and last name as columns whereas the second table had first, middle and last name. And during the concat process, the headers from one of the tables is converted into row 13.
Current output
Name Middle Name Last Name
1 Ambar Klein Liam
2 Maya J Jacobson
3 Steve Knowles Gomez
4 Sam l Smith
5 Gomez John Smith
6 Mical M Mason
12 NaN NaN t
13 Name Last Name NaN
14 Smith Klein NaN
15 Sams Peter NaN
16 Betts Knowles NaN
I would like to be able to concat my dfs and have the output in the format below, and to where all the columns from both dfs are merged together.
Desired Output
Name Middle Name Last Name
0 Ambar Klein Liam
1 Maya J Jacobson
2 Steve Knowles Gomez
3 Sam l Smith
4 Gomez John Smith
5 Mical M Mason
6 Smith - Klein
7 Sams - Peter
8 Betts - Knowles
You should import your data with a correct header. Use the default header=0
parameter of read_excel
, not header=None
:
test_li = []
for test_filename in test:
test_li.append(pd.read_excel(test_filename, index_col=None))
Then a simple concat
will work directly:
test_frame = (pd.concat(test_li, ignore_index=True)
.dropna(how='all') # optional
.drop_duplicates() #
)