Search code examples
pythonpandasdataframedata-scienceconcatenation

How do i perform join on 2 dataframes that have same number of rows but no matching column names?


I am trying to plot a histogram for missing NaN values across all features of a dataframe For that i created a dataframe for missing NaN values

Missing value Dataframe

   0
-----
0  0
1  14
2  800
.
.
84 2344

Then i have this master dataframe that has multiple columns i am not concerned about since i want only the row names from this dataframe

Master Dataframe

     0  1
---------
F1   3  3
F2   4  3
.
.
F85  5  2

How can i merge/ concatenate these 2 dataframes where the final output should be like (Columns in master dataframe are irrelevant since i want to plot number of missing values across all features i.e. F1,F2,...F85)

    F1   0  
    F2   14 
    F3   800
    .
    .
    F85  2344

Solution

  • Asume your dataframes are df1 (Missing value Dataframe) and df2 (Master dataframe). Then you can try this:

    df1.columns=['X']
    res = df2.reset_index().join(df1.reset_index(), rsuffix='_r')[['index', 'X']].set_index('index')
    print(res)
    

    Result will be:

    index      
    F1        0
    F2       14
    F3      800
    ...
    F85    2344
    

    The idea is to use reset_index to replace index in both dataframes with row numbers and then merge dataframes