Search code examples
pythonpandasdataframemeltcolumnsorting

Joining or merging multiple columns within one dataframe and keeping all data


I have this dataframe:

df = pd.DataFrame({'Position1':[1,2,3], 'Count1':[55,35,45],\
                   'Position2':[4,2,7], 'Count2':[15,35,75],\
                   'Position3':[3,5,6], 'Count3':[45,95,105]})
print(df)

   Position1  Count1  Position2  Count2  Position3  Count3
0          1      55          4      15          3      45
1          2      35          2      35          5      95
2          3      45          7      75          6     105

I want to join the Position columns into one column named "Positions" while sorting the data in the Counts columns like so:

   Positions Count1 Count2 Count3
0          1     55    Nan    Nan
1          2     35     35    Nan
2          3     45    NaN     45
3          4    NaN     15    Nan
4          5    NaN    NaN     95
5          6    Nan    NaN    105
6          7    Nan     75    NaN

I've tried melting the dataframe, combining and merging columns but I am a bit stuck.

Note that the NaN types can easily be replaced by using df.fillna to get a dataframe like so:

df = df.fillna(0)

   Positions  Count1  Count2  Count3
0          1      55       0       0
1          2      35      35       0
2          3      45       0      45
3          4       0      15       0
4          5       0       0      95
5          6       0       0     105
6          7       0      75       0

Solution

  • Here is a way to do what you've asked:

    df = df[['Position1', 'Count1']].rename(columns={'Position1':'Positions'}).join(
        df[['Position2', 'Count2']].set_index('Position2'), on='Positions', how='outer').join(
        df[['Position3', 'Count3']].set_index('Position3'), on='Positions', how='outer').sort_values(
        by=['Positions']).reset_index(drop=True)
    

    Output:

       Positions  Count1  Count2  Count3
    0          1    55.0     NaN     NaN
    1          2    35.0    35.0     NaN
    2          3    45.0     NaN    45.0
    3          4     NaN    15.0     NaN
    4          5     NaN     NaN    95.0
    5          6     NaN     NaN   105.0
    6          7     NaN    75.0     NaN
    

    Explanation:

    • Use join first on Position1, Count1 and Position2, Count2 (with Position1 renamed as Positions) then on that join result and Position3, Count3.
    • Sort by Positions and use reset_index to create a new integer range index (ascending with no gaps).