Search code examples
pythonpandasscikit-learnsklearn-pandas

Reordering rows in a dataframe to match order of rows in another dataframe


I have 2 dataframes df1 and df2 which have the same number of rows but in a different ordering. The dataframes have an ID column which contains a unique identifier for each row, both dataframes have this ID column and this is the column for which I would like the order of the rows in df2 to match in df1. The reason I want the order to be the same is that the dataframes will be passed to sklearn train_test_split and I wish to split the dataframes in the exact same order. How can I do this?

Here is the first dataframe:

df1

Paper ID          Document       Label 
math.123.txt    df1 content 1      0
math.321.txt    df1 content 2      0
math.213.txt    df1 content 3      0

df2

Paper ID         Document        Label 
math.123.txt     df2 content 1         0
math.213.txt     df2 content 2         0
math.321.txt     df2 content 3         0

desired order of df2:

df2

Paper ID         Document            Label 
math.123.txt     df2 content 1         0
math.321.txt     df2 content 3         0
math.213.txt     df2 content 2         0

So essentially I just want to reorder the rows in df2 based on the order of rows in df1["Paper ID"]


Solution

  • Since you want to order the dataframes according to the Paper ID, you should first set them as the index in both dataframes:

    df1.set_index('Paper ID', inplace=True)
    df2.set_index('Paper ID', inplace=True)
    

    Now you can reindex df2 to match the order of df1:

    df2 = df2.reindex(df1.index)
    

    Finally, reset the indices to restore the default index:

    df1.reset_index(inplace=True)
    df2.reset_index(inplace=True)