
Join dataframes based on partial string-match between columns


I have a dataframe of movies, and I want to check whether each movie is also present in another dataframe.

after_h.sample(10, random_state=1)

                      movie  year  ratings
108  Mechanic: Resurrection  2016      4.0
206                Warcraft  2016      4.0
106               Max Steel  2016      3.5
107           Me Before You  2016      4.5

In the second dataframe, the release year is part of the title string:

                             FILM  Votes
0  Avengers: Age of Ultron (2015)   4170
1               Cinderella (2015)    950
2                  Ant-Man (2015)   3000
3          Do You Believe? (2015)    350
4                Max Steel (2016)    560

I want something like this as my final output:

        FILM  Votes
0  Max Steel    560

Solution

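  • Given input dataframes df1 (movie, year, ratings) and df2 (FILM, Votes), you can use Boolean indexing with pd.Series.isin. Because isin only finds exact matches, you first need to bring df1's titles into the same "Title (Year)" format as df2['FILM'] by concatenating movie and year: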

    s = df1['movie'] + ' (' + df1['year'].astype(str) + ')'
    
    res = df2[df2['FILM'].isin(s)]
    
    print(res)
    
                   FILM  Votes
    4  Max Steel (2016)    560
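  • Alternatively, if you want the title without the year (as in the desired output) or want to keep df1's columns such as ratings, a join may fit better than a filter. The sketch below splits FILM into separate movie and year columns with Series.str.extract and then merges on both columns. It assumes every FILM value ends with a four-digit year in parentheses, and it rebuilds df1 and df2 from only the rows shown in the question; the names parts, df2_split and out are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the rows shown in the question
df1 = pd.DataFrame({
    "movie": ["Mechanic: Resurrection", "Warcraft", "Max Steel", "Me Before You"],
    "year": [2016, 2016, 2016, 2016],
    "ratings": [4.0, 4.0, 3.5, 4.5],
})
df2 = pd.DataFrame({
    "FILM": ["Avengers: Age of Ultron (2015)", "Cinderella (2015)",
             "Ant-Man (2015)", "Do You Believe? (2015)", "Max Steel (2016)"],
    "Votes": [4170, 950, 3000, 350, 560],
})

# Split "Title (YYYY)" into movie and year columns.
# Assumption: every FILM value ends with " (YYYY)"; a non-matching row would
# yield NaN and make the astype(int) conversion below raise.
parts = df2["FILM"].str.extract(r"^(?P<movie>.*) \((?P<year>\d{4})\)$")
parts["year"] = parts["year"].astype(int)
df2_split = df2.join(parts)

# Inner join on title and year keeps only films present in both frames
res = df1.merge(df2_split, on=["movie", "year"], how="inner")

# Shape the result like the desired output: title without the year, plus votes
out = res[["movie", "Votes"]].rename(columns={"movie": "FILM"})
print(out)
```

Unlike the isin filter, the merge carries df1's ratings into res, and a film with the same title but a different year does not match.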