
Change word in dataframe to different word from different dataframe if they match


I'm having some trouble with my dataframe comparison. I have two dataframes; the first contains tokenised words.

df_1:
id  sentence               some more info 
1   [I, am, happy]         bla 
2   [I, am, happier]       bla 
3   [I, am, the, saddest]  bla 

and

df_2:
id word   more     most 
1  happy  happier  happiest 
2  sad    sadder   saddest 

What I want to do is compare the two dataframes: if a word in df_1 matches a word anywhere in df_2, it should be replaced with the df_2['word'] value from the row where the match was found. So my output would look something like this:

df_1
id  sentence               some more info  new_sentence
1   [I, am, happy]         bla             [I, am, happy]
2   [I, am, happier]       bla             [I, am, happy]
3   [I, am, the, saddest]  bla             [I, am, the, sad]

I have tried some things using .compare() and writing a function, but nothing has seemed to work so far.

Thanks for your help in advance!


Solution

  • Create a dictionary from the second DataFrame: remove the id column, reshape with DataFrame.melt, and index by the reshaped values with DataFrame.set_index:

    d = df_2.drop('id', axis=1).melt('word').set_index('value')['word'].to_dict()
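
To make the reshaping step concrete, here is a small sketch that rebuilds df_2 from the question and prints the intermediate long-format frame and the resulting dictionary. The variable name long_form is only illustrative, and the sample data is copied from the question rather than from any real dataset.

```python
import pandas as pd

# Sample data copied from the question's df_2.
df_2 = pd.DataFrame({
    "id": [1, 2],
    "word": ["happy", "sad"],
    "more": ["happier", "sadder"],
    "most": ["happiest", "saddest"],
})

# Long format: one row per (base word, inflected form) pair.
# melt keeps "word" as the identifier and stacks the remaining
# columns into "variable" (old column name) and "value" (cell content).
long_form = df_2.drop(columns="id").melt(id_vars="word")
print(long_form)

# Lookup table: inflected form -> base word.
d = long_form.set_index("value")["word"].to_dict()
print(d)
# {'happier': 'happy', 'sadder': 'sad', 'happiest': 'happy', 'saddest': 'sad'}
```

The base words themselves ("happy", "sad") are not keys in this dictionary, which is fine as long as the lookup falls back to the original token when no key matches.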
    

    Then map each token through dict.get, which returns the original value when there is no match:

    df_1['new_sentence'] = df_1['sentence'].apply(lambda x: [d.get(y, y) for y in x])
    

    Or, with a nested list comprehension instead of apply:

    d = df_2.drop('id', axis=1).melt('word').set_index('value')['word'].to_dict()
    
    df_1['new_sentence'] = [[d.get(y, y) for y in x] for x in df_1['sentence']]
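
For reference, a minimal end-to-end sketch using the sample data from the question might look like the following. It assumes the sentence column holds real Python lists of strings (not their string representation); the loop variable names are illustrative.

```python
import pandas as pd

# Sample frames rebuilt from the question.
df_1 = pd.DataFrame({
    "id": [1, 2, 3],
    "sentence": [
        ["I", "am", "happy"],
        ["I", "am", "happier"],
        ["I", "am", "the", "saddest"],
    ],
    "some more info": ["bla", "bla", "bla"],
})
df_2 = pd.DataFrame({
    "id": [1, 2],
    "word": ["happy", "sad"],
    "more": ["happier", "sadder"],
    "most": ["happiest", "saddest"],
})

# Lookup table: every inflected form in df_2 points at its base word.
d = (
    df_2.drop(columns="id")
    .melt(id_vars="word")
    .set_index("value")["word"]
    .to_dict()
)

# Replace each token by its base word; tokens without an entry stay unchanged.
df_1["new_sentence"] = [
    [d.get(token, token) for token in sentence]
    for sentence in df_1["sentence"]
]

print(df_1[["sentence", "new_sentence"]])
```

The lookup is an exact, case-sensitive match, so a token such as "Happier" would be left as it is unless the tokens are normalised first.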