Search code examples
pythonpandasdataframenamed-entity-recognition

find sentence among sentence of words


I have two Pandas data frames. One where all the sentences are split by words vertically and look like this

Sentence | Text   
1            I
1          like
1          Cats
2          The
2          man
2         plays
2         soccer
2         today

And the other data frame looks like this:

ID      | Text         | Tags
1         plays soccer   sport
2          man           human
3         like cats     interest

What I would like is to map the Tags to the first data frame for the words in the sentence that match both places, and those who does not match get an "O" to symbolize it.

Sentence | Text        | Tags 
1          I             O
1          like          interest
1          Cats          interest
2          The           O
2          man           human
2          plays         sport
2          soccer        sport
2          today         O

Solution

  • Assume df1 is your first table and df2 is your second (where ID is the index) then you can:

    • Explode your second table (str.split() + explode())
    • merge() with a left join
    • fillna() the NaN with O

    E.g.:

    In []:
    df1.merge(df2.assign(Text=df2.Text.str.split()).explode('Text'), 'left').fillna('O')
    
    Out[]:
       Sentence    Text      Tags
    0         1       I         O
    1         1    like  interest
    2         1    Cats         O
    3         2     The         O
    4         2     man     human
    5         2   plays     sport
    6         2  soccer     sport
    7         2   today         O
    

    This is currently case sensitive but it is an easy exercise to merge on a column that is str.lower() of Text.