I have two Pandas data frames. One where all the sentences are split by words vertically and look like this
Sentence | Text
1 I
1 like
1 Cats
2 The
2 man
2 plays
2 soccer
2 today
And the other data frame looks like this:
ID | Text | Tags
1 plays soccer sport
2 man human
3 like cats interest
What I would like is to map the Tags to the first data frame for the words in the sentence that match both places, and those who does not match get an "O" to symbolize it.
Sentence | Text | Tags
1 I O
1 like interest
1 Cats interest
2 The O
2 man human
2 plays sport
2 soccer sport
2 today O
Assume df1
is your first table and df2
is your second (where ID
is the index) then you can:
str.split()
+ explode()
)merge()
with a left joinfillna()
the NaN
with O
E.g.:
In []:
df1.merge(df2.assign(Text=df2.Text.str.split()).explode('Text'), 'left').fillna('O')
Out[]:
Sentence Text Tags
0 1 I O
1 1 like interest
2 1 Cats O
3 2 The O
4 2 man human
5 2 plays sport
6 2 soccer sport
7 2 today O
This is currently case sensitive but it is an easy exercise to merge on a column that is str.lower()
of Text
.