I have generated two dataframes from CSV files.
Here is what they look like (both have the same structure):
word occurance
0 labor 4
1 predictions 2
2 nfl 2
3 kids 2
4 africa 2
5 pandemic 2
6 kara 2
7 days 2
8 swisher 2
9 event 2
10 day 2
11 football 2
12 office 2
13 us 2
14 politics 2
15 media 2
16 abortion 2
17 preview 2
18 music 2
19 texas 2
20 south 2
21 workers 2
22 anti 1
23 sanders 1
24 movement 1
25 bernie 1
26 budget 1
How can I check whether words in the second df also occur in the first df and, where there is a match, add the occurrences from both so that the total score (occurrence) is saved in the first df at the end of the program?
Thank you in advance
Consider the dataframes first and second, for example.
first
first = pd.DataFrame({"word": ["A", "B", "C", "D"], "occurrence": [1, 2, 3, 4]})
word occurrence
0 A 1
1 B 2
2 C 3
3 D 4
second
second = pd.DataFrame({"word": ["A", "B", "Y", "Z"], "occurrence": [6, 2, 4, 1]})
word occurrence
0 A 6
1 B 2
2 Y 4
3 Z 1
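To check which words of second also occur in first before combining anything, a membership test with Series.isin is one option. This is only a small sketch on the example frames above; the names mask and common are illustrative and not part of the original code.

```python
import pandas as pd

first = pd.DataFrame({"word": ["A", "B", "C", "D"], "occurrence": [1, 2, 3, 4]})
second = pd.DataFrame({"word": ["A", "B", "Y", "Z"], "occurrence": [6, 2, 4, 1]})

# True for each row of second whose word also appears in first
mask = second["word"].isin(first["word"])

# The matching words themselves (hypothetical helper variable)
common = second.loc[mask, "word"].tolist()
print(common)
```

With the example data this should list A and B, the only words shared by both frames.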
Final dataframe
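Since only the words present in first need their occurrences added to the matching ones in second, a left join followed by a row-wise sum of the occurrence columns works. Words in first without a match get NaN from the join, which sum skips by default.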
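# Overwrite first with the totals so the result is saved in the first df
first = (
    pd.merge(first, second, how="left", on=["word"])  # keep every word of first
    .set_index(["word"])
    .sum(axis=1).astype(int)  # adds occurrence_x and occurrence_y, skipping NaN
    .reset_index(name="occurrence")
)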
word occurrence
0 A 7
1 B 4
2 C 3
3 D 4
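If the goal is to update first in place without a merge, one possible alternative is to look up each word of first in second with Series.map and add the result. This is a sketch under the assumption that every word appears at most once in second (a duplicated word would make the lookup ambiguous); the names lookup and extra are illustrative.

```python
import pandas as pd

first = pd.DataFrame({"word": ["A", "B", "C", "D"], "occurrence": [1, 2, 3, 4]})
second = pd.DataFrame({"word": ["A", "B", "Y", "Z"], "occurrence": [6, 2, 4, 1]})

# Series indexed by word, used as a lookup table.
# Assumption: words in second are unique.
lookup = second.set_index("word")["occurrence"]

# Occurrence from second for each word of first; unmatched words become 0
extra = first["word"].map(lookup).fillna(0).astype(int)

# Store the total directly in first
first["occurrence"] = first["occurrence"] + extra
print(first)
```

For the example frames this should give the same totals as the merge above, and words that exist only in second (Y and Z) are ignored.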