Comparing dataframes for same values

I have generated 2 dataframes from csv files:

one containing a column of words and a column of their occurrence in an article (saved in the df as type: object
one containing the same as above but with different words and different occurrence for these words (some words are listed in both df)

Here is how the df's look like (both are built the same):

              word  occurance
0            labor          4
1      predictions          2
2              nfl          2
3             kids          2
4           africa          2
5         pandemic          2
6             kara          2
7             days          2
8          swisher          2
9            event          2
10             day          2
11        football          2
12          office          2
13              us          2
14        politics          2
15           media          2
16        abortion          2
17         preview          2
18           music          2
19           texas          2
20           south          2
21         workers          2
22            anti          1
23         sanders          1
24        movement          1
25          bernie          1
26          budget          1

How can I check if there are words in the second df that occurred as well in the first df and if there is a match add the occurrences from first and second df so to have a total score (occurrence) saved within the first df at the end of the program?

Thank you in advance

Solution

Considering dataframes first and second for example.

first

first = pd.DataFrame({"word": ["A", "B", "C", "D"], "occurrence": [1, 2, 3, 4]})

    word  occurrence
0      A           1
1      B           2
2      C           3
3      D           4

second

second = pd.DataFrame({"word": ["A", "B", "Y", "Z"], "occurrence": [6, 2, 4, 1]})

    word  occurrence
0      A           6
1      B           2
2      Y           4
3      Z           1

Final dataframe

Since only the words present in first needs to be added with the ones present in second, using left join and taking the sum of occurrences works.

pd.merge(first, second, how="left", on=["word"]) \
    .set_index(["word"]) \
    .sum(axis=1).astype(int) \
    .reset_index(name="occurrence")

    word  occurrence
0      A           7
1      B           4
2      C           3
3      D           4